Extending PyTorch with Custom C++ Classes

This tutorial introduces an API for binding C++ classes into PyTorch. The API is very similar to pybind11, and most of the concepts will transfer over if you are familiar with that system.

Implementing and Binding the Class in C++

For this tutorial, we are going to define a simple C++ class that maintains persistent state in a member variable.
// This header is all you need to do the C++ portions of this
// tutorial
#include <torch/script.h>
// This header is what defines the custom class registration
// behavior specifically. script.h already includes this, but
// we include it here so you know it exists in case you want
// to look at the API or implementation.
#include <torch/custom_class.h>
#include <string>
#include <vector>
template <class T>
struct MyStackClass : torch::CustomClassHolder {
  std::vector<T> stack_;
  MyStackClass(std::vector<T> init) : stack_(init.begin(), init.end()) {}

  void push(T x) {
    stack_.push_back(x);
  }
  T pop() {
    auto val = stack_.back();
    stack_.pop_back();
    return val;
  }

  c10::intrusive_ptr<MyStackClass> clone() const {
    return c10::make_intrusive<MyStackClass>(stack_);
  }

  void merge(const c10::intrusive_ptr<MyStackClass>& c) {
    for (auto& elem : c->stack_) {
      push(elem);
    }
  }
};
There are several things to note:

- torch/custom_class.h is the header you need to include to extend PyTorch with your custom class.
- Notice that whenever we work with instances of the custom class, we do it via instances of c10::intrusive_ptr<>. Think of intrusive_ptr as a smart pointer like std::shared_ptr, except that the reference count is stored directly in the object, rather than in a separate metadata block as with std::shared_ptr. torch::Tensor internally uses the same pointer type, and custom classes must also use it so that we can manage different object types consistently. (A short usage sketch follows this list.)
- The second thing to notice is that the user-defined class must inherit from torch::CustomClassHolder. This ensures that the custom class has space to store the reference count.
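To make the pointer convention above concrete, here is a minimal, hypothetical C++ sketch (not part of the tutorial's class.cpp; the function name is made up for illustration) showing how an instance of MyStackClass would typically be created and handled through c10::intrusive_ptr:

// Minimal sketch, assuming it lives in the same translation unit as the
// MyStackClass definition above.
void intrusive_ptr_example() {
  // make_intrusive allocates the object and returns an intrusive_ptr that
  // shares the reference count stored inside the object itself.
  c10::intrusive_ptr<MyStackClass<std::string>> stack =
      c10::make_intrusive<MyStackClass<std::string>>(
          std::vector<std::string>{"foo", "bar"});
  stack->push("baz");
  // Copying the intrusive_ptr bumps the in-object reference count.
  c10::intrusive_ptr<MyStackClass<std::string>> alias = stack;
  alias->pop();  // pops "baz"
}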
Now that we have our class, let's look at how we can make it visible to PyTorch, a process we call binding the class.
// Notice a few things:
// - We pass the class to be registered as a template parameter to
// `torch::class_`. In this instance, we've passed the
// specialization of the MyStackClass class ``MyStackClass<std::string>``.
// In general, you cannot register a non-specialized template
// class. For non-templated classes, you can just pass the
// class name directly as the template parameter.
// - The arguments passed to the constructor make up the "qualified name"
// of the class. In this case, the registered class will appear in
// Python and C++ as `torch.classes.my_classes.MyStackClass`. We call
// the first argument the "namespace" and the second argument the
// actual class name.
TORCH_LIBRARY(my_classes, m) {
  m.class_<MyStackClass<std::string>>("MyStackClass")
    // The following line registers the constructor of our MyStackClass
    // class that takes a single `std::vector<std::string>` argument,
    // i.e. it exposes the C++ method `MyStackClass(std::vector<T> init)`.
    // Currently, we do not support registering overloaded
    // constructors, so for now you can only `def()` one instance of
    // `torch::init`.
    .def(torch::init<std::vector<std::string>>())
    // The next line registers a stateless (i.e. no captures) C++ lambda
    // function as a method. Note that a lambda function must take a
    // `c10::intrusive_ptr<YourClass>` (or some const/ref version of that)
    // as the first argument. Other arguments can be whatever you want.
    .def("top", [](const c10::intrusive_ptr<MyStackClass<std::string>>& self) {
      return self->stack_.back();
    })
    // The following four lines expose methods of the MyStackClass<std::string>
    // class as-is. `torch::class_` will automatically examine the
    // argument and return types of the passed-in method pointers and
    // expose these to Python and TorchScript accordingly. Finally, notice
    // that we must take the *address* of the fully-qualified method name,
    // i.e. use the unary `&` operator, due to C++ typing rules.
    .def("push", &MyStackClass<std::string>::push)
    .def("pop", &MyStackClass<std::string>::pop)
    .def("clone", &MyStackClass<std::string>::clone)
    .def("merge", &MyStackClass<std::string>::merge)
  ;
}
Building the Example as a C++ Project With CMake

Now, we are going to build the above C++ code with the CMake build system. First, take all the C++ code we have covered so far and place it in a file called class.cpp. Then, write a simple CMakeLists.txt file and place it in the same directory. Here is what CMakeLists.txt should look like:
cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
project(custom_class)
find_package(Torch REQUIRED)
# Define our library target
add_library(custom_class SHARED class.cpp)
set(CMAKE_CXX_STANDARD 14)
# Link against LibTorch
target_link_libraries(custom_class "${TORCH_LIBRARIES}")
Also, create a build directory. Your file tree should look like this:

custom_class_project/
  class.cpp
  CMakeLists.txt
  build/
Go ahead and invoke cmake and then make to build the project:
$ cd build
$ cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)')" ..
-- The C compiler identification is GNU 7.3.1
-- The CXX compiler identification is GNU 7.3.1
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found torch: /torchbind_tutorial/libtorch/lib/libtorch.so
-- Configuring done
-- Generating done
-- Build files have been written to: /torchbind_tutorial/build
$ make -j
Scanning dependencies of target custom_class
[ 50%] Building CXX object CMakeFiles/custom_class.dir/class.cpp.o
[100%] Linking CXX shared library libcustom_class.so
[100%] Built target custom_class
You will find that there is now (among other things) a dynamic library file in the build directory. On Linux, it is probably named libcustom_class.so. So the file tree should now look like this:

custom_class_project/
  class.cpp
  CMakeLists.txt
  build/
    libcustom_class.so
Using the C++ Class from Python

Now that we have our class and its registration compiled into an .so file, we can load that .so into Python and try it out. Here is a script that demonstrates that:
import torch
# `torch.classes.load_library()` allows you to pass the path to your .so file
# to load it in and make the custom C++ classes available to both Python and
# TorchScript
torch.classes.load_library("build/libcustom_class.so")
# You can query the loaded libraries like this:
print(torch.classes.loaded_libraries)
# prints {'/custom_class_project/build/libcustom_class.so'}
# We can find and instantiate our custom C++ class in python by using the
# `torch.classes` namespace:
#
# This instantiation will invoke the MyStackClass(std::vector<T> init)
# constructor we registered earlier
s = torch.classes.my_classes.MyStackClass(["foo", "bar"])
# We can call methods in Python
s.push("pushed")
assert s.pop() == "pushed"
# Test custom operator
s.push("pushed")
torch.ops.my_classes.manipulate_instance(s) # acting as s.pop()
assert s.top() == "bar"
# Returning and passing instances of custom classes works as you'd expect
s2 = s.clone()
s.merge(s2)
for expected in ["bar", "foo", "bar", "foo"]:
assert s.pop() == expected
# We can also use the class in TorchScript
# For now, we need to assign the class's type to a local in order to
# annotate the type on the TorchScript function. This may change
# in the future.
MyStackClass = torch.classes.my_classes.MyStackClass
@torch.jit.script
def do_stacks(s: MyStackClass):  # We can pass a custom class instance
    # We can instantiate the class
    s2 = torch.classes.my_classes.MyStackClass(["hi", "mom"])
    s2.merge(s)  # We can call a method on the class
    # We can also return instances of the class
    # from TorchScript function/methods
    return s2.clone(), s2.top()
stack, top = do_stacks(torch.classes.my_classes.MyStackClass(["wow"]))
assert top == "wow"
for expected in ["wow", "mom", "hi"]:
assert stack.pop() == expected
Defining Serialization/Deserialization Methods for Custom C++ Classes

If you try to save a ScriptModule that has a custom-bound C++ class as an attribute, you will get the following error:
# export_attr.py
import torch

torch.classes.load_library('build/libcustom_class.so')

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = torch.classes.my_classes.MyStackClass(["just", "testing"])

    def forward(self, s: str) -> str:
        return self.stack.pop() + s

scripted_foo = torch.jit.script(Foo())

scripted_foo.save('foo.pt')
loaded = torch.jit.load('foo.pt')
print(loaded.stack.pop())
$ python export_attr.py
RuntimeError: Cannot serialize custom bound C++ class __torch__.torch.classes.my_classes.MyStackClass. Please define serialization methods via def_pickle for this class. (pushIValueImpl at ../torch/csrc/jit/pickler.cpp:128)
This is because PyTorch cannot automatically figure out which information from your C++ class should be saved. You must specify that manually. The way to do that is to define __getstate__ and __setstate__ methods on the class using the special def_pickle method on class_.
Note

The semantics of __getstate__ and __setstate__ are equivalent to those of the Python pickle module. You can read more about how we use these methods.
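For readers less familiar with that protocol, here is a rough pure-Python analogue (an illustrative assumption only, not how the bound class is actually implemented) of the behavior we are about to register on the C++ side:

# Illustrative Python analogue only; the real MyStackClass is implemented in C++.
import pickle

class PyStack:
    def __init__(self, init):
        self.stack_ = list(init)

    def __getstate__(self):
        # Return only the data worth persisting, analogous to the C++ __getstate__ below.
        return self.stack_

    def __setstate__(self, state):
        # Rebuild the instance from that data, analogous to the C++ __setstate__ below.
        self.stack_ = list(state)

s = pickle.loads(pickle.dumps(PyStack(["just", "testing"])))
assert s.stack_ == ["just", "testing"]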
Here is an example of the def_pickle call we can add to the registration of MyStackClass to include serialization methods:
    // class_<>::def_pickle allows you to define the serialization
    // and deserialization methods for your C++ class.
    // Currently, we only support passing stateless lambda functions
    // as arguments to def_pickle
    .def_pickle(
        // __getstate__
        // This function defines what data structure should be produced
        // when we serialize an instance of this class. The function
        // must take a single `self` argument, which is an intrusive_ptr
        // to the instance of the object. The function can return
        // any type that is supported as a return value of the TorchScript
        // custom operator API. In this instance, we've chosen to return
        // a std::vector<std::string> as the salient data to preserve
        // from the class.
        [](const c10::intrusive_ptr<MyStackClass<std::string>>& self)
            -> std::vector<std::string> {
          return self->stack_;
        },
        // __setstate__
        // This function defines how to create a new instance of the C++
        // class when we are deserializing. The function must take a
        // single argument of the same type as the return value of
        // `__getstate__`. The function must return an intrusive_ptr
        // to a new instance of the C++ class, initialized however
        // you would like given the serialized state.
        [](std::vector<std::string> state)
            -> c10::intrusive_ptr<MyStackClass<std::string>> {
          // A convenient way to instantiate an object and get an
          // intrusive_ptr to it is via `make_intrusive`. We use
          // that here to allocate an instance of MyStackClass<std::string>
          // and call the single-argument std::vector<std::string>
          // constructor with the serialized state.
          return c10::make_intrusive<MyStackClass<std::string>>(std::move(state));
        });
Note

We take a different approach from pybind11 in the pickle API. Whereas pybind11 has a special function pybind11::pickle() which you pass to class_::def(), we have a separate def_pickle method for this purpose. This is because the name torch::jit::pickle was already taken, and we did not want to cause confusion.

Once we have defined the (de)serialization behavior in this way, our script can now run successfully:
$ python ../export_attr.py
testing
Defining Custom Operators that Take or Return Bound C++ Classes

Once you have defined a custom C++ class, you can also use that class as an argument or return value of a custom operator (i.e. free functions). Suppose you have the following free function:
c10::intrusive_ptr<MyStackClass<std::string>> manipulate_instance(const c10::intrusive_ptr<MyStackClass<std::string>>& instance) {
  instance->pop();
  return instance;
}
You can register it by running the following code inside your TORCH_LIBRARY block:
m.def(
"manipulate_instance(__torch__.torch.classes.my_classes.MyStackClass x) -> __torch__.torch.classes.my_classes.MyStackClass Y",
manipulate_instance
);
Once this is done, you can use the op like the following example:
class TryCustomOp(torch.nn.Module):
    def __init__(self):
        super(TryCustomOp, self).__init__()
        self.f = torch.classes.my_classes.MyStackClass(["foo", "bar"])

    def forward(self):
        return torch.ops.my_classes.manipulate_instance(self.f)
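As a quick usage sketch (assuming the shared library has already been loaded with torch.classes.load_library as in the earlier demo script), the module can be scripted and invoked like this:

# Minimal usage sketch; assumes torch.classes.load_library("build/libcustom_class.so")
# has already been called in this process.
m = torch.jit.script(TryCustomOp())
out = m()          # manipulate_instance pops "bar" and returns the instance
print(out.top())   # prints "foo"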
Note

Registration of an operator that takes a C++ class as an argument requires that the custom class has already been registered. You can enforce this by making sure the custom class registration and your free-function definitions are in the same TORCH_LIBRARY block, and that the custom class registration comes first. In the future, we may relax this requirement so that these can be registered in any order.
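For concreteness, a minimal sketch of such a combined block might look like the following (the class registration is abbreviated here; see the full registration code earlier in this tutorial):

TORCH_LIBRARY(my_classes, m) {
  // The class must be registered before any operator schema that mentions it.
  m.class_<MyStackClass<std::string>>("MyStackClass")
      .def(torch::init<std::vector<std::string>>())
      .def("pop", &MyStackClass<std::string>::pop);
  // Now the free function can refer to the class in its schema.
  m.def(
      "manipulate_instance(__torch__.torch.classes.my_classes.MyStackClass x) -> __torch__.torch.classes.my_classes.MyStackClass Y",
      manipulate_instance);
}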
Conclusion

This tutorial walked you through how to expose a C++ class to PyTorch, how to register its methods, how to use that class from Python, and how to save and load code that uses the class and run that code in a standalone C++ process. You are now ready to extend your PyTorch models with C++ classes that interface with third-party C++ libraries, or to implement any other use case that requires the lines between Python and C++ to blend smoothly.

As always, if you run into any problems or have questions, you can reach out on our forum or via GitHub issues. Also, our frequently asked questions (FAQ) page may have helpful information.