AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models#

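Created On: Jun 13, 2025 | Last Updated On: Aug 14, 2025

Warning

AOTInductor and its related features are in prototype status and are subject to backward-compatibility-breaking changes.

AOTInductor is a specialized version of TorchInductor that processes exported PyTorch models, optimizes them, and produces shared libraries along with other related artifacts. These compiled artifacts are built for deployment in non-Python environments, which are common for server-side inference.

In this tutorial, you will learn how to take a PyTorch model, export it, compile it into an artifact, and run model predictions from C++.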

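Model Compilation#

To compile a model with AOTInductor, we first use torch.export.export() to capture a given PyTorch model into a computational graph. torch.export provides soundness guarantees and a strict specification of the captured IR, both of which AOTInductor relies on.

We then use torch._inductor.aoti_compile_and_package() to compile the exported program with TorchInductor and save the compiled artifacts into a single package. The package format follows the PT2 Archive Spec.

Note

If your machine has a CUDA-enabled device and you installed PyTorch with CUDA support, the following code compiles the model into a shared library for CUDA execution. Otherwise, the compiled artifact runs on CPU. For better performance during CPU inference, we suggest enabling freezing by setting export TORCHINDUCTOR_FREEZING=1 before running the Python script below. The same behavior applies in an environment with an Intel® GPU.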

import os
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 16)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(16, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x

with torch.no_grad():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Model().to(device=device)
    example_inputs=(torch.randn(8, 10, device=device),)
    batch_dim = torch.export.Dim("batch", min=1, max=1024)
    # [Optional] Specify the first dimension of the input x as dynamic.
    exported = torch.export.export(model, example_inputs, dynamic_shapes={"x": {0: batch_dim}})
    # [Note] In this example we directly feed the exported module to aoti_compile_and_package.
    # Depending on your use case, e.g. if your training platform and inference platform
    # are different, you may choose to save the exported model using torch.export.save and
    # then load it back using torch.export.load on your inference platform to run AOT compilation.
    output_path = torch._inductor.aoti_compile_and_package(
        exported,
        # [Optional] Specify the generated shared library path. If not specified,
        # the generated artifact is stored in your system temp directory.
        package_path=os.path.join(os.getcwd(), "model.pt2"),
    )

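In this example, the Dim parameter designates the first dimension of the input variable "x" as dynamic. The package_path argument writes the compiled package to model.pt2 in the current working directory, which is where the C++ code later in this tutorial looks for it. If package_path is omitted, the artifact is stored in the system temp directory, and the path returned by aoti_compile_and_package() tells you where to find it.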
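To see what the compilation step produced, it can help to list the contents of the package. The following is a minimal sketch that assumes the .pt2 package is a zip-format archive, as described by the PT2 Archive Spec. The helper name list_package_entries is hypothetical and is not part of PyTorch.

```python
import zipfile


def list_package_entries(package_path):
    """Return the sorted member names of a .pt2 package.

    Assumption: the package is a standard zip archive, so the
    standard-library zipfile module can read its table of contents.
    """
    with zipfile.ZipFile(package_path) as archive:
        return sorted(archive.namelist())
```

For example, after running the compilation script above, calling list_package_entries("model.pt2") and printing each returned name shows the files stored in the package. The exact layout depends on the PyTorch version, so treat the listing as a debugging aid rather than a stable interface.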

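Inference in Python#

There are several ways to deploy the compiled artifact for inference, and one of them is Python. We provide a convenient utility API, torch._inductor.aoti_load_package(), for loading and running the artifact, as shown in the following example: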

import os
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch._inductor.aoti_load_package(os.path.join(os.getcwd(), "model.pt2"))
print(model(torch.randn(8, 10, device=device)))

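At inference time, the input should have the same dtype and stride as the input used at export time, and the same size in every dimension except those marked as dynamic (in this example, the batch dimension, which may range from 1 to 1024).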
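The size and stride requirement can be made concrete with a small pure-Python sketch. It is only an illustration of the constraint for the model in this tutorial, not AOTInductor's own check. The helper names contiguous_strides and matches_export are hypothetical, and the sketch assumes the exported input was a contiguous 2-D tensor with 10 features and a batch dimension bounded by 1 and 1024.

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def matches_export(shape, strides, min_batch=1, max_batch=1024, features=10):
    """Check a 2-D input layout against the one used at export time here."""
    if len(shape) != 2 or shape[1] != features:
        return False  # the feature dimension is static
    if not (min_batch <= shape[0] <= max_batch):
        return False  # the batch dimension is dynamic but bounded
    return tuple(strides) == contiguous_strides(shape)


print(contiguous_strides((8, 10)))       # (10, 1)
print(matches_export((8, 10), (10, 1)))  # True
print(matches_export((2, 10), (10, 1)))  # True: only the batch size changed
print(matches_export((8, 10), (1, 8)))   # False: layout of a transposed view
print(matches_export((8, 12), (12, 1)))  # False: feature dimension differs
```

A tensor obtained by transposing a (10, 8) tensor has the right shape but strides (1, 8), which is why such inputs are usually made contiguous before being passed to the compiled model.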

Inference in C++#

Next, we use the following example C++ file, inference.cpp, to load the compiled artifact, which lets us run model predictions directly in a C++ environment.

#include <iostream>
#include <vector>

#include <torch/torch.h>
#include <torch/csrc/inductor/aoti_package/model_package_loader.h>

int main() {
    c10::InferenceMode mode;

    torch::inductor::AOTIModelPackageLoader loader("model.pt2");
    // Assume running on CUDA
    std::vector<torch::Tensor> inputs = {torch::randn({8, 10}, at::kCUDA)};
    std::vector<torch::Tensor> outputs = loader.run(inputs);
    std::cout << "Result from the first inference:"<< std::endl;
    std::cout << outputs[0] << std::endl;

    // The second inference uses a different batch size and it works because we
    // specified that dimension as dynamic when compiling model.pt2.
    std::cout << "Result from the second inference:"<< std::endl;
    // Assume running on CUDA
    std::cout << loader.run({torch::randn({2, 10}, at::kCUDA)})[0] << std::endl;

    return 0;
}

To build the C++ file, you can use the provided CMakeLists.txt file, which automatically invokes python model.py to AOT-compile the model and compiles inference.cpp into an executable binary named aoti_example.

cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(aoti_example)

find_package(Torch REQUIRED)

add_executable(aoti_example inference.cpp model.pt2)

add_custom_command(
    OUTPUT model.pt2
    COMMAND python ${CMAKE_CURRENT_SOURCE_DIR}/model.py
    DEPENDS model.py
)

target_link_libraries(aoti_example "${TORCH_LIBRARIES}")
set_property(TARGET aoti_example PROPERTY CXX_STANDARD 17)

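Given a directory structure like the following, you can run the commands below to build the binary. Note that the CMAKE_PREFIX_PATH variable is essential for CMake to locate the LibTorch library, and it should be set to an absolute path. Your path may differ from the one shown in this example.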

aoti_example/
    CMakeLists.txt
    inference.cpp
    model.py

$ mkdir build
$ cd build
$ CMAKE_PREFIX_PATH=/path/to/python/install/site-packages/torch/share/cmake cmake ..
$ cmake --build . --config Release
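If you are unsure which value to pass as CMAKE_PREFIX_PATH, the following sketch derives a candidate from the installed torch package without importing it. The helper name torch_cmake_prefix_path is hypothetical, and the share/cmake sub-path is taken from the example command above and is assumed to match your installation.

```python
import importlib.util
import os


def torch_cmake_prefix_path():
    """Return the absolute <torch package dir>/share/cmake path, or None.

    Assumption: the installed torch package ships its CMake files under
    share/cmake, as in the example command above.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None  # torch is not installed in this interpreter
    package_dir = os.path.abspath(list(spec.submodule_search_locations)[0])
    return os.path.join(package_dir, "share", "cmake")


print(torch_cmake_prefix_path())
```

When torch can be imported, torch.utils.cmake_prefix_path reports the same kind of location and is the more direct option.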

After the aoti_example binary has been generated in the build directory, running it displays results similar to the following:

$ ./aoti_example
Result from the first inference:
0.4866
0.5184
0.4462
0.4611
0.4744
0.4811
0.4938
0.4193
[ CUDAFloatType{8,1} ]
Result from the second inference:
0.4883
0.4703
[ CUDAFloatType{2,1} ]

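Troubleshooting#

Below are some useful tools for debugging AOT Inductor.

Debugging Tools

To enable runtime checks on inputs, set the environment variable AOTI_RUNTIME_CHECK_INPUTS to 1. This raises a RuntimeError if the inputs to the compiled model differ in size, data type, or strides from those used during export.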
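One way to turn the check on for a single run, without changing your shell environment, is to launch the program from Python with the variable set. This is a minimal sketch. The helper name run_with_input_checks is hypothetical, and the command you pass in (for example the aoti_example binary built earlier) is up to you.

```python
import os
import subprocess


def run_with_input_checks(command, base_env=None):
    """Run `command` with AOTI_RUNTIME_CHECK_INPUTS=1 in its environment.

    `command` is a list such as ["./aoti_example"]. The parent process
    environment is copied, so the caller's own environment is unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AOTI_RUNTIME_CHECK_INPUTS"] = "1"
    return subprocess.run(command, env=env, capture_output=True, text=True)
```

For example, result = run_with_input_checks(["./aoti_example"]) runs the binary with checks enabled. Inspecting result.returncode and result.stderr then shows whether an input mismatch was reported.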

API Reference#

torch._inductor.aoti_compile_and_package(exported_program, _deprecated_unused_args=None, _deprecated_unused_kwargs=None, *, package_path=None, inductor_configs=None)

Compiles the exported program with AOTInductor and packages it into a .pt2 artifact at the location given by package_path. To load the package, you can call torch._inductor.aoti_load_package(package_path).

An example usage is as follows:

ep = torch.export.export(M(), ...)
aoti_file = torch._inductor.aoti_compile_and_package(
    ep, package_path="my_package.pt2"
)
compiled_model = torch._inductor.aoti_load_package("my_package.pt2")

To compile and save multiple models into a single .pt2 artifact, you can do the following:

ep1 = torch.export.export(M1(), ...)
aoti_file1 = torch._inductor.aot_compile(
    ep1, ..., options={"aot_inductor.package": True}
)
ep2 = torch.export.export(M2(), ...)
aoti_file2 = torch._inductor.aot_compile(
    ep2, ..., options={"aot_inductor.package": True}
)

from torch._inductor.package import package_aoti, load_package

package_aoti("my_package.pt2", {"model1": aoti_file1, "model2": aoti_file2})

compiled_model1 = load_package("my_package.pt2", "model1")
compiled_model2 = load_package("my_package.pt2", "model2")

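Parameters

exported_program (ExportedProgram) – An exported program created through a call to torch.export
package_path (Optional[FileLike]) – Optional path at which to write the generated .pt2 artifact.
inductor_configs (Optional[dict[str, Any]]) – Optional dictionary of configs to control inductor.

Returns

Path to the generated artifact

Return type

str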

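torch._inductor.aoti_load_package(path, run_single_threaded=False, device_index=-1)

Loads the model from the PT2 package.

If multiple models were packaged into the PT2 package, this loads the default model. To load a specific model, you can call the load API directly: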

from torch._inductor.package import load_package

compiled_model1 = load_package("my_package.pt2", "model1")
compiled_model2 = load_package("my_package.pt2", "model2")

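Parameters

path (FileLike) – Path to the .pt2 package
run_single_threaded (bool) – Whether the model should be run without thread synchronization logic. This is useful to avoid conflicts with CUDAGraphs.
device_index (int) – The index of the device onto which the PT2 package is loaded. By default, device_index=-1 is used, which corresponds to the device cuda when using CUDA. For example, passing device_index=1 loads the package onto cuda:1.

Return type

AOTICompiledModel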
