Extending the ONNX Exporter Operator Support#

Created On: Oct 06, 2023 | Last Updated: Mar 05, 2025 | Last Verified: Nov 05, 2024

Authors: Ti-Tai Wang, Justin Chu

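Overview#

This tutorial describes how to create an ONNX implementation for an unsupported PyTorch operator, or how to replace an existing implementation with your own.

We will cover three scenarios that require extending the ONNX exporter's operator support:

Overriding the implementation of an existing PyTorch operator
Using custom ONNX operators
Supporting a custom PyTorch operator

What you will learn

How to override or add support for PyTorch operators in ONNX.
How to integrate custom ONNX operators for specialized runtimes.
How to implement a custom PyTorch operator and translate it to ONNX.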

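Prerequisites#

Before starting this tutorial, make sure you have the following:

torch >= 2.6
The target PyTorch operator
The ONNX Script tutorial, completed before you proceed
An implementation of the operator written in ONNX Script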

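Overriding the implementation of an existing PyTorch operator#

Although the ONNX exporter team does its best to support all PyTorch operators, some of them might not be supported yet. In this section, we demonstrate how to add an unsupported PyTorch operator to the ONNX Registry.

Note

The steps to implement an unsupported PyTorch operator are the same as those for replacing the implementation of an existing PyTorch operator with a custom one. Because no operator is actually unsupported in this tutorial, we take advantage of this and replace the implementation of `torch.ops.aten.add.Tensor` with a custom one, exactly as we would if the ONNX exporter did not implement that operator.

When a model cannot be exported to ONNX because of an unsupported operator, the ONNX exporter shows an error message similar to: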

No decompositions registered for [...]

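The error message names the unsupported PyTorch operator; in this tutorial we take it to be `torch.ops.aten.add.Tensor`. The operator is of type `torch._ops.OpOverload`, and we will use it as the target for registering our custom implementation.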

import torch
import onnxscript

# Opset 18 is the standard supported version as of PyTorch 2.6
from onnxscript import opset18 as op


# Create a model that uses the operator torch.ops.aten.add.Tensor
class Model(torch.nn.Module):
    def forward(self, input_x, input_y):
        return torch.ops.aten.add.Tensor(input_x, input_y)


# NOTE: The function signature (including parameter names) must match the signature of the unsupported PyTorch operator.
# https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml
# All attributes must be annotated with type hints.
def custom_aten_add(self, other, alpha: float = 1.0):
    if alpha != 1.0:
        alpha = op.CastLike(alpha, other)
        other = op.Mul(other, alpha)
    # To distinguish the custom implementation from the builtin one, we switch the order of the inputs
    return op.Add(other, self)


x = torch.tensor([1.0])
y = torch.tensor([2.0])

# Then we provide the custom implementation to the ONNX exporter as a ``custom_translation_table``.
onnx_program = torch.onnx.export(
    Model().eval(),
    (x, y),
    dynamo=True,
    custom_translation_table={
        torch.ops.aten.add.Tensor: custom_aten_add,
    },
)
# Optimize the ONNX graph to remove redundant nodes
onnx_program.optimize()

[torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `Model()` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅
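
The notes in the code above require the custom function's parameter names to match the operator's schema and every attribute to carry a type hint. The following is a minimal sketch of a sanity check for both rules in plain Python. It assumes the ATen schema `add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor`; the names `EXPECTED_PARAMS`, `custom_aten_add_stub`, `check_param_names`, and `missing_annotations` are hypothetical helpers for illustration and are not part of `torch.onnx` or ONNX Script.

```python
import inspect

# Parameter names from the ATen schema (assumed here for illustration):
#   add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
EXPECTED_PARAMS = ["self", "other", "alpha"]


def custom_aten_add_stub(self, other, alpha: float = 1.0):
    """Stand-in with the same signature as custom_aten_add; the body is a placeholder."""
    return self + alpha * other


def mismatched_add(x, y, alpha=1.0):
    """Counter-example: wrong parameter names and an unannotated attribute."""
    return x + alpha * y


def check_param_names(func, expected):
    """Return True when func's parameter names equal `expected`, in order."""
    return list(inspect.signature(func).parameters) == expected


def missing_annotations(func, attribute_names):
    """Return the attribute parameters of func that have no type hint."""
    params = inspect.signature(func).parameters
    return [
        name
        for name in attribute_names
        if name in params and params[name].annotation is inspect.Parameter.empty
    ]


print(check_param_names(custom_aten_add_stub, EXPECTED_PARAMS))
print(check_param_names(mismatched_add, EXPECTED_PARAMS))
print(missing_annotations(mismatched_add, ["alpha"]))
```

Running a check like this before calling `torch.onnx.export` can catch a renamed parameter early, although the exporter remains the authority on what it accepts.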

Now let's inspect the model and verify that it uses the custom implementation.

print(onnx_program.model)

<
    ir_version=10,
    opset_imports={'': 20},
    producer_name='pytorch',
    producer_version='2.9.0+cu128',
    domain=None,
    model_version=None,
>
graph(
    name=main_graph,
    inputs=(
        %"input_x"<FLOAT,[1]>,
        %"input_y"<FLOAT,[1]>
    ),
    outputs=(
        %"add"<FLOAT,[1]>
    ),
) {
    0 |  # node_add
         %"add"<FLOAT,[1]> ⬅️ ::Add(%"input_y", %"input_x")
    return %"add"<FLOAT,[1]>
}

The translation used our custom implementation: in the `Add` node (named `node_add` in the output above), `input_y` now comes first and `input_x` comes second.
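
One way to picture why the custom function wins is as a lookup table in which user entries shadow the defaults. The sketch below is a toy model only: string keys stand in for operator overloads such as `torch.ops.aten.add.Tensor`, plain Python functions stand in for ONNX Script implementations, and `DEFAULT_TABLE`, `default_add`, `swapped_add`, and `resolve` are hypothetical names. It does not describe the exporter's actual internals.

```python
# Toy model of a translation table; nothing here exists in torch.onnx.


def default_add(a, b):
    """Stand-in for a built-in translation: emit Add with inputs in order."""
    return ("Add", a, b)


def swapped_add(a, b):
    """Stand-in for the tutorial's override: swap the inputs so it is visible."""
    return ("Add", b, a)


DEFAULT_TABLE = {"aten.add.Tensor": default_add}


def resolve(op_name, custom_table=None):
    """Pick the custom implementation when one is given, else the default."""
    merged = {**DEFAULT_TABLE, **(custom_table or {})}
    if op_name not in merged:
        raise KeyError(f"no implementation available for {op_name}")
    return merged[op_name]


# Without an override, the inputs keep their original order.
print(resolve("aten.add.Tensor")("input_x", "input_y"))

# With an override, the swapped order shows which function was used.
custom = {"aten.add.Tensor": swapped_add}
print(resolve("aten.add.Tensor", custom)("input_x", "input_y"))
```

Swapping the inputs is harmless for addition, which is why the tutorial can use it as a visible marker without changing the numerical result.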

We can run the model with ONNX Runtime and verify the results by calling the `torch.onnx.ONNXProgram` directly on the input tensors.

result = onnx_program(x, y)[0]
torch.testing.assert_close(result, torch.tensor([3.0]))

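Using custom ONNX operators#

In this case, we create a model with standard PyTorch operators, but the runtime (such as Microsoft's ONNX Runtime) can provide a custom implementation for that kernel, effectively replacing the existing implementation.

In the following example, we use the `com.microsoft.Gelu` operator provided by ONNX Runtime, which is not the same as the `Gelu` in the ONNX specification.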
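
GELU has an exact erf-based form and a tanh approximation, which PyTorch selects through the `approximate` argument of `torch.ops.aten.gelu`. A replacement operator that implements only one of the two is faithful only for the matching setting. The plain-Python sketch below compares the two formulas on a few scalars. It assumes, based on my reading of the ONNX Runtime contrib operator documentation, that `com.microsoft.Gelu` computes the erf form; the function names `gelu_exact` and `gelu_tanh` are hypothetical.

```python
import math


def gelu_exact(x: float) -> float:
    """Erf-based GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (approximate="tanh" in PyTorch)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Print both variants and their absolute difference for a few inputs.
for value in (-1.0, 0.5, 1.0, 2.0):
    exact = gelu_exact(value)
    approx = gelu_tanh(value)
    print(f"x={value:+.2f} exact={exact:.6f} tanh={approx:.6f} diff={abs(exact - approx):.2e}")
```

The differences are small but nonzero, so a tight `assert_close` comparison is a reasonable way to notice if the wrong variant has been substituted.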

class GeluModel(torch.nn.Module):
    def forward(self, input_x):
        return torch.ops.aten.gelu(input_x)


# Create a namespace for the custom operator using ONNX Script
# ``com.microsoft`` is an official ONNX Runtime namespace
microsoft_op = onnxscript.values.Opset(domain="com.microsoft", version=1)

# NOTE: The function signature (including parameter names) must match the signature of the unsupported PyTorch operator.
# https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml
# NOTE: All attributes must be annotated with type hints.
# The function must be scripted using the ``@onnxscript.script()`` decorator when
# using operators from custom domains. This may be improved in future versions.
from onnxscript import FLOAT


@onnxscript.script(microsoft_op)
def custom_aten_gelu(self: FLOAT, approximate: str = "none") -> FLOAT:
    return microsoft_op.Gelu(self)


onnx_program = torch.onnx.export(
    GeluModel().eval(),
    (x,),
    dynamo=True,
    custom_translation_table={
        torch.ops.aten.gelu.default: custom_aten_gelu,
    },
)

# Optimize the ONNX graph to remove redundant nodes
onnx_program.optimize()

[torch.onnx] Obtain model graph for `GeluModel()` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `GeluModel()` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅

Let's inspect the model and verify that it uses the `Gelu` op_type from the `com.microsoft` namespace.

print(onnx_program.model)

<
    ir_version=10,
    opset_imports={'com.microsoft': 1, '': 20},
    producer_name='pytorch',
    producer_version='2.9.0+cu128',
    domain=None,
    model_version=None,
>
graph(
    name=main_graph,
    inputs=(
        %"input_x"<FLOAT,[1]>
    ),
    outputs=(
        %"gelu"<FLOAT,[1]>
    ),
) {
    0 |  # n0
         %"gelu"<FLOAT,[1]> ⬅️ com.microsoft::Gelu(%"input_x")
    return %"gelu"<FLOAT,[1]>
}

As in the previous example, we can use ONNX Runtime to run the model and verify the results.

result = onnx_program(x)[0]
torch.testing.assert_close(result, torch.ops.aten.gelu(x))

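Supporting custom PyTorch operators#

In this case, the operator is one that the user has implemented and registered with PyTorch.

In the following example, we want to use a custom operator that takes one tensor input and returns one output. The operator adds the input to itself and returns the rounded result.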
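
Both `torch.round` and the ONNX `Round` operator round halfway cases to the nearest even integer, which is also what Python's built-in `round` does. A plain-Python reference for a single scalar, with `add_and_round_reference` as a hypothetical name, makes the expected values easy to check by hand:

```python
def add_and_round_reference(value: float) -> int:
    """Add the input to itself, then round (halfway cases go to the even integer)."""
    return round(value + value)


# 0.25 and 1.25 double to exact halves (0.5 and 2.5), which round to the
# even neighbours 0 and 2 instead of always rounding up.
for value in (0.25, 0.75, 1.0, 1.25):
    print(value, "->", add_and_round_reference(value))
```

Inputs such as 0.25 or 1.25 are useful test cases because they exercise the tie-breaking rule, whereas an input like 1.0 never reaches it.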

First, we assume the custom operator is implemented and registered with `torch.library.custom_op()`. See Creating new custom ops in Python for a detailed guide on creating custom operators.

# Define and use the operator in PyTorch
@torch.library.custom_op("mylibrary::add_and_round_op", mutates_args=())
def add_and_round_op(input: torch.Tensor) -> torch.Tensor:
    return torch.round(input + input)


@add_and_round_op.register_fake
def _add_and_round_op_fake(tensor_x):
    return torch.empty_like(tensor_x)


class AddAndRoundModel(torch.nn.Module):
    def forward(self, input):
        return add_and_round_op(input)


# Implement the custom operator in ONNX using ONNX Script
def onnx_add_and_round(input):
    return op.Round(op.Add(input, input))


onnx_program = torch.onnx.export(
    AddAndRoundModel().eval(),
    (x,),
    dynamo=True,
    custom_translation_table={
        torch.ops.mylibrary.add_and_round_op.default: onnx_add_and_round,
    },
)

# Optimize the ONNX graph to remove redundant nodes
onnx_program.optimize()
print(onnx_program)

[torch.onnx] Obtain model graph for `AddAndRoundModel()` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `AddAndRoundModel()` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅
ONNXProgram(
    model=
        <
            ir_version=10,
            opset_imports={'': 20},
            producer_name='pytorch',
            producer_version='2.9.0+cu128',
            domain=None,
            model_version=None,
        >
        graph(
            name=main_graph,
            inputs=(
                %"input"<FLOAT,[1]>
            ),
            outputs=(
                %"add_and_round_op"<FLOAT,[1]>
            ),
        ) {
            0 |  # node_Add_0
                 %"val_0"<FLOAT,[1]> ⬅️ ::Add(%"input", %"input")
            1 |  # node_add_and_round_op
                 %"add_and_round_op"<FLOAT,[1]> ⬅️ ::Round(%"val_0")
            return %"add_and_round_op"<FLOAT,[1]>
        }


    ,
    exported_program=
        ExportedProgram:
            class GraphModule(torch.nn.Module):
                def forward(self, input: "f32[1]"):
                    input_1 = input

                     # File: /var/lib/workspace/beginner_source/onnx/onnx_registry_tutorial.py:215 in forward, code: return add_and_round_op(input)
                    add_and_round_op: "f32[1]" = torch.ops.mylibrary.add_and_round_op.default(input_1);  input_1 = None
                    return (add_and_round_op,)

        Graph signature:
            # inputs
            input: USER_INPUT

            # outputs
            add_and_round_op: USER_OUTPUT

        Range constraints: {}

)

The translation uses our custom implementation to translate the `torch.ops.mylibrary.add_and_round_op.default` operator in the `torch.export.ExportedProgram` into the ONNX operators `Add` and `Round`.

Finally, we verify the results.

result = onnx_program(x)[0]
torch.testing.assert_close(result, add_and_round_op(x))

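Conclusion#

Congratulations! In this tutorial, we explored the `custom_translation_table` option and learned how to create custom implementations for unsupported or existing PyTorch operators using ONNX Script.

Finally, we ran the models with ONNX Runtime and compared the results with PyTorch, covering the full workflow for handling unsupported operators in the ONNX ecosystem.

Further reading#

The list below refers to tutorials that range from basic examples to advanced scenarios, not necessarily in the order they are listed. Feel free to jump directly to the topics that interest you, or sit tight and work through all of them to learn everything about the ONNX exporter.

Exporting a PyTorch model to ONNX
Extending the ONNX exporter operator support
Export a model with control flow to ONNX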