
Accelerator Integration#

Created On: Sep 02, 2025 | Last Updated On: Sep 02, 2025

Since PyTorch 2.1, the community has made significant progress in simplifying the integration of new accelerators into the PyTorch ecosystem. These improvements include, but are not limited to: refinements to the PrivateUse1 dispatch key, the introduction and enhancement of extension mechanisms for core subsystems, and device-agnostic refactoring of key modules (such as torch.accelerator and memory management). Taken together, these advances provide a robust, flexible, and developer-friendly foundation for accelerator integration.
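
As a concrete illustration of the pieces mentioned above, here is a minimal sketch (not drawn from torch_openreg) that renames the reserved PrivateUse1 backend and then queries the device-agnostic torch.accelerator APIs. It assumes a recent PyTorch release in which torch.accelerator is available, and the backend name "openreg" is used purely as an example.

```python
import torch

# Give the reserved PrivateUse1 dispatch key a user-facing name.
# "openreg" mirrors the torch_openreg reference backend; substitute your own.
torch.utils.rename_privateuse1_backend("openreg")

# Device-agnostic queries via torch.accelerator: these work for whichever
# accelerator backend is currently active, not just CUDA.
if torch.accelerator.is_available():
    acc = torch.accelerator.current_accelerator()
    print(f"Active accelerator: {acc}, device count: {torch.accelerator.device_count()}")
```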

Why Does This Matter?#

This integration pathway offers several major benefits:

  • Speed: Extensibility is built into all core PyTorch modules. Developers can integrate new accelerators into their downstream codebases independently—without modifying upstream code and without being limited by community review bandwidth.

  • Future-proofing: This is the default integration path for all future PyTorch features, meaning that as new modules and features are added, they will automatically support scaling to new accelerators if this path is followed.

  • Autonomy: Vendors maintain full control over their accelerator integration timelines, enabling fast iteration cycles and reducing reliance on upstream coordination.

About This Document#

This guide aims to provide a comprehensive overview of the modern integration pathway for new accelerators in PyTorch. It walks through the full integration surface, from low-level device primitives to higher-level domain modules like compilation and quantization. The structure follows a modular and scenario-driven approach, where each topic is paired with corresponding code examples from torch_openreg, an official reference implementation.

The goal is to help developers:

  • Understand the full scope of accelerator integration;

  • Follow best practices to quickly launch new accelerators;

  • Avoid common pitfalls through clear, targeted examples.

Target Audience#

This document is intended for:

  • Accelerator Developers who are integrating accelerators into PyTorch;

  • Advanced PyTorch Users interested in the inner workings of key modules.

Quick Overview#

This document outlines the key processes and practical scenarios involved in integrating new devices into PyTorch, providing developers with a comprehensive and detailed guide for bringing up new backends. The discussion is structured around four major axes:

  • Runtime: Covers core components such as Event, Stream, Memory, Generator, Guard, Hooks, as well as the supporting C++ scaffolding.

  • Operators: Covers the minimum necessary operator set, including forward and backward operators, fallback operators, fallthroughs, STUBs, and so on, in both their C++ and Python implementations (a Python sketch follows this list).

  • Python Frontend: Focuses on Python bindings for modules and device-agnostic APIs.

  • High-level Modules: Explores integration with major subsystems such as AMP, Compiler, ONNX, and Distributed.
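
To make the Operators axis a bit more concrete, here is a minimal, hypothetical sketch of attaching a Python kernel to the PrivateUse1 dispatch key through torch.library. It is not code from torch_openreg (which registers most kernels in C++ via TORCH_LIBRARY_IMPL), and the kernel body is only a placeholder.

```python
import torch

# Extend the existing "aten" operator library; "IMPL" means we only add
# implementations for existing operators rather than defining new ones.
lib = torch.library.Library("aten", "IMPL")

def openreg_fill_(self, value):
    # Placeholder: a real backend would launch its own device kernel here.
    raise NotImplementedError("fill_ is not yet implemented for this backend")

# Route aten::fill_.Scalar to the kernel above whenever its inputs live on a
# device behind the PrivateUse1 dispatch key.
lib.impl("fill_.Scalar", openreg_fill_, "PrivateUse1")
```

For operators that have no dedicated kernel yet, a backend typically also registers a boxed fallback (in C++, via TORCH_LIBRARY_IMPL with m.fallback) so that unsupported ops either fail gracefully or fall back to CPU.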

Next, let's embark on the integration journey for a new PyTorch accelerator.

Note

This guide is a work in progress. For more details, please refer to the roadmap.