Compile Time Caching in torch.compile

Created On: Jun 20, 2024 | Last Updated: Jun 24, 2025 | Last Verified: Nov 05, 2024

Author: Oguz Ulgen

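Introduction

PyTorch Compiler provides several caching offerings to reduce compilation latency. This recipe explains these offerings in detail to help users pick the best option for their use case.

For how to configure these caches, see Compile Time Caching Configuration.

Also check out our caching benchmark at PT CacheBench Benchmarks.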

Prerequisites

Before starting this recipe, make sure that you have the following:

- Basic understanding of torch.compile. See
- PyTorch 2.4 or later

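Caching Offerings

torch.compile provides the following caching offerings:

- End-to-end caching (also known as Mega-Cache)
- Modular caching of TorchDynamo, TorchInductor, and Triton

Note that caching validates that the cache artifacts are used with the same PyTorch and Triton versions, and with the same GPU when the device is set to cuda.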
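To make the validation rule above concrete, here is a minimal sketch of an environment fingerprint built from the same three inputs. This is not how PyTorch derives its internal cache keys; the function name `environment_key`, the version strings, and the GPU names are all hypothetical and chosen only for illustration.

```python
import hashlib
import json


def environment_key(torch_version, triton_version, gpu_name=None):
    """Return a stable hex digest identifying one compile environment."""
    payload = json.dumps(
        {"torch": torch_version, "triton": triton_version, "gpu": gpu_name},
        sort_keys=True,  # fixed key order keeps the digest reproducible
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical version strings and GPU names, for illustration only.
key_a = environment_key("2.4.0", "3.0.0", "NVIDIA A100")
key_b = environment_key("2.4.0", "3.0.0", "NVIDIA H100")
print(key_a == key_b)  # False: a different GPU yields a different key
```

In a real script the inputs could come from `torch.__version__`, `triton.__version__`, and `torch.cuda.get_device_name()`. A key like this is one possible way to label stored artifacts so that a machine only fetches artifacts built for a matching environment.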

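`torch.compile` end-to-end caching (`Mega-Cache`)

End-to-end caching (from here onwards, Mega-Cache) is the ideal solution for users who want a portable cache that can be stored in a database and later fetched, possibly on a separate machine.

Mega-Cache provides two compiler APIs:

- torch.compiler.save_cache_artifacts()
- torch.compiler.load_cache_artifacts()

The intended use case is that after compiling and executing a model, the user calls torch.compiler.save_cache_artifacts(), which returns the compiler artifacts in a portable form. Later, potentially on a different machine, the user can call torch.compiler.load_cache_artifacts() with these artifacts to pre-populate the torch.compile caches and jump-start compilation.

Consider the following example. First, compile and save the cache artifacts.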

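import torch

# Assumption: the original snippet leaves `device` and `dtype` undefined;
# the values below are illustrative defaults.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float32

@torch.compile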
def fn(x, y):
    return x.sin() @ y

a = torch.rand(100, 100, dtype=dtype, device=device)
b = torch.rand(100, 100, dtype=dtype, device=device)

result = fn(a, b)

artifacts = torch.compiler.save_cache_artifacts()

assert artifacts is not None
artifact_bytes, cache_info = artifacts

# Now, potentially store artifact_bytes in a database
# You can use cache_info for logging

Later, you can jump-start the cache as follows:

# Potentially download/fetch the artifacts from the database
torch.compiler.load_cache_artifacts(artifact_bytes)

This operation populates all the modular caches discussed in the next section, including PGO, AOTAutograd, Inductor, Triton, and Autotuning.
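The "store in a database" and "fetch from the database" steps are left as comments in the two snippets above. One way to fill them in is sketched below with SQLite from the Python standard library. The table name, the helper names (`open_store`, `put_artifacts`, `get_artifacts`), and the key `"my-model-v1"` are hypothetical, and a placeholder byte string stands in for the real `artifact_bytes` so that the sketch runs without PyTorch installed.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small key/value store for artifact blobs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS compile_artifacts ("
        "key TEXT PRIMARY KEY, payload BLOB NOT NULL)"
    )
    return conn


def put_artifacts(conn, key, artifact_bytes):
    """Insert or overwrite the artifact blob stored under `key`."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO compile_artifacts (key, payload) "
            "VALUES (?, ?)",
            (key, artifact_bytes),
        )


def get_artifacts(conn, key):
    """Return the stored bytes for `key`, or None if nothing is stored."""
    row = conn.execute(
        "SELECT payload FROM compile_artifacts WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else bytes(row[0])


conn = open_store()
# Placeholder standing in for the bytes returned by save_cache_artifacts().
fake_artifact_bytes = b"\x00placeholder artifact payload\xff"
put_artifacts(conn, "my-model-v1", fake_artifact_bytes)
restored = get_artifacts(conn, "my-model-v1")
print(restored == fake_artifact_bytes)  # True
```

On the consuming machine, the fetched bytes would be passed to torch.compiler.load_cache_artifacts() before the first call to the compiled function. A `None` result can simply be treated as a cold start.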

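Modular caching of `TorchDynamo`, `TorchInductor`, and `Triton`

The Mega-Cache described above is composed of individual components that can be used without any user intervention. By default, PyTorch Compiler comes with local on-disk caches for TorchDynamo, TorchInductor, and Triton. These caches include:

- FXGraphCache: A cache of the graph-based IR components used in compilation.
- TritonCache: A cache of Triton compilation results, including the cubin files generated by Triton and other caching artifacts.
- InductorCache: A bundle of FXGraphCache and the Triton cache.
- AOTAutogradCache: A cache of joint graph artifacts.
- PGO-cache: A cache of dynamic shape decisions that reduces the number of recompilations.
- AutotuningCache:
  - Inductor generates Triton kernels and benchmarks them to select the fastest one.
  - torch.compile's built-in AutotuningCache caches these results.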

All of these cache artifacts are written to TORCHINDUCTOR_CACHE_DIR, which by default looks like /tmp/torchinductor_myusername.
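As a rough illustration of the path pattern above, the sketch below guesses where the caches would land for a given environment. The helper `guess_cache_dir` is hypothetical and only mirrors the pattern described in this recipe (the environment variable if set, otherwise a per-user directory under the system temp directory); the exact logic inside PyTorch may differ, for example in how user names are sanitized.

```python
import getpass
import os
import tempfile


def guess_cache_dir(env=None, user=None):
    """Guess the Inductor cache directory from the documented pattern."""
    env = os.environ if env is None else env
    override = env.get("TORCHINDUCTOR_CACHE_DIR")
    if override:
        return override  # an explicit setting wins
    if user is None:
        user = getpass.getuser()
    return os.path.join(tempfile.gettempdir(), "torchinductor_" + user)


# Default pattern for a user called "myusername".
print(guess_cache_dir(env={}, user="myusername"))
# Explicit override; the path is a made-up example.
print(guess_cache_dir(env={"TORCHINDUCTOR_CACHE_DIR": "/data/compile_cache"}))
```

To relocate the caches, for instance onto a persistent volume, TORCHINDUCTOR_CACHE_DIR would typically be set in the environment before the process that runs torch.compile starts.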

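Remote Caching

We also provide a remote caching option for users who would like to take advantage of a Redis-based cache. See Compile Time Caching Configuration for more on how to enable Redis-based caching.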
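A minimal sketch of what enabling the Redis-based cache from Python might look like follows. The environment variable names are not given in this recipe; they are an assumption based on the Compile Time Caching Configuration recipe and should be verified there before use. The helper `redis_cache_env` and the host name are hypothetical.

```python
import os


def redis_cache_env(host="localhost", port=6379):
    """Build the environment settings assumed to enable Redis-backed caches."""
    return {
        # Assumed variable names; confirm against the configuration recipe.
        "TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE": "1",
        "TORCHINDUCTOR_AUTOTUNE_REMOTE_CACHE": "1",
        "TORCHINDUCTOR_REDIS_HOST": host,
        "TORCHINDUCTOR_REDIS_PORT": str(port),
    }


settings = redis_cache_env(host="redis.internal.example", port=6379)
os.environ.update(settings)  # apply before importing torch in this process
for name, value in sorted(settings.items()):
    print(f"{name}={value}")
```

Setting the variables in the launching shell or job configuration is an equally valid choice; doing it in Python just keeps the settings next to the code that depends on them.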

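Conclusion

In this recipe, we have learned that PyTorch Inductor's caching mechanisms significantly reduce compilation latency by using both local and remote caches, which operate in the background without requiring user intervention.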
