Note
Go to the end to download the full example code.
(beta) Running the Compiled Optimizer with an LR Scheduler#
Created On: May 21, 2024 | Last Updated: May 21, 2024 | Last Verified: Nov 05, 2024
Author: Michael Lazos
The optimizer is a key algorithm for training any deep learning model. In this example, we will show how to pair the optimizer, compiled with torch.compile, with an LR scheduler to accelerate training convergence.
Note
This tutorial requires PyTorch 2.3.0 or later.
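If you want to verify the requirement at runtime, here is a minimal sketch (not part of the original example) that compares only the major/minor components of ``torch.__version__``:
import torch

# Rough runtime check for the version requirement above.
# ``torch.__version__`` looks like "2.3.0" or "2.5.1+cu121"; we only
# look at the major and minor components here.
major, minor = (int(p) for p in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (2, 3), "This tutorial requires PyTorch 2.3.0 or later."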
Model Setup#
For this example, we'll use a simple sequence of linear layers.
import torch
# Create simple model
model = torch.nn.Sequential(
*[torch.nn.Linear(1024, 1024, False, device="cuda") for _ in range(10)]
)
input = torch.rand(1024, device="cuda")
# run forward pass
output = model(input)
# run backward to populate the grads for our optimizer below
output.sum().backward()
Setting up and running the compiled optimizer with LR Scheduler#
In this section, we'll use the Adam optimizer with the LinearLR scheduler and create a helper function to wrap the step() call for each of them in torch.compile().
Note
torch.compile is only supported on CUDA devices that have a compute capability of 7.0 or higher.
# exit cleanly if we are on a device that doesn't support ``torch.compile``
if torch.cuda.get_device_capability() < (7, 0):
print("Exiting because torch.compile is not supported on this device.")
import sys
sys.exit(0)
# !!! IMPORTANT !!! Wrap the lr in a Tensor if we are pairing the
# optimizer with an LR Scheduler.
# Without this, torch.compile will recompile as the value of the LR
# changes.
opt = torch.optim.Adam(model.parameters(), lr=torch.tensor(0.01))
sched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)
@torch.compile(fullgraph=False)
def fn():
opt.step()
sched.step()
# Warmup runs to compile the function
for _ in range(5):
fn()
print(opt.param_groups[0]["lr"])
tensor(0.0047)
tensor(0.0060)
tensor(0.0073)
tensor(0.0087)
tensor(0.0100)
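In a real training loop you would call the compiled fn after each backward pass rather than only in a warmup loop. Below is a minimal sketch that reuses the model, input, opt, sched, and fn defined above; the loss here is just the sum of the outputs, purely for illustration.
# Hypothetical training loop built from the objects defined above
for step in range(5):
    opt.zero_grad()
    loss = model(input).sum()
    loss.backward()
    fn()  # compiled opt.step() followed by sched.step()
    print(f"step {step}: loss={loss.item():.4f}, lr={opt.param_groups[0]['lr']}")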
Extension: What happens with a non-tensor LR?#
For the curious, we will show how to peek at what happens with torch.compile when we don't wrap the LR in a tensor.
# No longer wrap the LR in a tensor here
opt = torch.optim.Adam(model.parameters(), lr=0.01)
sched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)
@torch.compile(fullgraph=False)
def fn():
opt.step()
sched.step()
# Setup logging to view recompiles
torch._logging.set_logs(recompiles=True)
# Warmup runs to compile the function
# We will now recompile on each iteration
# as the value of the lr is mutated.
for _ in range(5):
fn()
V1015 19:17:55.642000 23021 torch/_dynamo/guards.py:4181] [1/1] [__recompiles] Recompiling function wrapper in /usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py:497
V1015 19:17:55.642000 23021 torch/_dynamo/guards.py:4181] [1/1] [__recompiles] triggered by the following guard failure(s):
V1015 19:17:55.642000 23021 torch/_dynamo/guards.py:4181] [1/1] [__recompiles] - 1/0: Cache line invalidated because L['args'][0] got deallocated
V1015 19:17:55.682000 23021 torch/_dynamo/guards.py:4181] [2/1] [__recompiles] Recompiling function step in /usr/local/lib/python3.10/dist-packages/torch/optim/adam.py:213
V1015 19:17:55.682000 23021 torch/_dynamo/guards.py:4181] [2/1] [__recompiles] triggered by the following guard failure(s):
V1015 19:17:55.682000 23021 torch/_dynamo/guards.py:4181] [2/1] [__recompiles] - 2/0: Cache line invalidated because L['self'] got deallocated
V1015 19:18:01.219000 23021 torch/_dynamo/guards.py:4181] [2/2] [__recompiles] Recompiling function step in /usr/local/lib/python3.10/dist-packages/torch/optim/adam.py:213
V1015 19:18:01.219000 23021 torch/_dynamo/guards.py:4181] [2/2] [__recompiles] triggered by the following guard failure(s):
V1015 19:18:01.219000 23021 torch/_dynamo/guards.py:4181] [2/2] [__recompiles] - 2/1: ___as_tensor(self.param_groups[0]['lr']).item() == 0.003333333333333333 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:01.219000 23021 torch/_dynamo/guards.py:4181] [2/2] [__recompiles] - 2/0: Cache line invalidated because L['self'] got deallocated
V1015 19:18:04.679000 23021 torch/_dynamo/guards.py:4181] [2/3] [__recompiles] Recompiling function step in /usr/local/lib/python3.10/dist-packages/torch/optim/adam.py:213
V1015 19:18:04.679000 23021 torch/_dynamo/guards.py:4181] [2/3] [__recompiles] triggered by the following guard failure(s):
V1015 19:18:04.679000 23021 torch/_dynamo/guards.py:4181] [2/3] [__recompiles] - 2/2: ___as_tensor(self.param_groups[0]['lr']).item() == 0.004666666666666667 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:04.679000 23021 torch/_dynamo/guards.py:4181] [2/3] [__recompiles] - 2/1: ___as_tensor(self.param_groups[0]['lr']).item() == 0.003333333333333333 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:04.679000 23021 torch/_dynamo/guards.py:4181] [2/3] [__recompiles] - 2/0: Cache line invalidated because L['self'] got deallocated
V1015 19:18:07.903000 23021 torch/_dynamo/guards.py:4181] [2/4] [__recompiles] Recompiling function step in /usr/local/lib/python3.10/dist-packages/torch/optim/adam.py:213
V1015 19:18:07.903000 23021 torch/_dynamo/guards.py:4181] [2/4] [__recompiles] triggered by the following guard failure(s):
V1015 19:18:07.903000 23021 torch/_dynamo/guards.py:4181] [2/4] [__recompiles] - 2/3: ___as_tensor(self.param_groups[0]['lr']).item() == 0.006000000000000001 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:07.903000 23021 torch/_dynamo/guards.py:4181] [2/4] [__recompiles] - 2/2: ___as_tensor(self.param_groups[0]['lr']).item() == 0.004666666666666667 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:07.903000 23021 torch/_dynamo/guards.py:4181] [2/4] [__recompiles] - 2/1: ___as_tensor(self.param_groups[0]['lr']).item() == 0.003333333333333333 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:07.903000 23021 torch/_dynamo/guards.py:4181] [2/4] [__recompiles] - 2/0: Cache line invalidated because L['self'] got deallocated
V1015 19:18:11.129000 23021 torch/_dynamo/guards.py:4181] [2/5] [__recompiles] Recompiling function step in /usr/local/lib/python3.10/dist-packages/torch/optim/adam.py:213
V1015 19:18:11.129000 23021 torch/_dynamo/guards.py:4181] [2/5] [__recompiles] triggered by the following guard failure(s):
V1015 19:18:11.129000 23021 torch/_dynamo/guards.py:4181] [2/5] [__recompiles] - 2/4: ___as_tensor(self.param_groups[0]['lr']).item() == 0.007333333333333335 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:11.129000 23021 torch/_dynamo/guards.py:4181] [2/5] [__recompiles] - 2/3: ___as_tensor(self.param_groups[0]['lr']).item() == 0.006000000000000001 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:11.129000 23021 torch/_dynamo/guards.py:4181] [2/5] [__recompiles] - 2/2: ___as_tensor(self.param_groups[0]['lr']).item() == 0.004666666666666667 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:11.129000 23021 torch/_dynamo/guards.py:4181] [2/5] [__recompiles] - 2/1: ___as_tensor(self.param_groups[0]['lr']).item() == 0.003333333333333333 # (unknown source ___as_tensor(self.param_groups[0]['lr']).item(), please file a bug)
V1015 19:18:11.129000 23021 torch/_dynamo/guards.py:4181] [2/5] [__recompiles] - 2/0: Cache line invalidated because L['self'] got deallocated
With this example, we can see that we recompile the optimizer several times due to the guard failure on the lr in param_groups[0].
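To return to the recompile-free behavior, wrap the LR in a tensor again as in the earlier section, so the scheduler updates the tensor in place instead of replacing a guarded Python float. A minimal sketch that also turns the recompile logging back off:
# Turn the recompile logging back off
torch._logging.set_logs(recompiles=False)

# Re-create the optimizer with a tensor LR so the scheduler updates it in place
opt = torch.optim.Adam(model.parameters(), lr=torch.tensor(0.01))
sched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)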
Conclusion#
In this tutorial we showed how to pair the optimizer compiled with torch.compile with an LR Scheduler to accelerate training convergence. We used a model consisting of a simple sequence of linear layers with the Adam optimizer paired with a LinearLR scheduler to demonstrate the LR changing across iterations.
See also:
Compiled optimizer tutorial - an intro into the compiled optimizer.
Compiling the optimizer with PT2 - deeper technical details on the compiled optimizer.
Total running time of the script: (0 minutes 23.734 seconds)