Modules

Created On: Feb 04, 2021 | Last Updated On: Nov 08, 2024

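PyTorch uses modules to represent neural networks. Modules are:

- Building blocks of stateful computation. PyTorch provides a robust library of modules and makes it simple to define new custom modules, allowing for easy construction of elaborate, multi-layer neural networks.
- Tightly integrated with PyTorch's autograd system. Modules make it simple to specify learnable parameters for PyTorch's optimizers to update.
- Easy to work with and transform. Modules are straightforward to save and restore, transfer between CPU / GPU / TPU devices, prune, quantize, and more.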

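This note describes modules and is intended for all PyTorch users. Since modules are so fundamental to PyTorch, many topics in this note are elaborated on in other notes or tutorials, and links to many of those documents are provided here as well.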

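A Simple Custom Module
Modules as Building Blocks
Neural Network Training with Modules
Module State
Module Initialization
Module Hooks
Advanced Features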

A Simple Custom Module

To get started, let's look at a simpler, custom version of PyTorch's Linear module. This module applies an affine transformation to its input.

import torch
from torch import nn

class MyLinear(nn.Module):
  def __init__(self, in_features, out_features):
    super().__init__()
    self.weight = nn.Parameter(torch.randn(in_features, out_features))
    self.bias = nn.Parameter(torch.randn(out_features))

  def forward(self, input):
    return (input @ self.weight) + self.bias

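This simple module has the following fundamental characteristics of modules:

It inherits from the base Module class. All modules should subclass Module for composability with other modules.
It defines some "state" that is used in computation. Here, the state consists of randomly initialized weight and bias tensors that define the affine transformation. Because each of these is defined as a Parameter, they are registered for the module and will automatically be tracked and returned from calls to parameters(). Parameters can be considered the "learnable" aspects of the module's computation (more on this later). Note that modules are not required to have state and can also be stateless.
It defines a forward() function that performs the computation. For this affine transformation module, the input is matrix-multiplied with the weight parameter (using the @ short-hand notation) and added to the bias parameter to produce the output. More generally, a module's forward() implementation can perform arbitrary computation involving any number of inputs and outputs.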
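To see why wrapping tensors in Parameter matters, here is a minimal sketch using a hypothetical ScaledLinear module (the class and its scale attribute are illustrative only). A tensor stored as a plain attribute is not registered: it does not appear in named_parameters() or in the state_dict, so an optimizer built from parameters() never updates it, and to() does not move or convert it.

```python
import torch
from torch import nn

class ScaledLinear(nn.Module):
  # Hypothetical module used only to contrast registered and unregistered tensors.
  def __init__(self, in_features, out_features):
    super().__init__()
    # Wrapped in nn.Parameter: registered and returned by parameters().
    self.weight = nn.Parameter(torch.randn(in_features, out_features))
    # Plain tensor attribute: NOT registered with the module.
    self.scale = torch.tensor(2.0)

  def forward(self, input):
    return (input @ self.weight) * self.scale

m = ScaledLinear(4, 3)
names = [name for name, _ in m.named_parameters()]
print(names)                    # ['weight']
print('scale' in m.state_dict())  # False
```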

This simple module demonstrates how modules package state and computation together. Instances of this module can be constructed and called:

m = MyLinear(4, 3)
sample_input = torch.randn(4)
m(sample_input)
: tensor([-0.3037, -1.0413, -4.2057], grad_fn=<AddBackward0>)

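Note that the module itself is callable, and that calling it invokes its forward() function. This name refers to the concepts of a "forward pass" and a "backward pass", which apply to each module. The "forward pass" is responsible for applying the computation represented by the module to the given input(s) (as shown in the above snippet). The "backward pass" computes gradients of the module's outputs with respect to its inputs and parameters, which can be used for "training" the parameters through gradient descent methods. PyTorch's autograd system automatically takes care of this backward pass computation, so it is not required to manually implement a backward() function for each module. The process of training module parameters through successive forward / backward passes is covered in detail in Neural Network Training with Modules.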
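As a small illustration of the forward/backward split, the sketch below calls a MyLinear instance (redefined so the snippet is self-contained), reduces the output to a scalar, and calls backward(). Autograd fills in .grad for every registered parameter even though MyLinear defines no backward() method. For out = input @ weight + bias summed over its outputs, the gradient of bias is all ones and each column of the weight gradient equals the input.

```python
import torch
from torch import nn

class MyLinear(nn.Module):
  def __init__(self, in_features, out_features):
    super().__init__()
    self.weight = nn.Parameter(torch.randn(in_features, out_features))
    self.bias = nn.Parameter(torch.randn(out_features))

  def forward(self, input):
    return (input @ self.weight) + self.bias

torch.manual_seed(0)
m = MyLinear(4, 3)
x = torch.randn(4)

# Forward pass: calling the module invokes forward().
out = m(x)
# Reduce to a scalar so backward() needs no explicit gradient argument.
out.sum().backward()

# Autograd populated .grad for each parameter; no backward() was written by hand.
print(m.weight.grad.shape)  # torch.Size([4, 3])
print(m.bias.grad)          # tensor([1., 1., 1.])
```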

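The full set of parameters registered by the module can be iterated through via a call to parameters() or named_parameters(), where the latter includes each parameter's name: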

for parameter in m.named_parameters():
  print(parameter)
: ('weight', Parameter containing:
tensor([[ 1.0597,  1.1796,  0.8247],
        [-0.5080, -1.2635, -1.1045],
        [ 0.0593,  0.2469, -1.4299],
        [-0.4926, -0.5457,  0.4793]], requires_grad=True))
('bias', Parameter containing:
tensor([ 0.3634,  0.2015, -0.8525], requires_grad=True))

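In general, the parameters registered by a module are aspects of the module's computation that should be "learned". A later section of this note shows how to update these parameters using one of PyTorch's optimizers. Before we get to that, however, let's first examine how modules can be composed with one another.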

Modules as Building Blocks

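Modules can contain other modules, making them useful building blocks for developing more elaborate functionality. The simplest way to do this is with the Sequential module, which allows us to chain together multiple modules: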

net = nn.Sequential(
  MyLinear(4, 3),
  nn.ReLU(),
  MyLinear(3, 1)
)

sample_input = torch.randn(4)
net(sample_input)
: tensor([-0.6749], grad_fn=<AddBackward0>)

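Note that Sequential automatically feeds the output of the first MyLinear module as input into the ReLU, and the output of that as input into the second MyLinear module. As shown, it is limited to in-order chaining of modules with a single input and output.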
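The chaining behaviour can be checked directly. In this sketch nn.Linear stands in for MyLinear so the snippet is self-contained; Sequential supports integer indexing, and feeding each layer's output into the next by hand gives exactly the same result as calling the container.

```python
import torch
from torch import nn

torch.manual_seed(0)
# nn.Linear stands in for MyLinear so the snippet is self-contained.
net = nn.Sequential(
  nn.Linear(4, 3),
  nn.ReLU(),
  nn.Linear(3, 1)
)

x = torch.randn(4)
# Sequential supports indexing; chaining the layers by hand reproduces net(x).
manual = net[2](net[1](net[0](x)))
print(torch.equal(net(x), manual))  # True
```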

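In general, it is recommended to define a custom module for anything beyond the simplest use cases, as this gives full flexibility in how submodules are used in the module's computation.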

For example, here is a simple neural network implemented as a custom module:

import torch.nn.functional as F

class Net(nn.Module):
  def __init__(self):
    super().__init__()
    self.l0 = MyLinear(4, 3)
    self.l1 = MyLinear(3, 1)
  def forward(self, x):
    x = self.l0(x)
    x = F.relu(x)
    x = self.l1(x)
    return x

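This module is composed of two "children" or "submodules" (l0 and l1) that define the layers of the neural network and are used for computation within the module's forward() method. The immediate children of a module can be iterated through via a call to children() or named_children():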

net = Net()
for child in net.named_children():
  print(child)
: ('l0', MyLinear())
('l1', MyLinear())

To go deeper than just the immediate children, modules() and named_modules() recursively iterate through a module and its child modules:

class BigNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.l1 = MyLinear(5, 4)
    self.net = Net()
  def forward(self, x):
    return self.net(self.l1(x))

big_net = BigNet()
for module in big_net.named_modules():
  print(module)
: ('', BigNet(
  (l1): MyLinear()
  (net): Net(
    (l0): MyLinear()
    (l1): MyLinear()
  )
))
('l1', MyLinear())
('net', Net(
  (l0): MyLinear()
  (l1): MyLinear()
))
('net.l0', MyLinear())
('net.l1', MyLinear())
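The difference between the two kinds of iteration is easiest to see side by side. This sketch rebuilds Net and BigNet with nn.Linear in place of MyLinear (an assumption made only to keep the snippet self-contained) and also uses get_submodule(), which looks up a descendant by the same dotted names that named_modules() reports.

```python
import torch
from torch import nn

class Net(nn.Module):
  def __init__(self):
    super().__init__()
    self.l0 = nn.Linear(4, 3)
    self.l1 = nn.Linear(3, 1)
  def forward(self, x):
    return self.l1(torch.relu(self.l0(x)))

class BigNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.l1 = nn.Linear(5, 4)
    self.net = Net()
  def forward(self, x):
    return self.net(self.l1(x))

big_net = BigNet()
# named_children(): direct submodules only.
child_names = [name for name, _ in big_net.named_children()]
# named_modules(): the module itself ('') plus every descendant.
module_names = [name for name, _ in big_net.named_modules()]
print(child_names)   # ['l1', 'net']
print(module_names)  # ['', 'l1', 'net', 'net.l0', 'net.l1']

# A descendant can be fetched by its dotted name.
print(big_net.get_submodule('net.l0'))
```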

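Sometimes, a module needs to define its submodules dynamically. The ModuleList and ModuleDict modules are useful here; they register submodules from a list or dict: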

class DynamicNet(nn.Module):
  def __init__(self, num_layers):
    super().__init__()
    self.linears = nn.ModuleList(
      [MyLinear(4, 4) for _ in range(num_layers)])
    self.activations = nn.ModuleDict({
      'relu': nn.ReLU(),
      'lrelu': nn.LeakyReLU()
    })
    self.final = MyLinear(4, 1)
  def forward(self, x, act):
    for linear in self.linears:
      x = linear(x)
      x = self.activations[act](x)
    x = self.final(x)
    return x

dynamic_net = DynamicNet(3)
sample_input = torch.randn(4)
output = dynamic_net(sample_input, 'relu')
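The reason to reach for ModuleList instead of an ordinary Python list is registration. In the sketch below (both classes are hypothetical and use nn.Linear for brevity), layers held in a plain list are invisible to nn.Module, so parameters() returns nothing for them, while the same layers in a ModuleList are registered as submodules.

```python
import torch
from torch import nn

class PlainListNet(nn.Module):
  def __init__(self, num_layers):
    super().__init__()
    # A plain Python list hides the layers from nn.Module's bookkeeping.
    self.linears = [nn.Linear(4, 4) for _ in range(num_layers)]

class ModuleListNet(nn.Module):
  def __init__(self, num_layers):
    super().__init__()
    # nn.ModuleList registers each layer as a submodule.
    self.linears = nn.ModuleList([nn.Linear(4, 4) for _ in range(num_layers)])

plain = PlainListNet(3)
registered = ModuleListNet(3)
print(len(list(plain.parameters())))       # 0
print(len(list(registered.parameters())))  # 6 (a weight and a bias per layer)
```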

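For any given module, its parameters consist of its direct parameters as well as the parameters of all submodules. This means that calls to parameters() and named_parameters() recursively include child parameters, allowing for convenient optimization of all parameters within the network: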

for parameter in dynamic_net.named_parameters():
  print(parameter)
: ('linears.0.weight', Parameter containing:
tensor([[-1.2051,  0.7601,  1.1065,  0.1963],
        [ 3.0592,  0.4354,  1.6598,  0.9828],
        [-0.4446,  0.4628,  0.8774,  1.6848],
        [-0.1222,  1.5458,  1.1729,  1.4647]], requires_grad=True))
('linears.0.bias', Parameter containing:
tensor([ 1.5310,  1.0609, -2.0940,  1.1266], requires_grad=True))
('linears.1.weight', Parameter containing:
tensor([[ 2.1113, -0.0623, -1.0806,  0.3508],
        [-0.0550,  1.5317,  1.1064, -0.5562],
        [-0.4028, -0.6942,  1.5793, -1.0140],
        [-0.0329,  0.1160, -1.7183, -1.0434]], requires_grad=True))
('linears.1.bias', Parameter containing:
tensor([ 0.0361, -0.9768, -0.3889,  1.1613], requires_grad=True))
('linears.2.weight', Parameter containing:
tensor([[-2.6340, -0.3887, -0.9979,  0.0767],
        [-0.3526,  0.8756, -1.5847, -0.6016],
        [-0.3269, -0.1608,  0.2897, -2.0829],
        [ 2.6338,  0.9239,  0.6943, -1.5034]], requires_grad=True))
('linears.2.bias', Parameter containing:
tensor([ 1.0268,  0.4489, -0.9403,  0.1571], requires_grad=True))
('final.weight', Parameter containing:
tensor([[ 0.2509], [-0.5052], [ 0.3088], [-1.4951]], requires_grad=True))
('final.bias', Parameter containing:
tensor([0.3381], requires_grad=True))
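Because parameters() recurses, a single expression is enough to count every learnable value in a network. This sketch builds a stand-in with the same layer shapes as DynamicNet(3) (three 4x4 linear layers plus a 4-to-1 output layer) from nn.Linear containers, which is an assumption for self-containment; the total is 3 * (16 + 4) + (4 + 1) = 65.

```python
import torch
from torch import nn

# Stand-in for DynamicNet(3): three 4x4 layers and a 4 -> 1 output layer.
net = nn.ModuleDict({
  'linears': nn.ModuleList([nn.Linear(4, 4) for _ in range(3)]),
  'final': nn.Linear(4, 1),
})

# parameters() recurses into every submodule, so one sum covers the whole network.
total = sum(p.numel() for p in net.parameters())
print(total)  # 65

# Names mirror the nesting of submodules.
print([name for name, _ in net.named_parameters()][:2])  # ['linears.0.weight', 'linears.0.bias']
```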

It is also easy to move all parameters to a different device or change their precision using to():

# Move all parameters to a CUDA device
dynamic_net.to(device='cuda')

# Change precision of all parameters
dynamic_net.to(dtype=torch.float64)

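# Inputs must match the parameters' device / dtype; DynamicNet expects 4 features and an activation name.
dynamic_net(torch.randn(4, device='cuda', dtype=torch.float64), 'relu')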
: tensor([6.5166], device='cuda:0', dtype=torch.float64, grad_fn=<AddBackward0>)
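The same calls work without a GPU. This sketch falls back to the CPU when CUDA is unavailable and uses a small nn.Sequential stand-in. For modules, to() converts the parameters in place and returns the module itself, and any input passed afterwards has to use the same device and dtype.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))

# Use a GPU when one is present; otherwise stay on the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# For modules, to() modifies the module in place and returns it.
returned = net.to(device=device, dtype=torch.float64)
print(returned is net)                      # True
print({p.dtype for p in net.parameters()})  # {torch.float64}

# Inputs must match the module's device and dtype.
out = net(torch.randn(4, device=device, dtype=torch.float64))
print(out.dtype)  # torch.float64
```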

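More generally, an arbitrary function can be applied to a module and its submodules recursively by using the apply() function. For example, to apply custom initialization to the parameters of a module and its submodules: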

# Define a function to initialize the weights of linear layers.
# Note that no_grad() is used here to avoid tracking this computation in the autograd graph.
@torch.no_grad()
def init_weights(m):
  # dynamic_net is built from MyLinear, so match it as well as nn.Linear.
  if isinstance(m, (nn.Linear, MyLinear)):
    nn.init.xavier_normal_(m.weight)
    m.bias.fill_(0.0)

# Apply the function recursively on the module and its submodules.
dynamic_net.apply(init_weights)
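It can help to know the order in which apply() visits modules: it recurses into each child first and calls the function on the parent last. The sketch below records that order on a small nn.Sequential stand-in while performing the same Xavier-normal / zero-bias initialization.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
visited = []

@torch.no_grad()
def init_weights(m):
  # Record each module type in the order apply() visits it.
  visited.append(type(m).__name__)
  if isinstance(m, nn.Linear):
    nn.init.xavier_normal_(m.weight)
    m.bias.fill_(0.0)

net.apply(init_weights)
# Children are visited before their parent.
print(visited)  # ['Linear', 'ReLU', 'Linear', 'Sequential']
print(all(torch.all(layer.bias == 0) for layer in (net[0], net[2])))  # True
```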

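These examples show how elaborate neural networks can be formed through module composition and conveniently manipulated. To allow for quick and easy construction of neural networks with minimal boilerplate, PyTorch provides a large library of performant modules within the torch.nn namespace that perform common neural network operations like pooling, convolutions, loss functions, etc.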

In the next section, we give a full example of training a neural network.

For more information, check out:

Library of PyTorch-provided modules: torch.nn
Defining neural net modules: https://pytorch.com.tw/tutorials/beginner/examples_nn/polynomial_module.html

Neural Network Training with Modules

Once a network is built, it has to be trained, and its parameters can be easily optimized with one of PyTorch's optimizers from torch.optim:

# Create the network (from previous section) and optimizer
net = Net()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-4, weight_decay=1e-2, momentum=0.9)

# Run a sample training loop that "teaches" the network
# to output the constant zero function
for _ in range(10000):
  input = torch.randn(4)
  output = net(input)
  loss = torch.abs(output)
  net.zero_grad()
  loss.backward()
  optimizer.step()

# After training, switch the module to eval mode to do inference, compute performance metrics, etc.
# (see discussion below for a description of training and evaluation modes)
...
net.eval()
...

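In this simplified example, the network learns to simply output zero, as any non-zero output is "penalized" according to its absolute value by employing torch.abs() as a loss function. While this is not a very interesting task, the key parts of training are present:

A network is created.
An optimizer (in this case, a stochastic gradient descent optimizer) is created, and the network's parameters are associated with it.
A training loop...
- acquires an input,
- runs the network,
- computes a loss,
- zeros the network's parameters' gradients,
- calls loss.backward() to update the parameters' gradients,
- calls optimizer.step() to apply the gradients to the parameters.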

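After the above snippet has been run, note that the network's parameters have changed. In particular, examining the value of l1's weight parameter shows that its values are now much closer to 0 (as may be expected):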

print(net.l1.weight)
: Parameter containing:
tensor([[-0.0013],
        [ 0.0030],
        [-0.0008]], requires_grad=True)
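The same loop structure applies to an ordinary supervised problem. The following sketch is an illustration rather than part of the original example: it fits nn.Linear(1, 1) to toy data from the line y = 2x + 1 with nn.MSELoss and plain SGD, and calls optimizer.zero_grad(), which clears the gradients of exactly the parameters handed to the optimizer (here the same set net.zero_grad() would clear).

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy regression data from the line y = 2x + 1.
x = torch.linspace(-1, 1, 64).unsqueeze(1)  # shape (64, 1)
y = 2 * x + 1

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

initial_loss = loss_fn(model(x), y).item()
for _ in range(200):
  loss = loss_fn(model(x), y)  # forward pass and loss
  optimizer.zero_grad()        # clear gradients from the previous step
  loss.backward()              # populate .grad on each parameter
  optimizer.step()             # apply the SGD update
final_loss = loss_fn(model(x), y).item()

print(initial_loss > final_loss)  # True
print(model.weight.item(), model.bias.item())  # close to 2.0 and 1.0
```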

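Note that the above process is done entirely while the network module is in "training mode". Modules default to training mode and can be switched between training and evaluation modes using train() and eval(). They can behave differently depending on which mode they are in. For example, the BatchNorm module maintains a running mean and variance during training that are not updated when the module is in evaluation mode. In general, modules should be in training mode during training and only switched to evaluation mode for inference or evaluation. Below is an example of a custom module that behaves differently between the two modes: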

class ModalModule(nn.Module):
  def __init__(self):
    super().__init__()

  def forward(self, x):
    if self.training:
      # Add a constant only in training mode.
      return x + 1.
    else:
      return x


m = ModalModule()
x = torch.randn(5)

print('training mode output: {}'.format(m(x)))
: tensor([1.6614, 1.2669, 1.0617, 1.6213, 0.5481])

m.eval()
print('evaluation mode output: {}'.format(m(x)))
: tensor([ 0.6614,  0.2669,  0.0617,  0.6213, -0.4519])
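Two details are worth seeing concretely: eval() and train() set the training flag recursively on every submodule, and eval() only changes that flag; it does not disable gradient tracking, which is what torch.no_grad() is for. This sketch uses nn.Dropout, which acts as the identity in evaluation mode.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

# eval() sets training=False on the module and, recursively, on every submodule.
net.eval()
print([m.training for m in net.modules()])  # [False, False, False]

x = torch.randn(8, 4)
with torch.no_grad():
  # In evaluation mode Dropout is the identity, so repeated calls agree.
  print(torch.equal(net(x), net(x)))  # True

# train() switches every submodule back to training mode.
net.train()
print([m.training for m in net.modules()])  # [True, True, True]
```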

Training neural networks can often be tricky. For more information, check out:

Using optimizers: https://pytorch.com.tw/tutorials/beginner/examples_nn/two_layer_net_optim.html
Neural network training: https://pytorch.com.tw/tutorials/beginner/blitz/neural_networks_tutorial.html
Introduction to autograd: https://pytorch.com.tw/tutorials/beginner/blitz/autograd_tutorial.html

Module State

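In the previous section, we demonstrated training a module's "parameters", or learnable aspects of computation. Now, if we want to save the trained model to disk, we can do so by saving its state_dict (i.e. "state dictionary"):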

# Save the module
torch.save(net.state_dict(), 'net.pt')

...

# Load the module later on
new_net = Net()
new_net.load_state_dict(torch.load('net.pt'))
: <All keys matched successfully>
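torch.save() and torch.load() also accept file-like objects, which makes it easy to check a save/load round trip without touching the disk. In this sketch an io.BytesIO buffer stands in for the file and a small nn.Sequential stands in for Net; after loading, the freshly constructed copy produces identical outputs.

```python
import io
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

# Serialize the state_dict into an in-memory buffer instead of a file.
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)
buffer.seek(0)

# A freshly constructed module has different random weights until the state is loaded.
new_net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
result = new_net.load_state_dict(torch.load(buffer))
print(result)  # <All keys matched successfully>

x = torch.randn(4)
print(torch.equal(net(x), new_net(x)))  # True
```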

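A module's state_dict contains state that affects its computation. This includes, but is not limited to, the module's parameters. For some modules, it may be useful to have state beyond parameters that affects module computation but is not learnable. For such cases, PyTorch provides the concept of "buffers", both "persistent" and "non-persistent". Following is an overview of the various types of state a module can have:

Parameters: learnable aspects of computation; contained within the state_dict.
Buffers: non-learnable aspects of computation.
- Persistent buffers: contained within the state_dict (i.e. serialized when saving and loading).
- Non-persistent buffers: not contained within the state_dict (i.e. left out of serialization).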

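As a motivating example for the use of buffers, consider a simple module that maintains a running mean. We want the current value of the running mean to be considered part of the module's state_dict so that it will be restored when loading a serialized form of the module, but we don't want it to be learnable. This snippet shows how to use register_buffer() to accomplish this: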

class RunningMean(nn.Module):
  def __init__(self, num_features, momentum=0.9):
    super().__init__()
    self.momentum = momentum
    self.register_buffer('mean', torch.zeros(num_features))
  def forward(self, x):
    self.mean = self.momentum * self.mean + (1.0 - self.momentum) * x
    return self.mean

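Now, the current value of the running mean is considered part of the module's state_dict and will be properly restored when loading the module from disk: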

m = RunningMean(4)
for _ in range(10):
  input = torch.randn(4)
  m(input)

print(m.state_dict())
: OrderedDict([('mean', tensor([ 0.1041, -0.1113, -0.0647,  0.1515]))])

# Serialized form will contain the 'mean' tensor
torch.save(m.state_dict(), 'mean.pt')

m_loaded = RunningMean(4)
m_loaded.load_state_dict(torch.load('mean.pt'))
assert(torch.all(m.mean == m_loaded.mean))

As mentioned previously, buffers can be left out of the module's state_dict by marking them as non-persistent:

self.register_buffer('unserialized_thing', torch.randn(5), persistent=False)

Both persistent and non-persistent buffers are affected by model-wide device / dtype changes applied with to():

# Moves all module parameters and buffers to the specified device / dtype
m.to(device='cuda', dtype=torch.float64)

Buffers of a module can be iterated over using buffers() or named_buffers():

for buffer in m.named_buffers():
  print(buffer)
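The three kinds of state summarized earlier can be compared on one object. The Tracker class below is hypothetical and exists only for this sketch: named_buffers() reports both buffers, only the persistent one reaches the state_dict, buffers never require gradients, and to() converts both kinds.

```python
import torch
from torch import nn

class Tracker(nn.Module):
  # Hypothetical module used only to compare the kinds of module state.
  def __init__(self):
    super().__init__()
    self.weight = nn.Parameter(torch.ones(2))
    self.register_buffer('count', torch.zeros(1))                      # persistent buffer
    self.register_buffer('scratch', torch.zeros(3), persistent=False)  # non-persistent buffer

t = Tracker()
print([n for n, _ in t.named_parameters()])  # ['weight']
print([n for n, _ in t.named_buffers()])     # ['count', 'scratch']
print(list(t.state_dict().keys()))           # ['weight', 'count']
print(t.count.requires_grad)                 # False

# to() converts parameters and both kinds of floating-point buffers.
t.to(dtype=torch.float64)
print(t.scratch.dtype)  # torch.float64
```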

The following class demonstrates the various ways of registering parameters and buffers within a module:

class StatefulModule(nn.Module):
  def __init__(self):
    super().__init__()
    # Setting a nn.Parameter as an attribute of the module automatically registers the tensor
    # as a parameter of the module.
    self.param1 = nn.Parameter(torch.randn(2))

    # Alternative string-based way to register a parameter.
    self.register_parameter('param2', nn.Parameter(torch.randn(3)))

    # Reserves the "param3" attribute as a parameter, preventing it from being set to anything
    # except a parameter. "None" entries like this will not be present in the module's state_dict.
    self.register_parameter('param3', None)

    # Registers a list of parameters.
    self.param_list = nn.ParameterList([nn.Parameter(torch.randn(2)) for i in range(3)])

    # Registers a dictionary of parameters.
    self.param_dict = nn.ParameterDict({
      'foo': nn.Parameter(torch.randn(3)),
      'bar': nn.Parameter(torch.randn(4))
    })

    # Registers a persistent buffer (one that appears in the module's state_dict).
    self.register_buffer('buffer1', torch.randn(4), persistent=True)

    # Registers a non-persistent buffer (one that does not appear in the module's state_dict).
    self.register_buffer('buffer2', torch.randn(5), persistent=False)

    # Reserves the "buffer3" attribute as a buffer, preventing it from being set to anything
    # except a buffer. "None" entries like this will not be present in the module's state_dict.
    self.register_buffer('buffer3', None)

    # Adding a submodule registers its parameters as parameters of the module.
    self.linear = nn.Linear(2, 3)

m = StatefulModule()

# Save and load state_dict.
torch.save(m.state_dict(), 'state.pt')
m_loaded = StatefulModule()
m_loaded.load_state_dict(torch.load('state.pt'))

# Note that non-persistent buffer "buffer2" and reserved attributes "param3" and "buffer3" do
# not appear in the state_dict.
print(m_loaded.state_dict())
: OrderedDict([('param1', tensor([-0.0322,  0.9066])),
               ('param2', tensor([-0.4472,  0.1409,  0.4852])),
               ('buffer1', tensor([ 0.6949, -0.1944,  1.2911, -2.1044])),
               ('param_list.0', tensor([ 0.4202, -0.1953])),
               ('param_list.1', tensor([ 1.5299, -0.8747])),
               ('param_list.2', tensor([-1.6289,  1.4898])),
               ('param_dict.bar', tensor([-0.6434,  1.5187,  0.0346, -0.4077])),
               ('param_dict.foo', tensor([-0.0845, -1.4324,  0.7022])),
               ('linear.weight', tensor([[-0.3915, -0.6176],
                                         [ 0.6062, -0.5992],
                                         [ 0.4452, -0.2843]])),
               ('linear.bias', tensor([-0.3710, -0.0795, -0.3947]))])

For more information, check out:

Saving and loading: https://pytorch.com.tw/tutorials/beginner/saving_loading_models.html
Serialization semantics: https://pytorch.com.tw/docs/stable/notes/serialization.html
What is a state dict? https://pytorch.com.tw/tutorials/recipes/recipes/what_is_state_dict.html

Module Initialization

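By default, parameters and floating-point buffers for modules provided by torch.nn are initialized during module instantiation as 32-bit floating point values on the CPU, using an initialization scheme that has historically performed well for the module type. For certain use cases, it may be desirable to initialize with a different dtype, device (e.g. GPU), or initialization technique.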

Examples:

# Initialize module directly onto GPU.
m = nn.Linear(5, 3, device='cuda')

# Initialize module with 16-bit floating point parameters.
m = nn.Linear(5, 3, dtype=torch.half)

# Skip default parameter initialization and perform custom (e.g. orthogonal) initialization.
m = torch.nn.utils.skip_init(nn.Linear, 5, 3)
nn.init.orthogonal_(m.weight)

Note that the device and dtype options demonstrated above also apply to any floating-point buffers registered for the module:

m = nn.BatchNorm2d(3, dtype=torch.half)
print(m.running_mean)
: tensor([0., 0., 0.], dtype=torch.float16)

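While module writers can use any device or dtype to initialize parameters in their custom modules, good practice is to use dtype=torch.float and device='cpu' by default as well. Optionally, you can provide full flexibility in these areas for your custom module by conforming to the convention demonstrated above that all torch.nn modules follow:

Provide a device constructor kwarg that applies to any parameters / buffers registered by the module.
Provide a dtype constructor kwarg that applies to any parameters / floating-point buffers registered by the module.
Only use initialization functions (i.e. functions from torch.nn.init) on parameters and buffers within the module's constructor. Note that this is only required to use skip_init(); see the skipping-initialization tutorial linked below for an explanation.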
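Put together, the three conventions might look like the following sketch. ConventionalLinear and its reset_parameters() helper are hypothetical names; the factory_kwargs dictionary is just a convenient way to pass device and dtype to every tensor the constructor allocates. Because the constructor accepts device and initializes only through torch.nn.init, torch.nn.utils.skip_init() can build it without running the default initialization.

```python
import torch
from torch import nn

class ConventionalLinear(nn.Module):
  # Hypothetical module following the torch.nn constructor conventions.
  def __init__(self, in_features, out_features, device=None, dtype=None):
    super().__init__()
    factory_kwargs = {'device': device, 'dtype': dtype}
    # Allocate storage on the requested device / dtype; values are filled below.
    self.weight = nn.Parameter(torch.empty(out_features, in_features, **factory_kwargs))
    self.bias = nn.Parameter(torch.empty(out_features, **factory_kwargs))
    self.reset_parameters()

  def reset_parameters(self):
    # Only torch.nn.init functions touch the parameters.
    nn.init.xavier_uniform_(self.weight)
    nn.init.zeros_(self.bias)

  def forward(self, input):
    return input @ self.weight.T + self.bias

m64 = ConventionalLinear(5, 3, dtype=torch.float64)
print(m64.weight.dtype)  # torch.float64

# skip_init() needs the `device` kwarg; it leaves the parameters uninitialized.
m = torch.nn.utils.skip_init(ConventionalLinear, 5, 3)
nn.init.orthogonal_(m.weight)
print(m.weight.device)  # cpu
```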

For more information, check out:

Skipping module parameter initialization: https://pytorch.com.tw/tutorials/prototype/skip_param_init.html

Module Hooks

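In Neural Network Training with Modules, we demonstrated the training process for a module, which iteratively performs forward and backward passes, updating module parameters each iteration. For more control over this process, PyTorch provides "hooks" that can perform arbitrary computation during a forward or backward pass, even modifying how the pass is done if desired. Some useful examples of this functionality include debugging, visualizing activations, examining gradients in depth, etc. Hooks can be added to modules you haven't written yourself, meaning this functionality can be applied to third-party or PyTorch-provided modules.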

PyTorch provides two types of hooks for modules:

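Forward hooks are called during the forward pass. They can be installed for a given module with register_forward_pre_hook() and register_forward_hook(). These hooks are called just before and just after the forward function is called, respectively. Alternatively, these hooks can be installed globally for all modules with the analogous register_module_forward_pre_hook() and register_module_forward_hook() functions.
Backward hooks are called during the backward pass. They can be installed with register_full_backward_pre_hook() and register_full_backward_hook(). A backward pre-hook is called just before the gradients for the module are computed and gives access to the gradients with respect to the module's outputs, while a full backward hook is called once the gradients with respect to the module's inputs have been computed and gives access to the gradients of both the inputs and the outputs. Alternatively, they can be installed globally for all modules with register_module_full_backward_hook() and register_module_full_backward_pre_hook().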

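All hooks allow the user to return an updated value that will be used throughout the remaining computation. Thus, these hooks can be used either to execute arbitrary code alongside the regular module forward/backward pass or to modify some inputs/outputs without having to change the module's forward() function.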

Below is an example demonstrating the usage of forward and backward hooks:

torch.manual_seed(1)

def forward_pre_hook(m, inputs):
  # Allows for examination and modification of the input before the forward pass.
  # Note that inputs are always wrapped in a tuple.
  input = inputs[0]
  return input + 1.

def forward_hook(m, inputs, output):
  # Allows for examination of inputs / outputs and modification of the outputs
  # after the forward pass. Note that inputs are always wrapped in a tuple while outputs
  # are passed as-is.

  # Residual computation a la ResNet.
  return output + inputs[0]

def backward_hook(m, grad_inputs, grad_outputs):
  # Allows for examination of grad_inputs / grad_outputs and modification of
  # grad_inputs used in the rest of the backwards pass. Note that grad_inputs and
  # grad_outputs are always wrapped in tuples.
  new_grad_inputs = [torch.ones_like(gi) * 42. for gi in grad_inputs]
  return new_grad_inputs

# Create sample module & input.
m = nn.Linear(3, 3)
x = torch.randn(2, 3, requires_grad=True)

# ==== Demonstrate forward hooks. ====
# Run input through module before and after adding hooks.
print('output with no forward hooks: {}'.format(m(x)))
: output with no forward hooks: tensor([[-0.5059, -0.8158,  0.2390],
                                        [-0.0043,  0.4724, -0.1714]], grad_fn=<AddmmBackward>)

# Note that the modified input results in a different output.
forward_pre_hook_handle = m.register_forward_pre_hook(forward_pre_hook)
print('output with forward pre hook: {}'.format(m(x)))
: output with forward pre hook: tensor([[-0.5752, -0.7421,  0.4942],
                                        [-0.0736,  0.5461,  0.0838]], grad_fn=<AddmmBackward>)

# Note the modified output.
forward_hook_handle = m.register_forward_hook(forward_hook)
print('output with both forward hooks: {}'.format(m(x)))
: output with both forward hooks: tensor([[-1.0980,  0.6396,  0.4666],
                                          [ 0.3634,  0.6538,  1.0256]], grad_fn=<AddBackward0>)

# Remove hooks; note that the output here matches the output before adding hooks.
forward_pre_hook_handle.remove()
forward_hook_handle.remove()
print('output after removing forward hooks: {}'.format(m(x)))
: output after removing forward hooks: tensor([[-0.5059, -0.8158,  0.2390],
                                               [-0.0043,  0.4724, -0.1714]], grad_fn=<AddmmBackward>)

# ==== Demonstrate backward hooks. ====
m(x).sum().backward()
print('x.grad with no backwards hook: {}'.format(x.grad))
: x.grad with no backwards hook: tensor([[ 0.4497, -0.5046,  0.3146],
                                         [ 0.4497, -0.5046,  0.3146]])

# Clear gradients before running backward pass again.
m.zero_grad()
x.grad.zero_()

m.register_full_backward_hook(backward_hook)
m(x).sum().backward()
print('x.grad with backwards hook: {}'.format(x.grad))
: x.grad with backwards hook: tensor([[42., 42., 42.],
                                      [42., 42., 42.]])
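One of the use cases mentioned above, capturing activations for visualization or debugging, needs only a forward hook that returns None, so the output passes through unchanged. In this sketch save_activation() is a hypothetical helper that builds one hook per module name; the hooks store detached copies of each ReLU output and are removed through their handles afterwards.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
activations = {}

def save_activation(name):
  # Build a forward hook that stores the module's output under `name`.
  def hook(module, inputs, output):
    # Returning None leaves the output unchanged; detach() keeps the copy out of autograd.
    activations[name] = output.detach()
  return hook

# Hook every ReLU, keeping the handles so the hooks can be removed later.
handles = [
  module.register_forward_hook(save_activation(name))
  for name, module in net.named_modules()
  if isinstance(module, nn.ReLU)
]

net(torch.randn(3, 4))
print({name: tuple(t.shape) for name, t in activations.items()})  # {'1': (3, 8)}

for handle in handles:
  handle.remove()
```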
