SmoothL1Loss

class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)

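Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. It is less sensitive to outliers than torch.nn.MSELoss and in some cases prevents exploding gradients (for example, see the paper Fast R-CNN by Ross Girshick).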

For a batch of size $N$, the unreduced loss can be described as:

\ell(x, y) = L = \{l_1, ..., l_N\}^T

with

l_n = \begin{cases} 0.5 (x_n - y_n)^2 / beta, & \text{if } |x_n - y_n| < beta \\ |x_n - y_n| - 0.5 * beta, & \text{otherwise } \end{cases}
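
As a minimal sketch of the piecewise definition above, the per-element loss can be written in plain Python for scalar inputs. The helper name `smooth_l1_element` is hypothetical and is not part of torch; it only mirrors the formula.

```python
def smooth_l1_element(x: float, y: float, beta: float = 1.0) -> float:
    """Per-element Smooth L1 loss, following the piecewise formula above.

    Hypothetical helper for illustration only; not a torch API.
    """
    if beta < 0:
        raise ValueError("beta must be non-negative")
    diff = abs(x - y)
    if diff < beta:
        # Quadratic branch: only reachable when beta > 0, so the division is safe.
        return 0.5 * diff * diff / beta
    # Linear (L1) branch.
    return diff - 0.5 * beta


small = smooth_l1_element(0.5, 0.0)            # |0.5| < 1.0, quadratic branch
large = smooth_l1_element(3.0, 0.0)            # |3.0| >= 1.0, linear branch
pure_l1 = smooth_l1_element(2.0, 5.0, beta=0)  # beta = 0 always takes the linear branch
```

With beta = 0 the condition `diff < beta` can never hold, so the sketch never divides by zero and returns the plain absolute error.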

If reduction is not 'none', then:

\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases}
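
The reduction step can be illustrated the same way. The sketch below assumes the per-element losses are already available as a list of floats; `reduce_losses` is a hypothetical name used only for this illustration.

```python
from typing import List, Union


def reduce_losses(losses: List[float], reduction: str = "mean") -> Union[float, List[float]]:
    """Apply the 'none' | 'mean' | 'sum' reduction to per-element losses.

    Hypothetical helper for illustration only; not a torch API.
    """
    if reduction == "none":
        return list(losses)               # keep one loss per element
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)  # sum divided by the number of elements
    raise ValueError(f"unknown reduction: {reduction!r}")


per_element = [0.125, 2.5, 0.0, 1.5]
total = reduce_losses(per_element, "sum")
average = reduce_losses(per_element, "mean")
unreduced = reduce_losses(per_element, "none")
```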

Note

Smooth L1 loss can be seen as exactly L1Loss, but with the $|x - y| < beta$ portion replaced by a quadratic function whose slope is 1 at $|x - y| = beta$. The quadratic segment smooths the L1 loss near $|x - y| = 0$.
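
The slope claim can be checked directly from the piecewise definition. The short derivation below assumes beta > 0 and writes $d = |x - y|$; the names $q$ and $r$ for the two branches are introduced here only for illustration.

```latex
q(d) = \frac{0.5\, d^2}{beta}, \qquad r(d) = d - 0.5 \cdot beta

q(beta) = 0.5 \cdot beta = r(beta)
\quad \text{(the two branches meet at } d = beta \text{)}

q'(d) = \frac{d}{beta} \;\Rightarrow\; q'(beta) = 1 = r'(d)
\quad \text{(the slopes also agree at } d = beta \text{)}
```

So both the value and the first derivative are continuous where the quadratic segment joins the linear one.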

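Note

Smooth L1 loss is closely related to HuberLoss, being equivalent to $huber(x, y) / beta$ (note that Smooth L1's beta hyperparameter is also known as delta for Huber). This leads to the following differences:

As beta -> 0, Smooth L1 loss converges to L1Loss, while HuberLoss converges to a constant 0 loss. When beta is 0, Smooth L1 loss is equivalent to L1 loss.
As beta -> $+\infty$, Smooth L1 loss converges to a constant 0 loss, while HuberLoss converges to MSELoss.
For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant slope of 1. For HuberLoss, the slope of the L1 segment is beta.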
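
These relationships can be checked numerically with a small sketch. Both functions below are hypothetical scalar helpers written from the formulas, not calls into torch; `huber` assumes the usual definition with 0.5 d^2 on the quadratic segment and delta * (|d| - 0.5 * delta) on the linear one.

```python
import math


def smooth_l1(d: float, beta: float) -> float:
    """Scalar Smooth L1 loss of an error d (illustrative helper)."""
    a = abs(d)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta


def huber(d: float, delta: float) -> float:
    """Scalar Huber loss of an error d (illustrative helper, assumed definition)."""
    a = abs(d)
    if a < delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


# For beta > 0, Smooth L1 equals Huber divided by beta.
for beta in (0.5, 1.0, 2.0):
    for d in (-3.0, -0.3, 0.0, 0.7, 4.0):
        assert math.isclose(smooth_l1(d, beta), huber(d, beta) / beta, abs_tol=1e-12)

# beta = 0: Smooth L1 is the absolute error, Huber is constantly 0.
l1_limit = smooth_l1(3.0, 0.0)
huber_zero = huber(3.0, 0.0)

# Very large beta: Smooth L1 approaches 0, Huber stays on its quadratic segment
# (0.5 * d**2, the squared-error term with the factor 0.5 of the Huber definition).
large_beta_smooth = smooth_l1(3.0, 1e9)
large_delta_huber = huber(3.0, 1e9)

# Slope of the linear segment, measured between d = 4 and d = 5 with beta = 2.
smooth_slope = smooth_l1(5.0, 2.0) - smooth_l1(4.0, 2.0)
huber_slope = huber(5.0, 2.0) - huber(4.0, 2.0)
```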

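Parameters

size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch, depending on size_average. When reduce is False, a loss per batch element is returned instead and size_average is ignored. Default: True
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied, 'mean': the sum of the output is divided by the number of elements in the output, 'sum': the output is summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two arguments overrides reduction. Default: 'mean'
beta (float, optional) – Specifies the threshold at which to switch between L1 and L2 loss. The value must be non-negative. Default: 1.0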
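
The way the two deprecated flags map onto a reduction mode, as described in the parameter list above, can be sketched as follows. `legacy_reduction` is a hypothetical helper that encodes only the behavior stated in those descriptions; it is not presented as the library's implementation.

```python
from typing import Optional


def legacy_reduction(size_average: Optional[bool] = None,
                     reduce: Optional[bool] = None) -> str:
    """Map the deprecated flags to a reduction string (illustrative sketch)."""
    # Both flags default to True when left unspecified.
    if size_average is None:
        size_average = True
    if reduce is None:
        reduce = True
    if not reduce:
        # reduce=False returns a loss per element and ignores size_average.
        return "none"
    # reduce=True: average when size_average is True, otherwise sum.
    return "mean" if size_average else "sum"


default_mode = legacy_reduction()
summed_mode = legacy_reduction(size_average=False)
per_element_mode = legacy_reduction(size_average=True, reduce=False)
```

In new code it is simpler to pass reduction directly and leave both deprecated flags unset.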

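Shape

Input: $(*)$, where $*$ means any number of dimensions.
Target: $(*)$, same shape as the input.
Output: scalar. If reduction is 'none', then $(*)$, same shape as the input.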

forward(input, target)

Runs the forward pass.
