CrossEntropyLoss

class torch.nn.modules.loss.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)

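This criterion computes the cross entropy loss between input logits and target.

It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning a weight to each class. This is particularly useful when the training set is unbalanced.

The input is expected to contain the unnormalized logits for each class (in general they need not be positive or sum to 1). input must be a Tensor of size $(C)$ for unbatched input, $(minibatch, C)$ for batched input, or $(minibatch, C, d_1, d_2, ..., d_K)$ with $K \geq 1$ for the K-dimensional case. The last form is useful for higher-dimensional inputs, such as computing the per-pixel cross entropy loss for 2D images.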
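As an illustration of the K-dimensional case, the following sketch computes a per-pixel loss for a batch of 2D images. The sizes and the variable names (`logits`, `per_pixel_loss`, and so on) are assumptions chosen for the example, not part of the API.

```python
import torch
import torch.nn as nn

# Assumed toy sizes: 2 images, 4 classes, 8x8 pixels.
N, C, H, W = 2, 4, 8, 8
logits = torch.randn(N, C, H, W)          # (minibatch, C, d_1, d_2)
target = torch.randint(0, C, (N, H, W))   # one class index per pixel

criterion = nn.CrossEntropyLoss(reduction="none")
per_pixel_loss = criterion(logits, target)   # shape (N, H, W)

# Folding the pixels into the batch dimension should give the same values:
# move the class dimension last, then flatten everything else.
flat_logits = logits.permute(0, 2, 3, 1).reshape(-1, C)   # (N*H*W, C)
flat_loss = criterion(flat_logits, target.reshape(-1))    # (N*H*W,)
```

The class dimension always sits at index 1 of a batched input, so any extra dimensions follow it rather than precede it.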

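The target that this criterion expects should contain one of the following:

Class indices in the range $[0, C)$, where $C$ is the number of classes; if ignore_index is specified, this loss also accepts that class index (which need not lie in the class range). The unreduced loss for this case (i.e. with reduction set to 'none') can be described as: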

$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}$
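where $x$ is the input, $y$ is the target, $w$ is the weight, $C$ is the number of classes, and $N$ spans the minibatch dimension as well as $d_1, ..., d_K$ for the K-dimensional case. If reduction is not 'none' (default 'mean'), then: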

$\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}} l_n, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}$
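The 'mean' case divides by the summed weights of the non-ignored targets, not by the batch size. A minimal sketch that checks this numerically is shown here; the weights, targets, and variable names are assumptions made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C = 3
x = torch.randn(4, C)                  # logits, shape (N, C)
y = torch.tensor([0, 2, -100, 1])      # third sample uses the default ignore_index
w = torch.tensor([1.0, 2.0, 0.5])      # assumed per-class weights

# Per-sample losses l_n; the ignored sample contributes 0.
l = nn.CrossEntropyLoss(weight=w, reduction="none")(x, y)

# Denominator of the 'mean' case: sum of w_{y_n} over non-ignored samples.
keep = y != -100
denom = w[y[keep]].sum()
manual_mean = l.sum() / denom

builtin_mean = nn.CrossEntropyLoss(weight=w, reduction="mean")(x, y)
builtin_sum = nn.CrossEntropyLoss(weight=w, reduction="sum")(x, y)
```

With uniform weights and no ignored targets the denominator reduces to $N$, which recovers the ordinary average.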

Note that this case is equivalent to applying LogSoftmax to the input, followed by NLLLoss.
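The equivalence with LogSoftmax followed by NLLLoss can be checked directly. The sketch below uses arbitrary example tensors and default settings for both criteria.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(3, 5)            # logits for 3 samples, 5 classes
y = torch.tensor([1, 0, 4])      # class indices

# One-step computation.
ce = nn.CrossEntropyLoss()(x, y)

# Two-step computation: log-probabilities first, then negative log-likelihood.
log_probs = nn.LogSoftmax(dim=1)(x)
nll = nn.NLLLoss()(log_probs, y)
```

Because the softmax is applied inside the criterion, a model trained with CrossEntropyLoss should output raw logits rather than probabilities.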

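Probabilities for each class; this option is useful when labels beyond a single class per minibatch item are required (for example, blended labels or label smoothing). The unreduced loss for this case (i.e. with reduction set to 'none') can be described as: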

$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}$

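where $x$ is the input, $y$ is the target, $w$ is the weight, $C$ is the number of classes, and $N$ spans the minibatch dimension as well as $d_1, ..., d_K$ for the K-dimensional case. If reduction is not 'none' (default 'mean'), then: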

$\ell(x, y) = \begin{cases} \frac{\sum_{n=1}^N l_n}{N}, & \text{if reduction} = \text{`mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases}$
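The class-probability formulas can be reproduced by hand as a sanity check. The following is a sketch under assumed example values; `w`, `log_p`, and the other names are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, C = 4, 3
x = torch.randn(N, C)                    # logits
y = torch.randn(N, C).softmax(dim=1)     # soft labels, each row sums to 1
w = torch.tensor([1.0, 2.0, 0.5])        # assumed per-class weights

# l_n = -sum_c w_c * log_softmax(x)_{n,c} * y_{n,c}
log_p = F.log_softmax(x, dim=1)
l = -(w * log_p * y).sum(dim=1)

builtin_none = nn.CrossEntropyLoss(weight=w, reduction="none")(x, y)
builtin_mean = nn.CrossEntropyLoss(weight=w, reduction="mean")(x, y)
```

Unlike the class-index case, the 'mean' reduction here divides by $N$ even when weights are supplied.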

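Note

This criterion generally performs better when target contains class indices, because that allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive.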

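Parameters

weight (Tensor, optional) – A manual rescaling weight given to each class. If given, it must be a Tensor of size C.
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices.
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch, depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied, 'mean': the weighted mean of the output is taken, 'sum': the output is summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two arguments will override reduction. Default: 'mean'
label_smoothing (float, optional) – A float in the range [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution, as described in Rethinking the Inception Architecture for Computer Vision. Default: $0.0$.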
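To make the label_smoothing description concrete, the sketch below builds the smoothed target explicitly as $(1 - \epsilon)$ times the one-hot label plus $\epsilon / C$ and compares the result with the built-in option. It assumes no class weights are used; `eps` and `soft_target` are names chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, C, eps = 4, 5, 0.1
x = torch.randn(N, C)
y = torch.randint(0, C, (N,))    # class indices

# Built-in smoothing applied to class-index targets.
smoothed = nn.CrossEntropyLoss(label_smoothing=eps)(x, y)

# Explicit mixture of the one-hot label and a uniform distribution,
# passed as a class-probability target with no further smoothing.
soft_target = (1 - eps) * F.one_hot(y, C).float() + eps / C
explicit = nn.CrossEntropyLoss()(x, soft_target)
```

Each row of `soft_target` still sums to 1, so it is a valid class-probability target.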

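Shape

Input: A Tensor of shape $(C)$, $(N, C)$ or $(N, C, d_1, d_2, ..., d_K)$, with $K \geq 1$ in the K-dimensional case.
Target: If containing class indices, shape $()$, $(N)$ or $(N, d_1, d_2, ..., d_K)$, with $K \geq 1$ in the K-dimensional case. Each value should be in the range $[0, C)$. The target data type must be long when using class indices. If containing class probabilities, the target must have the same shape as the input, and each value should be in the range $[0, 1]$. This means the target data type must be float when using class probabilities. Note that PyTorch does not strictly enforce the constraints on class probabilities; it is the user's responsibility to ensure that target contains valid probability distributions (see the Examples section below for more details).
Output: If reduction is 'none', shape $()$, $(N)$ or $(N, d_1, d_2, ..., d_K)$, with $K \geq 1$ in the K-dimensional case, depending on the shape of the input. Otherwise, a scalar.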

where:

$\begin{aligned} C ={} & \text{number of classes} \\ N ={} & \text{batch size} \\ \end{aligned}$
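The shape rules above can be verified with a few calls. This is a small sketch with assumed sizes; the names `criterion_none` and `criterion_mean` are illustrative.

```python
import torch
import torch.nn as nn

C = 5
criterion_none = nn.CrossEntropyLoss(reduction="none")
criterion_mean = nn.CrossEntropyLoss()   # reduction='mean'

# Unbatched: input (C), target () -> output ()
unbatched = criterion_none(torch.randn(C), torch.tensor(2))

# Batched: input (N, C), target (N) -> output (N)
batched = criterion_none(torch.randn(3, C), torch.randint(0, C, (3,)))

# K-dimensional (K = 2): input (N, C, d_1, d_2), target (N, d_1, d_2)
k_dim = criterion_none(torch.randn(3, C, 4, 6), torch.randint(0, C, (3, 4, 6)))

# Any reduction other than 'none' yields a scalar.
scalar = criterion_mean(torch.randn(3, C), torch.randint(0, C, (3,)))
```

`torch.randint` returns int64 (long) tensors by default, which matches the dtype required for class-index targets.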

Examples

>>> import torch
>>> import torch.nn as nn
>>> # Example of target with class indices
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
>>>
>>> # Example of target with class probabilities
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5).softmax(dim=1)
>>> output = loss(input, target)
>>> output.backward()

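Note

When target contains class probabilities, it should consist of soft labels; that is, each target entry should represent a probability distribution over the possible classes for a given data sample, with individual probabilities in [0,1] and the whole distribution summing to 1. This is why softmax() is applied to target in the class probabilities example above.

PyTorch does not validate whether the values provided in target lie in the range [0,1], nor whether the distribution for each data sample sums to 1. No warning is raised, and it is the user's responsibility to ensure that target contains valid probability distributions. Providing arbitrary values may yield misleading loss values and unstable gradients during training.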

Examples

>>> # Example of target with incorrectly specified class probabilities
>>> loss = nn.CrossEntropyLoss()
>>> torch.manual_seed(283)
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> # Provided target class probabilities are not in range [0,1]
>>> target
tensor([[ 0.7105,  0.4446,  2.0297,  0.2671, -0.6075],
        [-1.0496, -0.2753, -0.3586,  0.9270,  1.0027],
        [ 0.7551,  0.1003,  1.3468, -0.3581, -0.9569]])
>>> # Provided target class probabilities do not sum to 1
>>> target.sum(axis=1)
tensor([2.8444, 0.2462, 0.8873])
>>> # No error message and possible misleading loss value
>>> loss(input, target).item()
4.6379876136779785
>>>
>>> # Example of target with correctly specified class probabilities
>>> # Use .softmax() to ensure true probability distribution
>>> target_new = target.softmax(dim=1)
>>> # New target class probabilities all in range [0,1]
>>> target_new
tensor([[0.1559, 0.1195, 0.5830, 0.1000, 0.0417],
        [0.0496, 0.1075, 0.0990, 0.3579, 0.3860],
        [0.2607, 0.1355, 0.4711, 0.0856, 0.0471]])
>>> # New target class probabilities sum to 1
>>> target_new.sum(axis=1)
tensor([1.0000, 1.0000, 1.0000])
>>> loss(input, target_new).item()
2.55349063873291
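Since no validation is performed, one option is to check class-probability targets before computing the loss. The helper below, `check_prob_target`, is hypothetical and not part of PyTorch; it is a minimal sketch that assumes the class dimension is 1 (use `class_dim=0` for unbatched targets).

```python
import torch

def check_prob_target(target: torch.Tensor, class_dim: int = 1, atol: float = 1e-6) -> None:
    """Hypothetical helper (not a PyTorch API): raise if target is not a valid distribution."""
    if not torch.is_floating_point(target):
        raise TypeError("class-probability targets must have a floating-point dtype")
    if (target < 0).any() or (target > 1).any():
        raise ValueError("target values must lie in [0, 1]")
    sums = target.sum(dim=class_dim)
    if not torch.allclose(sums, torch.ones_like(sums), atol=atol):
        raise ValueError("target must sum to 1 along the class dimension")

torch.manual_seed(0)
raw = torch.randn(3, 5)

# A softmax-normalized target passes silently.
check_prob_target(raw.softmax(dim=1))

# The raw values contain negatives, so the check rejects them.
try:
    check_prob_target(raw)
    rejected = False
except ValueError:
    rejected = True
```

Such a check adds a small cost per call, so it may be more suitable for debugging than for a tight training loop.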

forward(input, target)

Runs the forward pass.

Return type: Tensor
