Int8DynActInt4WeightQATLinear

class torchao.quantization.qat.linear.Int8DynActInt4WeightQATLinear(in_features: int, out_features: int, bias: bool = False, device: device = None, groupsize: int = 256, precision: dtype = torch.float32, scales_precision: dtype = torch.float32)

This module implements a linear layer with int8 dynamic per-token fake-quantized activations and int4 fake-quantized, grouped per-channel weights.
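To make "dynamic per-token fake quantization" concrete, here is a minimal pure-Python sketch, not the torchao implementation. It assumes an asymmetric int8 scheme (a scale and a zero point computed on the fly from each token's own min and max), and the helper name `fake_quantize_per_token_int8` is hypothetical. "Fake" means each value is rounded onto the int8 grid and immediately converted back to floating point, so the layer sees the quantization error during training while still computing in floating point.

```python
def fake_quantize_per_token_int8(rows):
    """Quantize each row (one token) to int8 and immediately dequantize it.

    Illustrative only: assumes asymmetric quantization with a per-row
    scale and zero point derived from that row's min and max.
    """
    qmin, qmax = -128, 127
    out = []
    for row in rows:
        # Include 0.0 in the range so that zero is exactly representable.
        lo = min(min(row), 0.0)
        hi = max(max(row), 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against an all-zero row
        zero_point = round(qmin - lo / scale)
        zero_point = max(qmin, min(qmax, zero_point))
        dequantized = []
        for x in row:
            q = round(x / scale) + zero_point      # map onto the integer grid
            q = max(qmin, min(qmax, q))            # clamp to the int8 range
            dequantized.append((q - zero_point) * scale)  # back to float
        out.append(dequantized)
    return out


# Two "tokens" with very different magnitudes; each gets its own scale.
activations = [[0.1, -0.5, 2.0, 0.0], [10.0, 20.0, -30.0, 5.0]]
fake_quantized = fake_quantize_per_token_int8(activations)
```

Because the scale is recomputed per row, the small-magnitude token is not forced to share a coarse grid with the large-magnitude one.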

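Parameters:

groupsize – the number of elements in each quantization group of the weights
precision – the precision (dtype) of the weights
scales_precision – the precision (dtype) of the per-group scales and zero points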
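The effect of `groupsize` can be illustrated with a small pure-Python sketch. This is not the library's code: it assumes symmetric int4 quantization with the scale chosen as `max_abs / 7`, which is one common convention, and the helper name `fake_quantize_grouped_int4` is hypothetical. Each run of `groupsize` consecutive weights in an output channel shares one scale.

```python
def fake_quantize_grouped_int4(weight_row, groupsize):
    """Fake-quantize one output channel's weights, one scale per group.

    Illustrative only: assumes symmetric int4 with scale = max_abs / 7.
    Returns (dequantized_weights, per_group_scales).
    """
    if len(weight_row) % groupsize != 0:
        raise ValueError("row length must be divisible by groupsize")
    qmin, qmax = -8, 7
    dequantized, scales = [], []
    for start in range(0, len(weight_row), groupsize):
        group = weight_row[start:start + groupsize]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / qmax or 1.0  # guard against an all-zero group
        scales.append(scale)
        for w in group:
            q = max(qmin, min(qmax, round(w / scale)))  # int4 grid, clamped
            dequantized.append(q * scale)               # back to float
    return dequantized, scales


# One channel whose first half is small and second half is large.
weights = [0.7, -0.35, 0.1, 0.0, 70.0, -35.0, 10.0, 0.0]
fine, fine_scales = fake_quantize_grouped_int4(weights, groupsize=4)
coarse, coarse_scales = fake_quantize_grouped_int4(weights, groupsize=8)
```

With `groupsize=4` the small weights get their own scale and survive quantization. With `groupsize=8` they share the scale set by the large weights and all round to zero. Smaller groups track the weights more closely but require storing more scales.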

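Note: the activation scales are hardcoded to torch.float32, but users may specify the precision of the weight scales (which defaults to torch.float32). To get an exact numerical match with Int8DynamicActivationInt4WeightConfig, users must use the same dtype for both the weights and the scales. Here scales_precision refers only to the weight scales, not the activation scales.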
