quantize
- torchao.quantization.quantize_(model: Module, config: AOBaseConfig, filter_fn: Optional[Callable[[Module, str], bool]] = None, device: Optional[Union[device, str, int]] = None)
Convert the weights of linear modules in the model using `config`; the model is modified in place.
- Parameters:
model (torch.nn.Module) – input model
config (AOBaseConfig) – workflow configuration object.
filter_fn (Optional[Callable[[torch.nn.Module, str], bool]]) – a function that takes an nn.Module instance and the module's fully qualified name, and returns True if `config` should be applied to that module (a custom filter_fn is sketched after the example below).
device (device, optional) – device to move each module to before `filter_fn` is applied. Can be set to "cuda" to speed up quantization. The final model will be on the specified device. Defaults to None (no change of device).
Example
import torch
import torch.nn as nn
from torchao import quantize_

# quantize with some predefined `config` method that corresponds to
# optimized execution paths or kernels (e.g. int4 tinygemm kernel)
# also customizable with arguments
# currently options are
# int8_dynamic_activation_int4_weight (for executorch)
# int8_dynamic_activation_int8_weight (optimized with int8 mm op and torch.compile)
# int4_weight_only (optimized with int4 tinygemm kernel and torch.compile)
# int8_weight_only (optimized with int8 mm op and torch.compile)
from torchao.quantization.quant_api import int4_weight_only

m = nn.Sequential(nn.Linear(32, 1024), nn.Linear(1024, 32))
quantize_(m, int4_weight_only(group_size=32))
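To illustrate the `filter_fn` and `device` arguments, here is a minimal sketch building on the example above. `large_linear_only` is a hypothetical helper (not part of torchao), and the sketch assumes a CUDA device is available, since the int4 tinygemm path targets GPU.

import torch
import torch.nn as nn
from torchao import quantize_
from torchao.quantization.quant_api import int4_weight_only

m = nn.Sequential(nn.Linear(32, 1024), nn.Linear(1024, 32))

# Hypothetical predicate: only quantize Linear modules whose input
# dimension is at least 512, so Linear(32, 1024) keeps its original weights.
def large_linear_only(module: nn.Module, fqn: str) -> bool:
    return isinstance(module, nn.Linear) and module.in_features >= 512

# device="cuda" moves each module to the GPU before filter_fn runs, which
# can speed up quantization; the final model remains on "cuda".
quantize_(m, int4_weight_only(group_size=32), filter_fn=large_linear_only, device="cuda")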