autoquant

torchao.quantization.autoquant(model, example_input=None, qtensor_class_list=[<class 'torchao.quantization.autoquant.AQDefaultLinearWeight'>, <class 'torchao.quantization.autoquant.AQInt8WeightOnlyQuantizedLinearWeight'>, <class 'torchao.quantization.autoquant.AQInt8WeightOnlyQuantizedLinearWeight2'>, <class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>], filter_fn=None, mode=['interpolate', 0.85], manual=False, set_inductor_config=True, supress_autoquant_errors=True, min_sqnr=None, **aq_kwargs)

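Autoquantization is a process that identifies the fastest way to quantize each layer of a model, chosen from a set of candidate quantized tensor subclasses.

Autoquantization happens in three steps:

1-Prepare the model: the model is searched for Linear layers, and their weights are swapped for AutoQuantizableLinearWeight.

2-Shape calibration: the user runs the model on one or more inputs. The activation shapes/dtypes seen by each AutoQuantizableLinearWeight are recorded, so that we know which shapes/dtypes to use when optimizing the quantized operations in step 3.

3-Finalize autoquantization: for each AutoQuantizableLinearWeight, benchmarks are run for each shape/dtype on each member of qtensor_class_list. The fastest option is picked, resulting in a highly performant model.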
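The selection in step 3 can be pictured with a small, library-free sketch. This is not torchao's implementation: `benchmark`, `pick_fastest`, and the candidate names in the usage section are hypothetical, and the candidates are plain callables standing in for quantized kernels.

```python
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def pick_fastest(
    candidates: Dict[str, Callable[[], object]],
    bench: Callable[[Callable[[], object]], float] = benchmark,
) -> str:
    """Benchmark every candidate and return the name of the fastest one."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    timings = {name: bench(fn) for name, fn in candidates.items()}
    return min(timings, key=timings.get)


if __name__ == "__main__":
    # Hypothetical stand-ins for "run this layer with quantization scheme X".
    toy_candidates = {
        "no_quantization": lambda: sum(i * i for i in range(20_000)),
        "int8_weight_only": lambda: sum(i * i for i in range(5_000)),
    }
    print("selected:", pick_fastest(toy_candidates))
```

Passing the benchmark function in as a parameter keeps the selection logic separate from the timing method, so the timing can be swapped for a deterministic cost table when testing.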

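This autoquant function performs step 1. Steps 2 and 3 can be completed by simply running the model. If example_input is provided, this function also runs the model, which completes steps 2 and 3. The autoquant API also works with models that have already had torch.compile applied to them, in which case the torch.compile process normally proceeds once the model has been run and quantized.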

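To optimize over a combination of input shapes/dtypes, the user can set manual=True, run the model with all the desired shapes/dtypes, and then call model.finalize_autoquant to finalize quantization once the desired set of inputs has been logged.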
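The record-then-finalize flow of manual mode can be illustrated with a toy model. Everything here is hypothetical (`ShapeRecorder`, the cost callback, the candidate names), and weighting each shape by how often it was seen is an assumption made for this sketch, not something this page states about torchao.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# (activation shape, dtype name), e.g. ((1, 64), "float16")
ShapeKey = Tuple[Tuple[int, ...], str]


class ShapeRecorder:
    """Toy stand-in for shape calibration: counts each (shape, dtype) seen."""

    def __init__(self) -> None:
        self.seen: Counter = Counter()

    def record(self, shape: Iterable[int], dtype: str) -> None:
        """Log one forward call with the given activation shape and dtype."""
        self.seen[(tuple(shape), dtype)] += 1

    def finalize(
        self,
        candidates: Iterable[str],
        cost: Callable[[str, ShapeKey], float],
    ) -> str:
        """Return the candidate with the lowest total cost over all logged inputs.

        Assumption of this sketch: each shape's cost is weighted by its count.
        """
        if not self.seen:
            raise RuntimeError("no inputs were recorded before finalize()")
        totals = {
            name: sum(cost(name, key) * count for key, count in self.seen.items())
            for name in candidates
        }
        return min(totals, key=totals.get)
```

In this picture, each `record` call corresponds to one forward pass in manual mode, and `finalize` plays the role of model.finalize_autoquant: a choice that is best for one shape may lose once the other logged shapes are taken into account.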

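Parameters:

model (torch.nn.Module) – The model to be autoquantized.
example_input (Any, optional) – An example input for the model. If provided, the function performs a forward pass on this input (which fully autoquantizes the model unless manual=True). Defaults to None.
qtensor_class_list (list, optional) – A list of tensor classes to be used for quantization. Defaults to DEFAULT_AUTOQUANT_CLASS_LIST.
filter_fn (callable, optional) – A filter function to apply to the model parameters. Defaults to None.
mode (list, optional) – A list containing the mode settings for quantization. The first element is the mode type (e.g., "interpolate"), and the second element is the mode value (e.g., 0.85). Defaults to ["interpolate", 0.85].
manual (bool, optional) – Whether to stop shape calibration and autoquantize after a single run (the default, False), or to wait for the user to call model.finalize_autoquant (True) so that inputs with several shapes/dtypes can be logged.
set_inductor_config (bool, optional) – Whether to automatically use the recommended Inductor config settings. Defaults to True.
supress_autoquant_errors (bool, optional) – Whether to suppress errors during autoquantization. Defaults to True.
min_sqnr (float, optional) – The minimum acceptable signal-to-quantization-noise ratio (https://en.wikipedia.org/wiki/Signal-to-quantization-noise_ratio) between the output of the quantized layer and the output of the non-quantized layer. It is used to filter out quantization methods that cause too large a numerical impact. The user can start with a reasonable number, such as 40, and adjust it depending on the result.
**aq_kwargs – Additional keyword arguments for the autoquantization process.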
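To make the min_sqnr threshold concrete, here is a minimal sketch of a signal-to-quantization-noise check in plain Python. It uses the standard decibel definition, 20 * log10(||reference|| / ||reference - quantized||). Whether torchao computes the ratio in exactly this way is an assumption, and `sqnr_db` and `passes_min_sqnr` are hypothetical helper names.

```python
import math
from typing import Optional, Sequence


def sqnr_db(reference: Sequence[float], quantized: Sequence[float]) -> float:
    """SQNR in decibels between a reference output and a quantized output."""
    if len(reference) != len(quantized):
        raise ValueError("reference and quantized must have the same length")
    signal = math.sqrt(sum(r * r for r in reference))
    noise = math.sqrt(sum((r - q) ** 2 for r, q in zip(reference, quantized)))
    if noise == 0.0:
        return math.inf  # outputs are identical: no quantization noise
    if signal == 0.0:
        return -math.inf  # all-zero reference but non-zero error
    return 20.0 * math.log10(signal / noise)


def passes_min_sqnr(
    reference: Sequence[float],
    quantized: Sequence[float],
    min_sqnr: Optional[float] = None,
) -> bool:
    """Keep a candidate unless its SQNR falls below min_sqnr (None disables the check)."""
    if min_sqnr is None:
        return True
    return sqnr_db(reference, quantized) >= min_sqnr
```

Since the scale is logarithmic, a threshold of 40 means the error norm is at most 1% of the reference norm, and each additional 20 dB tightens that bound by a factor of ten.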

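Returns:

The autoquantized and wrapped model. If example_input is provided, the function performs a forward pass on the input and returns the result of the forward pass.

Return type:

torch.nn.Module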

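Example usage

import torch
import torchao

# keep the returned (compiled) model so that later calls go through torch.compile
model = torchao.autoquant(torch.compile(model))
model(*example_input)

# multiple input shapes
torchao.autoquant(model, manual=True)
model(*example_input1)
model(*example_input2)
model.finalize_autoquant()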
