torchao.quantization
Main Quantization APIs
Quantization APIs for quantize_
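The configs in this section are passed to `quantize_`, which walks the model and swaps each module that passes a filter for its quantized counterpart, in place. A rough sketch of that pattern, using stand-in classes (`Model`, `Linear`, and `QuantizedLinear` below are hypothetical illustrations, not torch or torchao APIs):

```python
class Linear:
    """Stand-in for a linear layer (hypothetical, not torch.nn.Linear)."""
    def __init__(self, name):
        self.name = name

class QuantizedLinear(Linear):
    """Stand-in for a quantized replacement layer."""
    pass

class Model:
    """Stand-in model holding named submodules."""
    def __init__(self):
        self.layers = {"fc1": Linear("fc1"), "fc2": Linear("fc2")}

def quantize_(model, transform, filter_fn):
    """In-place transform mirroring the quantize_ pattern: every module
    that passes filter_fn is replaced by transform(module)."""
    for name, mod in model.layers.items():
        if filter_fn(mod):
            model.layers[name] = transform(mod)

model = Model()
quantize_(model,
          transform=lambda m: QuantizedLinear(m.name),
          filter_fn=lambda m: isinstance(m, Linear))
```

In torchao the transform is described by a config object (the aliases listed in this section) rather than a bare callable, but the filter-then-replace flow is the same.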
Quantization Primitives
Quantizes a high-precision float32 tensor to a low-precision floating-point value and converts the result to an unpacked format laid out as 00SEEEMM (for fp6_e3m2), where S is the sign bit, E an exponent bit, and M a mantissa bit.
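To make the 00SEEEMM layout concrete, here is a minimal pure-Python encoder for fp6_e3m2 (1 sign, 3 exponent, 2 mantissa bits, bias 3). This is an illustrative sketch only, not torchao's kernel: it truncates rather than rounds-to-nearest, clamps to the representable range, and ignores NaN/Inf.

```python
import math

def encode_fp6_e3m2(x: float) -> int:
    """Encode a float into the unpacked 00SEEEMM fp6_e3m2 byte layout.
    Sketch only: truncating rounding, magnitude clamped, no NaN/Inf."""
    sign = 0
    if x < 0:
        sign, x = 1, -x
    if x == 0.0:
        return sign << 5
    x = min(x, 28.0)               # max normal: 2**(7-3) * (1 + 3/4) = 28
    e = math.floor(math.log2(x))
    e_biased = e + 3               # e3m2 exponent bias is 2**(3-1) - 1 = 3
    if e_biased <= 0:
        m = int(x / 2**-2 * 4)     # subnormal: value = (m/4) * 2**-2
        e_biased = 0
    else:
        m = int((x / 2**e - 1.0) * 4)  # normal: value = (1 + m/4) * 2**e
    return (sign << 5) | (e_biased << 2) | m
```

For example, 1.0 encodes as `0b001100` (biased exponent 3, zero mantissa) and the largest magnitude, 28.0, as `0b011111`.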
General fake quantize op for quantization-aware training (QAT). |
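Fake quantization rounds values onto the integer grid and immediately dequantizes them, so activations and weights stay floating point during QAT but carry the quantization error. A simplified scalar sketch of the affine version (not torchao's exact op; note Python's `round` uses banker's rounding):

```python
def fake_quantize(x, scale, zero_point, quant_min, quant_max):
    """Quantize x to the affine integer grid, clamp to [quant_min,
    quant_max], then dequantize back to float."""
    q = round(x / scale) + zero_point
    q = max(quant_min, min(quant_max, q))
    return (q - zero_point) * scale
```

For instance, with scale 0.1 and a signed 8-bit range, 0.26 snaps to 0.3, and 100.0 clamps to 12.7 (the largest representable value, 127 * 0.1).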
Performs a safe integer matrix multiplication, selecting among different paths for torch.compile, cuBLAS, and fallback cases.
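The reason integer matmul needs a "safe" path is accumulation width: a single int8 × int8 product already reaches ±16129, far outside int8 range, so real kernels accumulate in int32. A reference sketch in pure Python (whose unbounded ints stand in for the wider accumulator):

```python
def int_mm(a, b):
    """Reference integer matrix multiply with exact (wide) accumulation.
    a is rows x inner, b is inner x cols, both lists of lists of ints."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0                      # would be int32 in a real kernel
            for k in range(inner):
                acc += a[i][k] * b[k][j]
            out[i][j] = acc
    return out
```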
Performs scaled integer matrix multiplication. |
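A scaled integer matmul accumulates exactly in integers and then applies a floating-point scale to map the result back to real values. The sketch below assumes one scale per output row for illustration; the actual scale layout of the torchao op may differ.

```python
def int_scaled_mm(a, b, row_scales):
    """Integer matmul followed by a per-row float scale (illustrative;
    the per-row scale layout is an assumption)."""
    cols = len(b[0])
    out = []
    for i, row in enumerate(a):
        acc = [sum(row[k] * b[k][j] for k in range(len(b)))
               for j in range(cols)]                 # exact int accumulation
        out.append([v * row_scales[i] for v in acc])  # rescale to float
    return out
```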
How floating-point values are mapped to integer values (e.g. symmetric vs. asymmetric mapping).
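The mapping type drives how scale and zero_point are chosen. A simplified sketch (one common convention, not torchao's exact algorithm): symmetric mapping centers the integer grid at zero, while asymmetric mapping spends the full integer range on [min_val, max_val].

```python
def choose_qparams(min_val, max_val, qmin, qmax, symmetric):
    """Pick (scale, zero_point) for an affine mapping; simplified."""
    if symmetric:
        amax = max(abs(min_val), abs(max_val))
        scale = amax / qmax          # grid centered at zero
        zero_point = 0
    else:
        scale = (max_val - min_val) / (qmax - qmin)
        zero_point = qmin - round(min_val / scale)
    return scale, zero_point
```

For example, a tensor spanning [0.0, 2.55] mapped asymmetrically onto uint8 gets scale 0.01 and zero_point 0, so every value in the range is representable.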
Enum that indicates whether zero_point is in the integer domain or the floating-point domain.
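The domain changes where the zero point enters the dequantization formula. A sketch of the two conventions (the float-domain formula here is a simplified assumption for illustration, not torchao's exact kernel math):

```python
def dequantize(q, scale, zero_point, integer_zero_point):
    """Integer-domain: zero_point is subtracted before scaling.
    Float-domain: zero_point is a float offset applied after scaling
    (simplified assumption)."""
    if integer_zero_point:
        return (q - zero_point) * scale
    return q * scale + zero_point
```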
Placeholder for dtypes that do not exist in PyTorch core yet. |
Other
Replaces linear layers in the model with their SmoothFakeDynamicallyQuantizedLinear equivalents. |
Prepares the model for inference by calculating the smoothquant scale for each SmoothFakeDynamicallyQuantizedLinear layer. |
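SmoothQuant migrates quantization difficulty from activations to weights with a per-channel smoothing factor s_j = max|X_j|**alpha / max|W_j|**(1 - alpha): activations are divided by s_j and weights multiplied by s_j before quantization. A sketch of the scale computation (the choice of alpha and the exact statistics used are assumptions for illustration):

```python
def smoothquant_scales(act_absmax, weight_absmax, alpha=0.5):
    """Per-channel SmoothQuant smoothing factors from observed
    per-channel absolute maxima of activations and weights."""
    return [a ** alpha / w ** (1 - alpha)
            for a, w in zip(act_absmax, weight_absmax)]
```

With alpha = 0.5, a channel whose activations peak at 4.0 against a weight peak of 1.0 gets s = 2.0, halving the activation range while doubling the weight range.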