Quantization API Reference

torch.quantization

This module contains Eager mode quantization APIs.

Top level APIs

quantize

Quantizes the input float model using post-training static quantization.

quantize_dynamic

Converts a float model to a dynamic (i.e. weights-only) quantized model.

quantize_qat

Performs quantization-aware training and outputs a quantized model.

prepare

Prepares a copy of the model for quantization calibration or quantization-aware training.

prepare_qat

Prepares a copy of the model for quantization-aware training and converts the relevant modules to their QAT versions.

convert

Converts submodules of the input module to different modules according to a mapping, by calling the from_float method on the target module class.
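Taken together, the eager-mode flow is prepare → calibrate → convert. A minimal sketch of post-training static quantization, assuming the fbgemm backend and placeholder data (QuantStub and DeQuantStub are covered in the next section):

```python
import torch
from torch import nn
import torch.quantization as tq

# A minimal eager-mode post-training static quantization sketch.
# The model, the "fbgemm" backend, and the calibration data are illustrative.
class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> quantized boundary
        self.fc = nn.Linear(4, 4)
        self.dequant = tq.DeQuantStub()  # quantized -> float boundary

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = M().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")
tq.prepare(model, inplace=True)   # insert observers
model(torch.randn(8, 4))          # calibrate on representative data
tq.convert(model, inplace=True)   # swap modules for quantized versions
```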

Preparing model for quantization

fuse_modules

Fuses a list of modules into a single module.

QuantStub

Quantize stub module. Before calibration this behaves the same as an observer; it is swapped for nnq.Quantize during convert.

DeQuantStub

Dequantize stub module. Before calibration this behaves the same as identity; it is swapped for nnq.DeQuantize during convert.

QuantWrapper

A wrapper class that wraps the input module, adds QuantStub and DeQuantStub, and surrounds the call to the module with calls to the quant and dequant modules.

add_quant_dequant

Wraps the leaf child modules in QuantWrapper if they have a valid qconfig. Note that this function modifies the children of the module in place, and it may also return a new module that wraps the input module.
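A short sketch of how these helpers combine, under illustrative module names: fuse_modules merges adjacent ops before observers are inserted, and QuantWrapper adds the quant/dequant stubs without editing the model's forward:

```python
import torch
from torch import nn
import torch.quantization as tq

# Fuse conv+bn+relu into a single intrinsic module; the strings name the
# children of the Sequential. Fusion for PTQ expects eval mode.
m = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU()).eval()
fused = tq.fuse_modules(m, [["0", "1", "2"]])

# QuantWrapper adds QuantStub/DeQuantStub around an existing float module,
# so its forward() does not need to be edited by hand.
wrapped = tq.QuantWrapper(fused)
wrapped.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(wrapped)
prepared(torch.randn(1, 3, 8, 8))  # calibration
quantized = tq.convert(prepared)
```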

Utility functions

add_observer_

Adds observers to the leaf children of the module.

swap_module

Swaps the module if it has a quantized counterpart and it has an observer attached.

propagate_qconfig_

Propagates qconfig through the module hierarchy and assigns a qconfig attribute to each leaf module.

default_eval_fn

The default evaluation function takes a torch.utils.data.Dataset or a list of input Tensors and runs the model on the dataset.

get_observer_dict

Traverses the modules and saves all observers into a dict.

torch.quantization.quantize_fx

This module contains FX graph mode quantization APIs (prototype).

prepare_fx

Prepares a model for post-training static quantization.

prepare_qat_fx

Prepares a model for quantization-aware training.

convert_fx

Converts a calibrated or trained model to a quantized model.

fuse_fx

Fuses modules such as conv+bn and conv+bn+relu; the model must be in eval mode.
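A hedged sketch of the FX flow; since these APIs are a prototype, signatures have changed across releases (newer versions of prepare_fx also require example_inputs), so treat this as illustrative rather than exact:

```python
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

# FX graph mode PTQ sketch with a toy model and random calibration data.
float_model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}  # "" = global qconfig
prepared = prepare_fx(float_model, qconfig_dict)

with torch.no_grad():
    prepared(torch.randn(8, 4))  # calibration with representative data

quantized = convert_fx(prepared)
```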

torch.quantization.observer

This module contains observers which are used to collect statistics about the values observed during calibration (PTQ) or training (QAT).

ObserverBase

Base observer Module.

MinMaxObserver

Observer module for computing the quantization parameters based on the running min and max values.

MovingAverageMinMaxObserver

Observer module for computing the quantization parameters based on the moving average of the min and max values.

PerChannelMinMaxObserver

Observer module for computing the quantization parameters based on the running per channel min and max values.

MovingAveragePerChannelMinMaxObserver

Observer module for computing the quantization parameters based on the moving average of the per-channel min and max values.

HistogramObserver

The module records the running histogram of tensor values along with min/max values.

PlaceholderObserver

Observer that doesn’t do anything and just passes its configuration to the quantized module’s .from_float().

RecordingObserver

This module is mainly for debugging; it records the tensor values during runtime.

NoopObserver

Observer that doesn’t do anything and just passes its configuration to the quantized module’s .from_float().

get_observer_state_dict

Returns the state dict corresponding to the observer stats.

load_observer_state_dict

Given input model and a state_dict containing model observer stats, load the stats back into the model.

default_observer

Default observer for static quantization, usually used for debugging.

default_placeholder_observer

Default placeholder observer, usually used for quantization to torch.float16.

default_debug_observer

Default debug-only observer.

default_weight_observer

Default weight observer.

default_histogram_observer

Default histogram observer, usually used for PTQ.

default_per_channel_weight_observer

Default per-channel weight observer, usually used on backends where per-channel weight quantization is supported, such as fbgemm.

default_dynamic_quant_observer

Default observer for dynamic quantization.

default_float_qparams_observer

Default observer for a floating point zero point.
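Observers can also be exercised standalone; a small sketch of collecting statistics and deriving quantization parameters:

```python
import torch
from torch.quantization import MinMaxObserver

# Observers are ordinary modules: calling them records statistics, and
# calculate_qparams() turns those statistics into scale/zero_point.
obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)
obs(torch.randn(16))       # update running min/max
obs(torch.randn(16) * 3)
scale, zero_point = obs.calculate_qparams()
print(scale.item(), zero_point.item())
```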

torch.quantization.fake_quantize

This module implements modules which are used to perform fake quantization during QAT.

FakeQuantizeBase

Base fake quantize module. Any fake quantize implementation should derive from this class.

FakeQuantize

Simulates the quantize and dequantize operations at training time. The output of this module is given by x_out = (clamp(round(x / scale + zero_point), quant_min, quant_max) - zero_point) * scale.

FixedQParamsFakeQuantize

Simulate quantize and dequantize with fixed quantization parameters in training time.

FusedMovingAvgObsFakeQuantize

Fused module that is used to observe the input tensor (compute min/max), compute scale/zero_point and fake_quantize the tensor.

default_fake_quant

Default fake_quant for activations.

default_weight_fake_quant

Default fake_quant for weights.

default_per_channel_weight_fake_quant

Default fake_quant for per-channel weights.

default_histogram_fake_quant

Fake_quant for activations using a histogram.

default_fused_act_fake_quant

Fused version of default_fake_quant, with improved performance.

default_fused_wt_fake_quant

Fused version of default_weight_fake_quant, with improved performance.

default_fused_per_channel_wt_fake_quant

Fused version of default_per_channel_weight_fake_quant, with improved performance.

disable_fake_quant

Disables fake quantization for this module, if applicable.

enable_fake_quant

Enables fake quantization for this module, if applicable.

disable_observer

Disables observation for this module, if applicable.

enable_observer

Enables observation for this module, if applicable.
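These four functions are designed to be passed to Module.apply. A minimal sketch of a common QAT schedule, assuming the fbgemm backend:

```python
import torch
import torch.quantization as tq

# A sketch of a common QAT schedule: train with observers and fake quant
# active, then freeze the observed ranges for the final epochs.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)

# ... some training epochs ...
model.apply(tq.disable_observer)   # stop updating scale/zero_point
# ... a few more fine-tuning epochs, then:
quantized = tq.convert(model.eval())
```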

torch.quantization.qconfig

This module defines QConfig objects which are used to configure quantization settings for individual ops.

QConfig

Describes how to quantize a layer or a part of the network by providing settings (observer classes) for activations and weights respectively.

default_qconfig

Default qconfig configuration.

default_debug_qconfig

Default qconfig configuration for debugging.

default_per_channel_qconfig

Default qconfig configuration for per-channel weight quantization.

default_dynamic_qconfig

Default dynamic qconfig.

float16_dynamic_qconfig

Dynamic qconfig with weights quantized to torch.float16.

float16_static_qconfig

QConfig with both activations and weights quantized to torch.float16.

per_channel_dynamic_qconfig

Dynamic qconfig with weights quantized per channel.

float_qparams_weight_only_qconfig

Dynamic qconfig with weights quantized with a floating point zero point.

default_qat_qconfig

Default qconfig for quantization-aware training.

default_weight_only_qconfig

Default qconfig for quantizing weights only.

default_activation_only_qconfig

Default qconfig for quantizing activations only.

default_qat_qconfig_v2

Fused version of default_qat_qconfig, with performance benefits.
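A sketch of assembling a custom QConfig; the observer choices and arguments below are illustrative, not a recommendation:

```python
import torch
from torch.quantization import (
    QConfig,
    MovingAverageMinMaxObserver,
    PerChannelMinMaxObserver,
)

# A QConfig is a named tuple of observer factories; with_args binds
# constructor arguments so prepare() can instantiate a fresh observer
# per tensor.
my_qconfig = QConfig(
    activation=MovingAverageMinMaxObserver.with_args(
        dtype=torch.quint8, qscheme=torch.per_tensor_affine
    ),
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric
    ),
)
```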

torch.nn.intrinsic

This module implements combined (fused) modules, such as conv + relu, which can then be quantized.

ConvReLU1d

This is a sequential container which calls the Conv1d and ReLU modules.

ConvReLU2d

This is a sequential container which calls the Conv2d and ReLU modules.

ConvReLU3d

This is a sequential container which calls the Conv3d and ReLU modules.

LinearReLU

This is a sequential container which calls the Linear and ReLU modules.

ConvBn1d

This is a sequential container which calls the Conv1d and BatchNorm1d modules.

ConvBn2d

This is a sequential container which calls the Conv2d and BatchNorm2d modules.

ConvBn3d

This is a sequential container which calls the Conv3d and BatchNorm3d modules.

ConvBnReLU1d

This is a sequential container which calls the Conv1d, BatchNorm1d, and ReLU modules.

ConvBnReLU2d

This is a sequential container which calls the Conv2d, BatchNorm2d, and ReLU modules.

ConvBnReLU3d

This is a sequential container which calls the Conv3d, BatchNorm3d, and ReLU modules.

BNReLU2d

This is a sequential container which calls the BatchNorm2d and ReLU modules.

BNReLU3d

This is a sequential container which calls the BatchNorm3d and ReLU modules.

torch.nn.intrinsic.qat

This module implements the versions of those fused operations needed for quantization aware training.

LinearReLU

A LinearReLU module fused from Linear and ReLU modules, attached with FakeQuantize modules for weight, used in quantization aware training.

ConvBn1d

A ConvBn1d module is a module fused from Conv1d and BatchNorm1d, attached with FakeQuantize modules for weight, used in quantization aware training.

ConvBnReLU1d

A ConvBnReLU1d module is a module fused from Conv1d, BatchNorm1d and ReLU, attached with FakeQuantize modules for weight, used in quantization aware training.

ConvBn2d

A ConvBn2d module is a module fused from Conv2d and BatchNorm2d, attached with FakeQuantize modules for weight, used in quantization aware training.

ConvBnReLU2d

A ConvBnReLU2d module is a module fused from Conv2d, BatchNorm2d and ReLU, attached with FakeQuantize modules for weight, used in quantization aware training.

ConvReLU2d

A ConvReLU2d module is a fused module of Conv2d and ReLU, attached with FakeQuantize modules for weight for quantization aware training.

ConvBn3d

A ConvBn3d module is a module fused from Conv3d and BatchNorm3d, attached with FakeQuantize modules for weight, used in quantization aware training.

ConvBnReLU3d

A ConvBnReLU3d module is a module fused from Conv3d, BatchNorm3d and ReLU, attached with FakeQuantize modules for weight, used in quantization aware training.

ConvReLU3d

A ConvReLU3d module is a fused module of Conv3d and ReLU, attached with FakeQuantize modules for weight for quantization aware training.

update_bn_stats

Re-enables batch norm statistics tracking in the fused ConvBn/ConvBnReLU QAT modules.

freeze_bn_stats

Freezes batch norm statistics in the fused ConvBn/ConvBnReLU QAT modules, typically applied near the end of quantization-aware training.
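A hedged sketch of where these fit in a QAT run; note that on some releases train-mode fusion is exposed as fuse_modules_qat rather than fuse_modules:

```python
import torch
import torch.quantization as tq
from torch.nn.intrinsic.qat import freeze_bn_stats

# Fuse conv+bn in train mode so QAT keeps a ConvBn2d intrinsic module.
m = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8)).train()
m = tq.fuse_modules(m, [["0", "1"]])
m.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(m, inplace=True)   # ConvBn2d -> torch.nn.intrinsic.qat.ConvBn2d

# ... training ...
m.apply(freeze_bn_stats)          # stop updating the folded BN statistics
```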

torch.nn.intrinsic.quantized

This module implements the quantized implementations of fused operations like conv + relu. There are no BatchNorm variants, as BatchNorm is usually folded into the preceding convolution for inference.

BNReLU2d

A BNReLU2d module is a fused module of BatchNorm2d and ReLU.

BNReLU3d

A BNReLU3d module is a fused module of BatchNorm3d and ReLU.

ConvReLU1d

A ConvReLU1d module is a fused module of Conv1d and ReLU.

ConvReLU2d

A ConvReLU2d module is a fused module of Conv2d and ReLU.

ConvReLU3d

A ConvReLU3d module is a fused module of Conv3d and ReLU.

LinearReLU

A LinearReLU module fused from Linear and ReLU modules.

torch.nn.intrinsic.quantized.dynamic

This module implements the quantized dynamic implementations of fused operations like linear + relu.

LinearReLU

A LinearReLU module fused from Linear and ReLU modules that can be used for dynamic quantization.

torch.nn.qat

This module implements versions of key nn modules such as Conv2d() and Linear() which run in FP32 but with rounding applied to simulate the effect of INT8 quantization.

Conv2d

A Conv2d module attached with FakeQuantize modules for weight, used for quantization aware training.

Conv3d

A Conv3d module attached with FakeQuantize modules for weight, used for quantization aware training.

Linear

A linear module attached with FakeQuantize modules for weight, used for quantization aware training.

torch.nn.qat.dynamic

This module implements versions of the key nn modules such as Linear() which run in FP32 but with rounding applied to simulate the effect of INT8 quantization and will be dynamically quantized during inference.

Linear

A linear module attached with FakeQuantize modules for weight, used for dynamic quantization aware training.

torch.nn.quantized

This module implements the quantized versions of the nn layers such as torch.nn.Conv2d and torch.nn.ReLU.

ReLU6

Applies the element-wise function $\text{ReLU6}(x) = \min(\max(x_0, x), q(6))$, where $x_0$ is the zero point and $q(6)$ is the quantized representation of the number 6.

Hardswish

This is the quantized version of Hardswish.

ELU

This is the quantized equivalent of ELU.

LeakyReLU

This is the quantized equivalent of LeakyReLU.

Sigmoid

This is the quantized equivalent of Sigmoid.

BatchNorm2d

This is the quantized version of BatchNorm2d.

BatchNorm3d

This is the quantized version of BatchNorm3d.

Conv1d

Applies a 1D convolution over a quantized input signal composed of several quantized input planes.

Conv2d

Applies a 2D convolution over a quantized input signal composed of several quantized input planes.

Conv3d

Applies a 3D convolution over a quantized input signal composed of several quantized input planes.

ConvTranspose1d

Applies a 1D transposed convolution operator over an input image composed of several input planes.

ConvTranspose2d

Applies a 2D transposed convolution operator over an input image composed of several input planes.

ConvTranspose3d

Applies a 3D transposed convolution operator over an input image composed of several input planes.

Embedding

A quantized Embedding module with quantized packed weights as inputs.

EmbeddingBag

A quantized EmbeddingBag module with quantized packed weights as inputs.

FloatFunctional

State collector class for float operations.

FXFloatFunctional

Module to replace FloatFunctional before FX graph mode quantization, since activation_post_process will be inserted directly into the top-level module.

QFunctional

Wrapper class for quantized operations.

Linear

A quantized linear module with quantized tensor as inputs and outputs.

LayerNorm

This is the quantized version of LayerNorm.

GroupNorm

This is the quantized version of GroupNorm.

InstanceNorm1d

This is the quantized version of InstanceNorm1d.

InstanceNorm2d

This is the quantized version of InstanceNorm2d.

InstanceNorm3d

This is the quantized version of InstanceNorm3d.
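These modules consume and produce quantized tensors. They are normally created by convert, but a standalone sketch (with arbitrary, uncalibrated qparams) looks like this:

```python
import torch
import torch.nn.quantized as nnq

# Quantize a float tensor, run it through a quantized module, and
# dequantize the result. The qparams here are hand-picked, not calibrated.
x = torch.randn(1, 3, 8, 8)
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=128, dtype=torch.quint8)

conv = nnq.Conv2d(3, 16, kernel_size=3)   # usually produced by convert()
yq = conv(xq)
print(yq.dtype, yq.dequantize().shape)    # torch.quint8, (1, 16, 6, 6)
```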

torch.nn.quantized.functional

This module implements the quantized versions of the functional layers such as torch.nn.functional.conv2d and torch.nn.functional.relu. Note: relu() supports quantized inputs.

avg_pool2d

Applies a 2D average-pooling operation in $kH \times kW$ regions by step size $sH \times sW$.

avg_pool3d

Applies a 3D average-pooling operation in $kD \times kH \times kW$ regions by step size $sD \times sH \times sW$.

adaptive_avg_pool2d

Applies a 2D adaptive average pooling over a quantized input signal composed of several quantized input planes.

adaptive_avg_pool3d

Applies a 3D adaptive average pooling over a quantized input signal composed of several quantized input planes.

conv1d

Applies a 1D convolution over a quantized 1D input composed of several input planes.

conv2d

Applies a 2D convolution over a quantized 2D input composed of several input planes.

conv3d

Applies a 3D convolution over a quantized 3D input composed of several input planes.

interpolate

Down/up-samples the input to either the given size or the given scale_factor.

linear

Applies a linear transformation to the incoming quantized data: $y = xA^T + b$.

max_pool1d

Applies a 1D max pooling over a quantized input signal composed of several quantized input planes.

max_pool2d

Applies a 2D max pooling over a quantized input signal composed of several quantized input planes.

celu

Applies the quantized CELU function element-wise.

leaky_relu

Applies element-wise $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} \cdot \min(0, x)$.

hardtanh

This is the quantized version of hardtanh().

hardswish

This is the quantized version of hardswish().

threshold

Applies the quantized version of the threshold function element-wise: the output is $x$ where $x > \text{threshold}$, and $\text{value}$ otherwise.

elu

This is the quantized version of elu().

hardsigmoid

This is the quantized version of hardsigmoid().

clamp

clamp(input, min_, max_) -> Tensor. Applies the clamp function element-wise; see clamp() for more details.

upsample

Upsamples the input to either the given size or the given scale_factor.

upsample_bilinear

Upsamples the input, using bilinear upsampling.

upsample_nearest

Upsamples the input, using nearest neighbours’ pixel values.
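A short sketch of calling these functionals on quantized tensors directly; the qparams are arbitrary, and ops that change value ranges take explicit output scale and zero_point:

```python
import torch
import torch.nn.quantized.functional as qF

# These functionals operate directly on quantized tensors.
xq = torch.quantize_per_tensor(
    torch.randn(1, 3, 8, 8), scale=0.1, zero_point=128, dtype=torch.quint8
)
yq = qF.max_pool2d(xq, kernel_size=2)             # output stays quantized
zq = qF.hardswish(xq, scale=0.1, zero_point=128)  # output qparams required
```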

torch.nn.quantized.dynamic

Dynamically quantized Linear, LSTM, GRU, LSTMCell, GRUCell, and RNNCell.

Linear

A dynamic quantized linear module with floating point tensor as inputs and outputs.

LSTM

A dynamic quantized LSTM module with floating point tensor as inputs and outputs.

GRU

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

RNNCell

An Elman RNN cell with tanh or ReLU non-linearity.

LSTMCell

A long short-term memory (LSTM) cell.

GRUCell

A gated recurrent unit (GRU) cell.
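These modules are normally produced by the one-call dynamic API; a minimal sketch:

```python
import torch

# One-call dynamic quantization: matching submodules are replaced by their
# torch.nn.quantized.dynamic counterparts; inputs and outputs stay float.
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(type(qmodel[0]))                   # dynamic quantized Linear
print(qmodel(torch.randn(2, 16)).shape)  # float in, float out
```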

Quantized dtypes and quantization schemes

Note that operator implementations currently only support per-channel quantization for the weights of the conv and linear operators. Furthermore, the input data is mapped linearly to the quantized data and vice versa, as follows:

$$
\begin{aligned}
\text{Quantization:}\quad & Q_\text{out} = \text{clamp}(x_\text{input}/s + z,\; Q_\text{min},\; Q_\text{max}) \\
\text{Dequantization:}\quad & x_\text{out} = (Q_\text{input} - z) \cdot s
\end{aligned}
$$

where $\text{clamp}(\cdot)$ is the same as clamp(), while the scale $s$ and zero point $z$ are computed as described in MinMaxObserver, specifically:

$$
\begin{aligned}
\text{if symmetric:}\quad & s = 2 \max(|x_\text{min}|, x_\text{max}) / (Q_\text{max} - Q_\text{min}) \\
& z = \begin{cases} 0 & \text{if dtype is qint8} \\ 128 & \text{otherwise} \end{cases} \\
\text{otherwise:}\quad & s = (x_\text{max} - x_\text{min}) / (Q_\text{max} - Q_\text{min}) \\
& z = Q_\text{min} - \text{round}(x_\text{min} / s)
\end{aligned}
$$

where $[x_\text{min}, x_\text{max}]$ denotes the range of the input data, while $Q_\text{min}$ and $Q_\text{max}$ are respectively the minimum and maximum values of the quantized dtype.

Note that the choice of $s$ and $z$ implies that zero is represented with no quantization error whenever zero is within the range of the input data or symmetric quantization is being used.
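As a small worked example of the affine (asymmetric) case, with illustrative values:

```python
import torch

# Worked example of the affine (asymmetric) formulas above, quantizing to
# quint8 with Q_min = 0 and Q_max = 255.
x = torch.tensor([-1.5, 0.0, 0.4, 2.0])
x_min, x_max = x.min().item(), x.max().item()
s = (x_max - x_min) / (255 - 0)   # scale
z = 0 - round(x_min / s)          # zero_point = Q_min - round(x_min / s)

xq = torch.quantize_per_tensor(x, scale=s, zero_point=z, dtype=torch.quint8)
print(xq.int_repr())    # clamp(round(x/s + z), 0, 255)
print(xq.dequantize())  # (Q - z) * s approximately recovers x
```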

Additional data types and quantization schemes can be implemented through the custom operator mechanism.

  • torch.qscheme — Type to describe the quantization scheme of a tensor. Supported types:

    • torch.per_tensor_affine — per tensor, asymmetric

    • torch.per_channel_affine — per channel, asymmetric

    • torch.per_tensor_symmetric — per tensor, symmetric

    • torch.per_channel_symmetric — per channel, symmetric

  • torch.dtype — Type to describe the data. Supported types:

    • torch.quint8 — 8-bit unsigned integer

    • torch.qint8 — 8-bit signed integer

    • torch.qint32 — 32-bit signed integer
