MinMaxObserver
- class torch.ao.quantization.observer.MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None, factory_kwargs=None, eps=1.1920928955078125e-07, is_dynamic=False, **kwargs)
Observer module for computing the quantization parameters based on the running min and max values.
The module records the running minimum and maximum of the tensors passed through it and uses these two statistics to compute the quantization parameters.
- Parameters
dtype – dtype argument to the quantize node needed to implement the reference model spec.
qscheme – Quantization scheme to be used
reduce_range – Reduces the range of the quantized data type by 1 bit
quant_min – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
quant_max – Maximum quantization value. If unspecified, it will follow the 8-bit setup.
eps (Tensor) – Epsilon value for float32. Defaults to torch.finfo(torch.float32).eps.
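Given the running min/max as $x_\text{min}$ and $x_\text{max}$, the scale $s$ and zero point $z$ are computed as follows.
The running minimum/maximum $x_\text{min/max}$ is computed as:

$$
\begin{aligned}
x_\text{min} &= \begin{cases} \min(X) & \text{if } x_\text{min} = \text{None} \\ \min\left(x_\text{min}, \min(X)\right) & \text{otherwise} \end{cases} \\
x_\text{max} &= \begin{cases} \max(X) & \text{if } x_\text{max} = \text{None} \\ \max\left(x_\text{max}, \max(X)\right) & \text{otherwise} \end{cases}
\end{aligned}
$$

where $X$ is the observed tensor.
The scale $s$ and zero point $z$ are then computed as:

$$
\begin{aligned}
&\text{if symmetric:} \\
&\qquad s = \frac{2 \max\left(|x_\text{min}|, x_\text{max}\right)}{Q_\text{max} - Q_\text{min}} \\
&\qquad z = \begin{cases} 0 & \text{if dtype is qint8} \\ 128 & \text{otherwise} \end{cases} \\
&\text{otherwise:} \\
&\qquad s = \frac{x_\text{max} - x_\text{min}}{Q_\text{max} - Q_\text{min}} \\
&\qquad z = Q_\text{min} - \text{round}\left(\frac{x_\text{min}}{s}\right)
\end{aligned}
$$

where $Q_\text{min}$ and $Q_\text{max}$ are the minimum and maximum of the quantized data type.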
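The running-statistic update and the scale/zero-point computation described above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: the helper names `update_min_max` and `compute_qparams` are hypothetical, the default range of 0 to 255 assumes an unsigned 8-bit type, and the sketch does not model `eps` or `reduce_range`.

```python
def update_min_max(running, values):
    """Fold one batch of observed values into the running (min, max) pair.

    `running` is None before anything has been observed.
    """
    batch_min, batch_max = min(values), max(values)
    if running is None:
        return batch_min, batch_max
    return min(running[0], batch_min), max(running[1], batch_max)


def compute_qparams(x_min, x_max, q_min=0, q_max=255, symmetric=False, signed=False):
    """Return (scale, zero_point) from the running min/max.

    Hypothetical helper following the documented formulas only.
    """
    if x_min == x_max:
        # Degenerate range: the documentation states scale 1.0 and zero point 0.
        return 1.0, 0
    if symmetric:
        scale = 2 * max(abs(x_min), x_max) / (q_max - q_min)
        zero_point = 0 if signed else 128
    else:
        scale = (x_max - x_min) / (q_max - q_min)
        zero_point = q_min - round(x_min / scale)
    return scale, zero_point


# Two batches are observed; the running range becomes (-1.0, 2.0).
running = None
running = update_min_max(running, [0.5, 2.0])
running = update_min_max(running, [-1.0, 1.0])

affine_scale, affine_zero_point = compute_qparams(*running)
symmetric_scale, symmetric_zero_point = compute_qparams(
    *running, q_min=-128, q_max=127, symmetric=True, signed=True
)
```

In the affine case the full observed range is mapped onto the quantized range, so the zero point shifts to wherever 0.0 lands. In the symmetric case the range is centered on zero and the zero point is fixed by the data type.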
Warning
dtype can only take torch.qint8 or torch.quint8.
Note
If the running minimum equals the running maximum, the scale and zero_point are set to 1.0 and 0, respectively.
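A minimal usage sketch, assuming a recent PyTorch build in which the observer exposes its statistics as the `min_val` and `max_val` buffers and returns the parameters from `calculate_qparams()`. The input values are made up for illustration.

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

# Default configuration: quint8, per-tensor affine, full 8-bit range.
observer = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)

# Each forward call updates the running min/max; the input is passed through.
observer(torch.tensor([0.5, 2.0]))
observer(torch.tensor([-1.0, 1.0]))

running_min = observer.min_val.item()
running_max = observer.max_val.item()

# Quantization parameters derived from the statistics gathered so far.
scale, zero_point = observer.calculate_qparams()
print(running_min, running_max, scale, zero_point)
```

The observer only gathers statistics. It does not quantize the tensor itself, so the returned scale and zero point are typically handed to a quantize step afterwards.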