HistogramObserver(bins=2048, upsample_rate=128, dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None, factory_kwargs=None, eps=1.1920928955078125e-07)
The module records the running histogram of tensor values along with min/max values.
calculate_qparams will calculate scale and zero_point.
bins – Number of bins to use for the histogram
upsample_rate – Factor by which the histograms are upsampled; this is used to interpolate histograms with varying ranges across observations
dtype – Quantized data type
qscheme – Quantization scheme to be used
reduce_range – Reduces the range of the quantized data type by 1 bit
eps – Epsilon value for float32. Defaults to torch.finfo(torch.float32).eps.
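A minimal usage sketch, assuming the observer is importable from torch.ao.quantization.observer (the import path is not stated in this section and may differ between PyTorch versions). It feeds two random tensors with different ranges through the observer and then reads back the quantization parameters. The input shapes and the seed are arbitrary illustrative choices.

```python
import torch
# Assumed import path; older releases may expose the class elsewhere.
from torch.ao.quantization.observer import HistogramObserver

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of the sketch

# Construct the observer with the defaults shown in the signature above
# (upsample_rate is left at its default).
obs = HistogramObserver(
    bins=2048,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_affine,
    reduce_range=False,
)

# Each call updates the running histogram and min/max values.
# The observer returns its input unchanged.
obs(torch.randn(1000))
obs(torch.randn(1000) * 2.0)  # wider range than the first observation

# Derive scale and zero_point from the accumulated histogram.
scale, zero_point = obs.calculate_qparams()
print("scale:", float(scale), "zero_point:", int(zero_point))
```

In a typical workflow the observer would be attached to a module through a quantization configuration rather than called by hand; calling it directly, as here, is only meant to show what it records and returns.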
The scale and zero point are computed as follows:
- Create the histogram of the incoming inputs.
The histogram is computed continuously, and the ranges per bin change with every new tensor observed.
- Search the distribution in the histogram for optimal min/max values.
The search for the min/max values ensures the minimization of the quantization error with respect to the floating point model.
- Compute the scale and zero point the same way as in the