HistogramObserver

class torch.ao.quantization.observer.HistogramObserver(bins=2048, dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None, factory_kwargs=None, eps=1.1920928955078125e-07, is_dynamic=False, **kwargs)

This module records a running histogram of the observed tensor values, along with their min/max values. calculate_qparams uses these statistics to compute scale and zero_point.

Parameters

bins (int) – Number of bins to use for the histogram
dtype (dtype) – dtype argument to the quantize node needed to implement the reference model spec
qscheme – Quantization scheme to be used
reduce_range – Reduces the range of the quantized data type by 1 bit
eps (Tensor) – Epsilon value for float32. Defaults to torch.finfo(torch.float32).eps.
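A minimal usage sketch, assuming a standard PyTorch install: it builds the observer with the default arguments from the signature above, feeds it a few random batches standing in for calibration data, and then reads the quantization parameters. The batch shapes and the seed are arbitrary choices for illustration.

```python
import torch
from torch.ao.quantization.observer import HistogramObserver

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of the sketch

# Same values as the documented defaults, spelled out for clarity.
observer = HistogramObserver(
    bins=2048,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_affine,
    reduce_range=False,
)

# Each call updates the running histogram and min/max.
# Random data stands in for real calibration activations here.
for _ in range(4):
    batch = torch.randn(16, 32)
    observer(batch)

# Compute scale and zero_point from the accumulated statistics.
scale, zero_point = observer.calculate_qparams()
print(scale, zero_point)
```

The observer is called like any other module during calibration; calculate_qparams is only meaningful after at least one tensor has been observed.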

The scale and zero point are computed as follows:

1. Create the histogram of the incoming inputs. The histogram is computed continuously, and the ranges per bin change with every new tensor observed.
2. Search the distribution in the histogram for optimal min/max values. The search for the min/max values ensures the minimization of the quantization error with respect to the floating point model.
3. Compute the scale and zero point the same way as in the MinMaxObserver.
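The final step can be illustrated with a small pure-Python sketch of the affine min/max formula as documented for MinMaxObserver. The helper name affine_qparams is hypothetical, the quint8 range of 0 to 255 is assumed, and the histogram search from the earlier steps is not reproduced; the min/max values are simply passed in.

```python
def affine_qparams(min_val, max_val, quant_min=0, quant_max=255,
                   eps=1.1920928955078125e-07):
    """Hypothetical helper: per-tensor affine scale/zero_point from min/max."""
    # The representable range is extended to include zero.
    min_neg = min(min_val, 0.0)
    max_pos = max(max_val, 0.0)

    # Scale maps the float range onto the integer range; eps avoids a zero scale.
    scale = (max_pos - min_neg) / float(quant_max - quant_min)
    scale = max(scale, eps)

    # Zero point is the integer that represents float 0.0, clamped to the range.
    zero_point = quant_min - round(min_neg / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, int(zero_point)


# Example with an assumed selected range of [-1.0, 3.0].
scale, zero_point = affine_qparams(-1.0, 3.0)
print(scale, zero_point)
```

In the observer itself, the min/max fed into this computation are the values chosen by the histogram search rather than the raw observed extremes.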
