FakeQuantize

class torch.quantization.fake_quantize.FakeQuantize(observer=<class 'torch.ao.quantization.observer.MovingAverageMinMaxObserver'>, quant_min=0, quant_max=255, **observer_kwargs)[source]

Simulates the quantize and dequantize operations at training time. The output of this module is given by:

x_out = (
  clamp(round(x/scale + zero_point), quant_min, quant_max) - zero_point
) * scale
  • scale defines the scale factor used for quantization.

  • zero_point specifies the quantized value to which 0 in floating point maps.

  • quant_min specifies the minimum allowable quantized value.

  • quant_max specifies the maximum allowable quantized value.

  • fake_quant_enabled controls the application of fake quantization on tensors; note that statistics can still be updated.

  • observer_enabled controls statistics collection on tensors.

  • dtype specifies the quantized dtype that is being emulated with fake quantization; allowable values are torch.qint8 and torch.quint8. The values of quant_min and quant_max should be chosen to be consistent with the dtype.
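A minimal sketch of the formula above, written with plain tensor ops rather than the library implementation. The scale and zero_point values here are illustrative placeholders; in the module they are computed by the observer.

    import torch

    def fake_quantize(x, scale, zero_point, quant_min=0, quant_max=255):
        # Quantize: scale, shift, round, and clamp to the integer range.
        q = torch.clamp(torch.round(x / scale + zero_point), quant_min, quant_max)
        # Dequantize: map back into the floating-point domain.
        return (q - zero_point) * scale

    x = torch.randn(4)
    # Example qparams chosen by hand for illustration only.
    print(fake_quantize(x, scale=0.1, zero_point=128))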

Parameters
  • observer (module) – Module for observing statistics on input tensors and calculating scale and zero-point.

  • quant_min (int) – The minimum allowable quantized value.

  • quant_max (int) – The maximum allowable quantized value.

  • observer_kwargs (optional) – Arguments for the observer module.

Variables

~FakeQuantize.observer (Module) – User provided module that collects statistics on the input tensor and provides a method to calculate scale and zero-point.
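A usage sketch, assuming the torch.ao.quantization namespace and an 8-bit unsigned configuration; dtype and qscheme are forwarded to the observer through **observer_kwargs, and the exact keyword set depends on the observer chosen.

    import torch
    from torch.ao.quantization import FakeQuantize, MovingAverageMinMaxObserver

    # Example configuration: per-tensor affine quint8 fake-quantization.
    fq = FakeQuantize(
        observer=MovingAverageMinMaxObserver,
        quant_min=0,
        quant_max=255,
        dtype=torch.quint8,
        qscheme=torch.per_tensor_affine,
    )

    x = torch.randn(8)
    y = fq(x)  # collects statistics and fake-quantizes x
    scale, zero_point = fq.calculate_qparams()

    # The toggles below flip the observer_enabled / fake_quant_enabled
    # buffers described above.
    fq.disable_observer()    # freeze statistics collection
    fq.disable_fake_quant()  # pass inputs through unchanged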
