.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "prototype/fx_graph_mode_ptq_dynamic.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_prototype_fx_graph_mode_ptq_dynamic.py:


(prototype) FX Graph Mode Post Training Dynamic Quantization
============================================================

**Author**: `Jerry Zhang `_

This tutorial introduces the steps to do post training dynamic quantization in graph mode based on
``torch.fx``.
There is a separate tutorial for `FX Graph Mode Post Training Static Quantization `_,
and a comparison between FX Graph Mode Quantization and Eager Mode Quantization can be found in the `quantization docs `_.

tl;dr: The FX Graph Mode API for dynamic quantization looks like the following:

.. code:: python

    import torch
    from torch.ao.quantization import default_dynamic_qconfig, QConfigMapping
    # Note that this is temporary, we'll expose these functions to torch.ao.quantization after official release
    from torch.quantization.quantize_fx import prepare_fx, convert_fx

    # float_model and example_inputs are assumed to be defined by the caller
    float_model.eval()
    # A dynamic qconfig: weights are quantized ahead of time, activations at runtime.
    qconfig_mapping = QConfigMapping().set_global(default_dynamic_qconfig)
    prepared_model = prepare_fx(float_model, qconfig_mapping, example_inputs)  # fuse modules and insert observers
    # no calibration is required for dynamic quantization
    quantized_model = convert_fx(prepared_model)  # convert the model to a dynamically quantized model

In this tutorial, we'll apply dynamic quantization to an LSTM-based next word-prediction model,
closely following the word language model from the PyTorch examples.
The code is copied from `Dynamic Quantization on an LSTM Word Language Model `_
with the descriptions omitted.

.. GENERATED FROM PYTHON SOURCE LINES 37-57

1. Define the Model, Download Data and Model
--------------------------------------------

Download the `data `_
and unzip it into the ``data`` folder:

.. code::

    mkdir data
    cd data
    wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
    unzip wikitext-2-v1.zip

Download the model to the ``data`` folder:

.. code::

    wget https://s3.amazonaws.com/pytorch-tutorial-assets/word_language_model_quantize.pth

Define the model:

.. GENERATED FROM PYTHON SOURCE LINES 57-226

.. code-block:: default


    # imports

    # Model Definition

    # Load Text Data

    # Load Pretrained Model

    # create test data set

    # Evaluation functions


.. GENERATED FROM PYTHON SOURCE LINES 227-231

2. Post Training Dynamic Quantization
-------------------------------------
Now we can dynamically quantize the model.
We can use the same function as post training static quantization, but with a dynamic qconfig.

.. GENERATED FROM PYTHON SOURCE LINES 231-269

.. code-block:: default


    # Full docs for supported qconfig for floating point modules/ops can be found in `quantization docs `_
    # Full docs for `QConfigMapping `_

    # Load model to create the original model because quantization api changes the model inplace and we want
    # to keep the original model for future comparison


.. GENERATED FROM PYTHON SOURCE LINES 270-279

For dynamic quantization, ``prepare_fx`` leaves modules unchanged, but it inserts weight observers
for dynamically quantizable functionals and torch ops. It also fuses modules such as Conv + BN and Linear + ReLU.
``convert_fx`` then converts the float modules to dynamically quantized modules and
the float ops to dynamically quantized ops. In the example model, ``nn.Embedding``, ``nn.Linear`` and ``nn.LSTM``
are dynamically quantized.

Now we can compare the size and runtime of the quantized model.

.. GENERATED FROM PYTHON SOURCE LINES 289-291

There is a 4x size reduction because we quantized all the weights
in the model (``nn.Embedding``, ``nn.Linear`` and ``nn.LSTM``) from float (4 bytes) to quantized int (1 byte).

.. GENERATED FROM PYTHON SOURCE LINES 304-311

There is a roughly 2x speedup for this model. Note that the speedup
may vary depending on the model, device, build, input batch sizes, threading, etc.

3. Conclusion
-------------
This tutorial introduces the API for post training dynamic quantization in FX Graph Mode,
which dynamically quantizes the same modules as Eager Mode Quantization.

.. GENERATED FROM PYTHON SOURCE LINES 311-312

.. code-block:: default


    # %%%%%%RUNNABLE_CODE_REMOVED%%%%%%


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.000 seconds)


.. _sphx_glr_download_prototype_fx_graph_mode_ptq_dynamic.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: fx_graph_mode_ptq_dynamic.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: fx_graph_mode_ptq_dynamic.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
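
As an aside to the size comparison in section 2, the following sketch illustrates the arithmetic behind
storing weights as 1-byte integers and quantizing activations at call time. It is plain Python and is not
PyTorch's implementation: the helper names ``quantize_affine``, ``dequantize`` and ``dynamic_linear``, the toy
weights and input, and the choice of a single per-tensor scale with int8 weights and uint8 activations are
all assumptions made for illustration.

```python
from array import array


def quantize_affine(values, qmin, qmax):
    """Map floats to integers in [qmin, qmax] using one scale and one zero point."""
    lo = min(min(values), 0.0)  # widen the range so that 0.0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from quantized integers."""
    return [(v - zero_point) * scale for v in q]


def dynamic_linear(x, q_rows, w_scale, w_zp):
    """Compute W @ x from pre-quantized weights; x is quantized on the fly."""
    # The "dynamic" step: activation scale and zero point come from this input.
    qx, x_scale, x_zp = quantize_affine(x, 0, 255)
    out = []
    for row in q_rows:
        acc = sum((w - w_zp) * (a - x_zp) for w, a in zip(row, qx))  # integer accumulation
        out.append(acc * w_scale * x_scale)  # rescale the integer sum back to float
    return out


# Hypothetical toy data (2x4 weight matrix and one input vector).
weights = [[0.5, -1.0, 0.25, 0.75], [-0.3, 0.8, -0.6, 0.1]]
x = [1.0, -0.5, 2.0, 0.0]

# Weights are quantized once, ahead of time, to signed 8-bit integers.
flat = [w for row in weights for w in row]
q_flat, w_scale, w_zp = quantize_affine(flat, -128, 127)
q_rows = [q_flat[i:i + 4] for i in range(0, len(q_flat), 4)]

# Storage cost: 4-byte floats versus 1-byte integers (scale and zero point ignored).
float_bytes = len(flat) * array("f").itemsize
int8_bytes = len(q_flat) * array("b").itemsize
size_ratio = float_bytes / int8_bytes

y_float = [sum(w * a for w, a in zip(row, x)) for row in weights]
y_quant = dynamic_linear(x, q_rows, w_scale, w_zp)
max_error = max(abs(a - b) for a, b in zip(y_float, y_quant))

print(f"size ratio: {size_ratio:.1f}x")
print(f"max output error: {max_error:.4f}")
```

In this sketch the weight payload shrinks by exactly the ratio of the element sizes, which matches the
float (4 bytes) to int (1 byte) reasoning above, while the output differs from the float result only by a
small rounding error. Real quantized models also store scales and zero points, so measured file sizes may
deviate slightly from this idealized ratio.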