Inference
----------------------------------

TorchRec provides easy-to-use APIs for transforming an authored TorchRec model
into an optimized inference model for distributed inference, via eager module swaps.

This transformation replaces TorchRec modules such as ``EmbeddingBagCollection`` in the model
with quantized, sharded versions that can be compiled using torch.fx and TorchScript
for inference in a C++ environment (see the scripting sketch below).

The intended use is to call ``quantize_inference_model`` on the model, followed by
``shard_quant_model``, as in the sketch below.

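As an illustration, the following sketch (not part of the TorchRec API reference) quantizes and
shards a toy model wrapping an ``EmbeddingBagCollection``. The model class, table configuration,
and device choices are assumptions made for the example.

.. code-block:: python

    import torch
    import torchrec
    from torchrec.inference.modules import quantize_inference_model, shard_quant_model

    # Toy model wrapping an EmbeddingBagCollection (hypothetical; stands in
    # for an authored TorchRec model).
    class Model(torch.nn.Module):
        def __init__(self, ebc: torchrec.EmbeddingBagCollection) -> None:
            super().__init__()
            self.ebc = ebc

        def forward(self, features: torchrec.KeyedJaggedTensor) -> torch.Tensor:
            return self.ebc(features).values()

    ebc = torchrec.EmbeddingBagCollection(
        tables=[
            torchrec.EmbeddingBagConfig(
                name="product_table",
                embedding_dim=64,
                num_embeddings=4096,
                feature_names=["product"],
            )
        ],
        device=torch.device("meta"),
    )
    model = Model(ebc)

    # Swap the float EmbeddingBagCollection for a quantized version.
    quant_model = quantize_inference_model(model)

    # Shard the quantized embeddings; returns the sharded model and the
    # sharding plan that was applied.
    sharded_model, sharding_plan = shard_quant_model(
        quant_model, compute_device="cuda", sharding_device="meta"
    )
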
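Once sharded, the model can be traced and scripted for serving from C++. The snippet below is a
sketch that assumes the ``sharded_model`` from above, TorchRec's fx tracer
(``torchrec.fx.symbolic_trace``), and a hypothetical output path.

.. code-block:: python

    from torchrec.fx import symbolic_trace

    # Trace the sharded model with TorchRec's fx tracer, then script it with
    # TorchScript so it can be loaded in a C++ inference environment.
    graph_module = symbolic_trace(sharded_model)
    scripted_model = torch.jit.script(graph_module)
    scripted_model.save("sharded_quant_model.pt")  # hypothetical output path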

.. automodule:: torchrec.inference.modules

.. autofunction:: quantize_inference_model
.. autofunction:: shard_quant_model