torchrec.inference

TorchRec Inference

TorchRec inference provides a torch.deploy-based library for GPU inference.

This includes:
  • Model packaging in Python
    • PredictModule and PredictFactory are the contracts between Python model authoring and C++ model serving.

    • PredictFactoryPackager can be used to package a PredictFactory class using torch.package.

  • Model serving in C++
    • BatchingQueue is a generalized config-based request tensor batching implementation.

    • GPUExecutor handles the forward call into the inference model inside Torch.Deploy.

We provide an example of how to use this library with the TorchRec DLRM model:
  • examples/dlrm/inference/dlrm_packager.py: this demonstrates how to export the DLRM model as a torch.package.

  • examples/dlrm/inference/dlrm_predict.py: this shows how to use PredictModule and PredictFactory based on an existing model.

torchrec.inference.model_packager

class torchrec.inference.model_packager.PredictFactoryPackager

Bases: object

classmethod save_predict_factory(predict_factory: Type[PredictFactory], configs: Dict[str, Any], output: Union[str, Path, BinaryIO], extra_files: Dict[str, Union[str, bytes]], loader_code: str = '\nimport %PACKAGE%\n\nMODULE_FACTORY=%PACKAGE%.%CLASS%\n', package_importer: Union[Importer, List[Importer]] = sys_importer) → None
abstract classmethod set_extern_modules()

Abstract classmethod; subclasses return the modules that torch.package should treat as extern, i.e. imported from the loading environment rather than packaged.

abstract classmethod set_mocked_modules()

Abstract classmethod; subclasses return the modules that torch.package should mock, i.e. replace with stubs because they are not needed at serving time.
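Example (an illustrative sketch; MyPredictFactoryPackager, MyPredictFactory, and the module lists are hypothetical, not part of the library):

from typing import List

from torchrec.inference.model_packager import PredictFactoryPackager

class MyPredictFactoryPackager(PredictFactoryPackager):
    @classmethod
    def set_extern_modules(cls) -> List[str]:
        # Modules assumed to be importable in the serving environment.
        return ["numpy"]

    @classmethod
    def set_mocked_modules(cls) -> List[str]:
        # Modules to stub out because they are not needed at serving time.
        return []

# MyPredictFactory is a hypothetical PredictFactory subclass (see the
# sketch under torchrec.inference.modules.PredictFactory below).
MyPredictFactoryPackager.save_predict_factory(
    predict_factory=MyPredictFactory,
    configs={"model_config": {"batch_size": 64}},  # assumed retrievable later via load_pickle_config
    output="/tmp/my_model_package.zip",
    extra_files={"version.txt": "1"},
)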

torchrec.inference.model_packager.load_config_text(name: str) → str
torchrec.inference.model_packager.load_pickle_config(name: str, clazz: Type[T]) → T
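Example (an illustrative sketch of reading a packaged config back inside the loaded package, assuming a config object was pickled under the name "model_config" at packaging time; the dataclass is made up):

from dataclasses import dataclass

from torchrec.inference.model_packager import load_pickle_config

@dataclass
class ModelConfig:  # hypothetical config type pickled by the packager
    batch_size: int

# Unpickles the named config and types the result as ModelConfig.
config = load_pickle_config("model_config", ModelConfig)
print(config.batch_size)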

torchrec.inference.modules

class torchrec.inference.modules.BatchingMetadata(type: str, device: str, pinned: List[str])

Bases: object

Metadata class for batching; this should be kept in sync with the C++ definition.

device: str
pinned: List[str]
type: str
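Example (the input names follow the DLRM example; the "dense"/"sparse" type tags are assumptions and must match what the C++ batching code accepts):

from torchrec.inference.modules import BatchingMetadata

# Keys are request input names; the C++ BatchingQueue uses the metadata
# to decide how to combine each input across requests and where to place it.
metadata = {
    "float_features": BatchingMetadata(type="dense", device="cuda", pinned=[]),
    "id_list_features": BatchingMetadata(type="sparse", device="cuda", pinned=[]),
}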
class torchrec.inference.modules.PredictFactory

Bases: ABC

Creates a model (with already-learned weights) to be used at inference time.

abstract batching_metadata() → Dict[str, BatchingMetadata]

Returns a dict from input name to BatchingMetadata. This information is used to batch input requests.

batching_metadata_json() → str

Serializes the batching metadata to JSON, for ease of parsing in torch::deploy environments.

abstract create_predict_module() → Module

Returns an already-sharded model with allocated weights. state_dict() must match TransformModule.transform_state_dict(). It assumes that torch.distributed.init_process_group was already called and shards the model according to torch.distributed.get_world_size().

model_inputs_data() → Dict[str, Any]

Returns a dict of various data for benchmarking input generation.

qualname_metadata() → Dict[str, QualNameMetadata]

Returns a dict from qualname (method name) to QualNameMetadata. This provides additional information for executing specific methods of the model.

qualname_metadata_json() → str

Serializes the qualname metadata to JSON, for ease of parsing in torch::deploy environments.

abstract result_metadata() → str

Returns a string representing the result type. This information is used for result splitting.

abstract run_weights_dependent_transformations(predict_module: Module) → Module

Runs transformations that depend on the weights of the predict module, e.g. lowering to a backend.

abstract run_weights_independent_tranformations(predict_module: Module) → Module

Runs transformations that do not rely on the weights of the predict module, e.g. fx tracing, model splitting, etc.
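Putting the abstract methods together, a minimal factory might look like the following sketch (the model, metadata values, and result-type tag are hypothetical; see examples/dlrm/inference/dlrm_predict.py for a complete implementation):

from typing import Dict

import torch

from torchrec.inference.modules import BatchingMetadata, PredictFactory

class MyPredictFactory(PredictFactory):
    def batching_metadata(self) -> Dict[str, BatchingMetadata]:
        return {
            "float_features": BatchingMetadata(type="dense", device="cuda", pinned=[]),
        }

    def create_predict_module(self) -> torch.nn.Module:
        # Assumes torch.distributed.init_process_group was already called.
        # A real factory builds the trained model, loads its weights, shards
        # it according to torch.distributed.get_world_size(), and wraps it
        # in a PredictModule subclass.
        return torch.nn.Linear(16, 1)

    def result_metadata(self) -> str:
        return "dict_of_tensor"  # assumed result-type tag used for result splitting

    def run_weights_independent_tranformations(  # spelling matches the API
        self, predict_module: torch.nn.Module
    ) -> torch.nn.Module:
        # fx tracing / model splitting would go here.
        return predict_module

    def run_weights_dependent_transformations(
        self, predict_module: torch.nn.Module
    ) -> torch.nn.Module:
        # Backend lowering would go here.
        return predict_module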

class torchrec.inference.modules.PredictModule(module: Module)

Bases: Module

Interface for modules to work in a torch.deploy-based backend. Users should override predict_forward to convert the batch input format to the module input format.

Call Args:

batch: a dict of input tensors

Returns:

a dict of output tensors

Return type:

output

Parameters:
  • module – the actual predict module

  • device – the primary device for this module that will be used in forward calls.

Example (MyPredictModule is a hypothetical subclass that overrides predict_forward, sketched below):

module = MyPredictModule(my_model)
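Because predict_forward is abstract, serving code subclasses PredictModule; a minimal sketch (the wrapped model and input name are hypothetical):

from typing import Any, Dict

import torch

from torchrec.inference.modules import PredictModule

class MyPredictModule(PredictModule):
    def predict_forward(self, batch: Dict[str, torch.Tensor]) -> Any:
        # Convert the batched request format into the wrapped module's
        # input format, then call the underlying module.
        features = batch["float_features"]  # hypothetical input name
        return {"default": self.predict_module(features)}

my_model = torch.nn.Linear(16, 1)
module = MyPredictModule(my_model)
out = module({"float_features": torch.randn(8, 16)})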
forward(batch: Dict[str, Tensor]) → Any

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

abstract predict_forward(batch: Dict[str, Tensor]) → Any
property predict_module: Module
state_dict(destination: Optional[Dict[str, Any]] = None, prefix: str = '', keep_vars: bool = False) → Dict[str, Any]

Return a dictionary containing references to the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.

Note

The returned object is a shallow copy. It contains references to the module’s parameters and buffers.

Warning

Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.

Warning

Please avoid the use of argument destination as it is not designed for end-users.

Parameters:
  • destination (dict, optional) – If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.

  • prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.

  • keep_vars (bool, optional) – by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.

Returns:

a dictionary containing a whole state of the module

Return type:

dict

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']
training: bool
class torchrec.inference.modules.QualNameMetadata(need_preproc: bool)

Bases: object

need_preproc: bool
torchrec.inference.modules.quantize_dense(predict_module: PredictModule, dtype: dtype, additional_embedding_module_type: List[Type[Module]] = []) → Module
torchrec.inference.modules.quantize_embeddings(module: Module, dtype: dtype, inplace: bool, additional_qconfig_spec_keys: Optional[List[Type[Module]]] = None, additional_mapping: Optional[Dict[Type[Module], Type[Module]]] = None, output_dtype: dtype = torch.float32, per_table_weight_dtype: Optional[Dict[str, dtype]] = None) → Module
torchrec.inference.modules.quantize_feature(module: Module, inputs: Tuple[Tensor, ...]) → Tuple[Tensor, ...]
torchrec.inference.modules.trim_torch_package_prefix_from_typename(typename: str) → str
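As an illustration, quantize_embeddings swaps TorchRec embedding modules for their quantized counterparts ahead of inference. A minimal sketch (the single-table model below is made up for illustration):

import torch

from torchrec import EmbeddingBagCollection, EmbeddingBagConfig
from torchrec.inference.modules import quantize_embeddings

class MyModel(torch.nn.Module):  # hypothetical model containing embeddings
    def __init__(self) -> None:
        super().__init__()
        self.ebc = EmbeddingBagCollection(
            tables=[
                EmbeddingBagConfig(
                    name="t1",
                    embedding_dim=16,
                    num_embeddings=100,
                    feature_names=["f1"],
                )
            ],
            device=torch.device("cpu"),
        )

model = MyModel()

# Replace embedding modules with int8-quantized versions, keeping
# float32 outputs (the output_dtype default) for downstream dense layers.
quantized_model = quantize_embeddings(model, dtype=torch.qint8, inplace=True)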

