torch_tensorrt.fx.compile(module: torch.nn.modules.module.Module, input, min_acc_module_size: int = 10, max_batch_size: int = 2048, max_workspace_size=33554432, explicit_batch_dimension=False, lower_precision=LowerPrecision.FP16, verbose_log=False, timing_cache_prefix='', save_timing_cache=False, cuda_graph_batch_size=-1, dynamic_batch=True, is_aten=False, use_experimental_fx_rt=False) → torch.nn.modules.module.Module[source]

Takes an original module, inputs, and lowering settings, and runs the lowering workflow to turn the module into a lowered module, also called a TRTModule.

Parameters:

  • module – Original module for lowering.

  • input – Input for module.

  • max_batch_size – Maximum batch size (must be >= 1 to be set, 0 means not set)

  • min_acc_module_size – Minimum number of nodes required for a submodule to be accelerated

  • max_workspace_size – Maximum size of workspace given to TensorRT.

  • explicit_batch_dimension – Use explicit batch dimension in TensorRT if set True, otherwise use implicit batch dimension.

  • lower_precision – lower_precision config given to TRTModule.

  • verbose_log – Enable verbose log for TensorRT if set True.

  • timing_cache_prefix – Timing cache file name for timing cache used by fx2trt.

  • save_timing_cache – Update timing cache with current timing cache data if set to True.

  • cuda_graph_batch_size – CUDA graph batch size; defaults to -1.

  • dynamic_batch – If set True, the batch dimension (dim=0) is treated as dynamic.

  • use_experimental_fx_rt – Uses the next generation TRTModule which supports both Python and TorchScript based execution (including in C++).


Returns:
A torch.nn.Module lowered by TensorRT.
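A minimal usage sketch of the signature above. The imports are deferred inside the function so the sketch can be read without a GPU installed; the `LowerPrecision` import path and the list-of-tensors format for `input` are assumptions, not taken from this page:

```python
# Hedged sketch: lowering a model with torch_tensorrt.fx.compile, using
# parameters from the documented signature. Requires a CUDA device and
# the torch_tensorrt package at call time.
def lower_with_fx(model, sample_input):
    import torch_tensorrt.fx
    from torch_tensorrt.fx.utils import LowerPrecision  # assumed import path

    # Lower the module to a TRTModule; FP16 precision and a dynamic
    # batch dimension mirror the defaults shown in the signature above.
    trt_module = torch_tensorrt.fx.compile(
        model,
        [sample_input],               # input: assumed to be a list of sample tensors
        min_acc_module_size=10,       # skip submodules with fewer than 10 nodes
        lower_precision=LowerPrecision.FP16,
        dynamic_batch=True,           # dim=0 is treated as dynamic
    )
    return trt_module
```

The lowered module is a drop-in `torch.nn.Module`, so it can be called with the same inputs as the original model.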


class torch_tensorrt.fx.TRTModule(engine=None, input_names=None, output_names=None, cuda_graph_batch_size=-1)[source]
class torch_tensorrt.fx.InputTensorSpec(shape: Sequence[int], dtype: torch.dtype, device: torch.device = device(type='cpu'), shape_ranges: List[Tuple[Sequence[int], Sequence[int], Sequence[int]]] = [], has_batch_dim: bool = True)[source]

This class contains the information of an input tensor.

shape: shape of the tensor.

dtype: dtype of the tensor.

device: device of the tensor. This is only used to generate inputs to the given model in order to run shape propagation. For a TensorRT engine, inputs have to be on a CUDA device.

shape_ranges: If dynamic shape is needed (shape has dimensions of -1), then this field has to be provided (default is an empty list). Every shape_range is a tuple of three tuples ((min_input_shape), (optimized_input_shape), (max_input_shape)), and each shape_range is used to populate one TensorRT optimization profile. e.g. If the input shape varies from (1, 224) to (100, 224) and we want to optimize for (25, 224) because it's the most common input shape, then we set shape_ranges to [((1, 224), (25, 224), (100, 224))].

has_batch_dim: Whether the shape includes the batch dimension. The batch dimension has to be provided if the engine wants to run with dynamic shape.
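The (min, optimized, max) contract of shape_ranges can be sketched with a small validity check. This is a hypothetical plain-Python helper illustrating the constraint, not part of the torch_tensorrt API: each dimension of the optimized shape must lie between the corresponding min and max dimensions.

```python
from typing import Sequence, Tuple

# One optimization profile: (min_input_shape, optimized_input_shape, max_input_shape)
ShapeRange = Tuple[Sequence[int], Sequence[int], Sequence[int]]

def is_valid_shape_range(shape_range: ShapeRange) -> bool:
    """Hypothetical helper: check a (min, opt, max) profile is elementwise ordered."""
    min_shape, opt_shape, max_shape = shape_range
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        return False  # all three shapes must have the same rank
    return all(lo <= mid <= hi
               for lo, mid, hi in zip(min_shape, opt_shape, max_shape))

# The example from the docstring: input varies (1, 224)..(100, 224), optimized for (25, 224).
print(is_valid_shape_range(((1, 224), (25, 224), (100, 224))))  # → True
print(is_valid_shape_range(((1, 224), (25, 225), (100, 224))))  # → False: 225 exceeds max 224
```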

class torch_tensorrt.fx.TRTInterpreter(module: torch.fx.graph_module.GraphModule, input_specs: List[torch_tensorrt.fx.input_tensor_spec.InputTensorSpec], explicit_batch_dimension: bool = False, explicit_precision: bool = False, logger_level=None)[source]
class torch_tensorrt.fx.TRTInterpreterResult(engine, input_names, output_names, serialized_cache)[source]

