torch_tensorrt.fx

Functions

torch_tensorrt.fx.compile(module: Module, input, min_acc_module_size: int = 10, max_batch_size: int = 2048, max_workspace_size=33554432, explicit_batch_dimension=False, lower_precision=LowerPrecision.FP16, verbose_log=False, timing_cache_prefix='', save_timing_cache=False, cuda_graph_batch_size=-1, dynamic_batch=True, is_aten=False, use_experimental_fx_rt=False, correctness_atol=0.1, correctness_rtol=0.1) → Module

Takes the original module, its input and the lowering settings, and runs the lowering workflow to turn the module into a lowered module (a so-called TRTModule).

Parameters

module – Original module for lowering.
input – Sample input for the module, used when tracing and lowering it.
min_acc_module_size – Minimal number of nodes for an accelerated submodule.
max_batch_size – Maximum batch size (must be >= 1 to take effect; 0 means not set).
max_workspace_size – Maximum size of the workspace given to TensorRT, in bytes (default 33554432, i.e. 32 MiB).
explicit_batch_dimension – If True, use an explicit batch dimension in TensorRT; otherwise use an implicit batch dimension.
lower_precision – Precision setting (a LowerPrecision value, default LowerPrecision.FP16) given to the TRTModule.
verbose_log – If True, enable verbose TensorRT logging.
timing_cache_prefix – File name prefix of the timing cache used by fx2trt.
save_timing_cache – If True, update the timing cache with the timing data collected during this run.
cuda_graph_batch_size – CUDA graph batch size; defaults to -1.
dynamic_batch – If True, the batch dimension (dim=0) is treated as dynamic.
use_experimental_fx_rt – Uses the next generation TRTModule which supports both Python and TorchScript based execution (including in C++).
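The effect of min_acc_module_size can be pictured with the simplified sketch below. As a simplification, it assumes the model is a straight line of nodes, each already marked as supported by TensorRT or not, and that runs of supported nodes shorter than min_acc_module_size are left in PyTorch (labelled "cpu" here) instead of becoming their own accelerated submodule. The real splitter works on the FX graph rather than a flat list, and demote_small_acc_runs is a hypothetical helper, not part of torch_tensorrt.

```python
from itertools import groupby
from typing import List, Tuple


def demote_small_acc_runs(supported: List[bool], min_acc_module_size: int) -> List[Tuple[str, int]]:
    """Hypothetical helper: split a linear node sequence into ("acc", n) / ("cpu", n) runs.

    Runs of supported nodes shorter than min_acc_module_size are demoted to "cpu"
    (kept in PyTorch); neighbouring runs with the same label are then merged.
    """
    # Consecutive nodes with the same support flag form one candidate run.
    runs = [(flag, len(list(group))) for flag, group in groupby(supported)]
    # Only supported runs that are large enough become accelerated submodules.
    labelled = [
        ("acc" if flag and size >= min_acc_module_size else "cpu", size)
        for flag, size in runs
    ]
    # Merge adjacent runs that now carry the same label.
    merged: List[Tuple[str, int]] = []
    for label, size in labelled:
        if merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1] + size)
        else:
            merged.append((label, size))
    return merged


nodes = [True] * 12 + [False] * 2 + [True] * 3 + [False] + [True] * 10
print(demote_small_acc_runs(nodes, min_acc_module_size=10))
```

With min_acc_module_size=10, the 3-node supported run between two unsupported segments is too small to be worth a separate engine, so it is folded into the surrounding PyTorch segment and the result is [("acc", 12), ("cpu", 6), ("acc", 10)].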

Returns

A torch.nn.Module lowered by TensorRT.
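A minimal usage sketch of compile follows. It assumes a CUDA device with TensorRT and torch_tensorrt installed, and that LowerPrecision is importable from torch_tensorrt.fx.utils (where the versions we are aware of define it). The helper name lower_with_fx_compile and the chosen max_batch_size are illustrative only. The imports sit inside the function so the helper can be defined on a machine without TensorRT.

```python
def lower_with_fx_compile(model, example_inputs):
    """Hypothetical helper: lower `model` with torch_tensorrt.fx.compile.

    Requires CUDA, TensorRT and torch_tensorrt at call time.
    """
    # Lazy imports: defining this helper does not require TensorRT.
    import torch_tensorrt.fx
    from torch_tensorrt.fx.utils import LowerPrecision  # assumed location of LowerPrecision

    # The TensorRT engine expects the module and its inputs on a CUDA device.
    model = model.cuda().eval()
    inputs = [t.cuda() for t in example_inputs]
    return torch_tensorrt.fx.compile(
        model,
        inputs,
        max_batch_size=64,                    # illustrative value
        lower_precision=LowerPrecision.FP32,  # override the FP16 default
    )
```

For example, trt_model = lower_with_fx_compile(model, [torch.randn(8, 3, 224, 224)]) would return a torch.nn.Module that can then be called on CUDA inputs like the sample ones.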

Classes

class torch_tensorrt.fx.TRTModule(engine=None, input_names=None, output_names=None, cuda_graph_batch_size=-1)

class torch_tensorrt.fx.InputTensorSpec(shape: Sequence[int], dtype: dtype, device: device = device(type='cpu'), shape_ranges: List[Tuple[Sequence[int], Sequence[int], Sequence[int]]] = [], has_batch_dim: bool = True)

This class describes an input tensor.

shape: shape of the tensor.

dtype: dtype of the tensor.

device: device of the tensor. This is only used to generate inputs to the given model in order to run shape propagation. For the TensorRT engine, inputs have to be on a CUDA device.
shape_ranges: If a dynamic shape is needed (the shape has dimensions of -1), this field has to be provided (default is an empty list). Every shape_range is a tuple of three shapes (min_input_shape, optimized_input_shape, max_input_shape), and each shape_range is used to populate one TensorRT optimization profile. For example, if the input shape varies from (1, 224) to (100, 224) and we want to optimize for (25, 224) because it’s the most common input shape, we set shape to (-1, 224) and shape_ranges to [((1, 224), (25, 224), (100, 224))].
has_batch_dim: Whether the shape includes the batch dimension. The batch dimension has to be included if the engine is to run with dynamic shapes.
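The relationship between shape and shape_ranges can be checked with the small validator below. It is a sketch, not part of torch_tensorrt: validate_shape_ranges is a hypothetical helper, and its consistency rules are assumptions modelled on how TensorRT optimization profiles are usually described. A -1 dimension must satisfy min <= opt <= max, and a fixed dimension must repeat the same size in all three profile shapes.

```python
from typing import List, Sequence, Tuple

ShapeRange = Tuple[Sequence[int], Sequence[int], Sequence[int]]


def validate_shape_ranges(shape: Sequence[int], shape_ranges: List[ShapeRange]) -> None:
    """Hypothetical helper: raise ValueError if shape_ranges does not fit shape (-1 = dynamic dim)."""
    has_dynamic_dim = any(dim == -1 for dim in shape)
    if has_dynamic_dim and not shape_ranges:
        raise ValueError("shape has dynamic (-1) dims, so shape_ranges must be provided")
    for min_shape, opt_shape, max_shape in shape_ranges:
        # Each of the three profile shapes must have the same rank as `shape`.
        for profile_shape in (min_shape, opt_shape, max_shape):
            if len(profile_shape) != len(shape):
                raise ValueError(f"rank mismatch: {tuple(profile_shape)} vs {tuple(shape)}")
        for i, dim in enumerate(shape):
            lo, opt, hi = min_shape[i], opt_shape[i], max_shape[i]
            if dim == -1:
                # Dynamic dim: the optimized size must lie within [min, max].
                if not lo <= opt <= hi:
                    raise ValueError(f"dim {i}: need min <= opt <= max, got {lo}, {opt}, {hi}")
            elif not lo == opt == hi == dim:
                # Static dim: every profile shape must repeat the fixed size.
                raise ValueError(f"dim {i} is static ({dim}) but ranges give {lo}, {opt}, {hi}")


# Batch varies from 1 to 100 and is optimized for 25; the feature dim is fixed at 224.
validate_shape_ranges((-1, 224), [((1, 224), (25, 224), (100, 224))])
```

Under these rules, a mistyped optimized shape such as (25, 225) is rejected because the fixed second dimension no longer matches 224. A fully static shape such as (8, 224) needs no shape_ranges at all.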

class torch_tensorrt.fx.TRTInterpreter(module: GraphModule, input_specs: List[InputTensorSpec], explicit_batch_dimension: bool = False, explicit_precision: bool = False, logger_level=None)

class torch_tensorrt.fx.TRTInterpreterResult(engine, input_names, output_names, serialized_cache)
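These classes can also be used directly, without compile. The sketch below follows the pattern of the fx2trt examples as we understand them: trace the model with acc_tracer, describe its inputs with InputTensorSpec.from_tensors, build the engine with TRTInterpreter.run (which returns a TRTInterpreterResult), and wrap the result in a TRTModule, whose engine, input_names and output_names parameters match the result's fields. It assumes the acc_tracer module path shown, a machine with CUDA and TensorRT, and that every op in the model is convertible (no splitting between PyTorch and TensorRT). The helper name lower_with_trt_interpreter is hypothetical.

```python
def lower_with_trt_interpreter(model, example_inputs):
    """Hypothetical helper: trace, convert and wrap a model by hand.

    Requires CUDA, TensorRT and torch_tensorrt at call time.
    """
    # Lazy imports: defining this helper does not require TensorRT.
    import torch_tensorrt.fx.tracer.acc_tracer.acc_tracer as acc_tracer
    from torch_tensorrt.fx import InputTensorSpec, TRTInterpreter, TRTModule

    model = model.cuda().eval()
    inputs = [t.cuda() for t in example_inputs]
    # 1. Trace into an FX GraphModule of acc ops that TRTInterpreter can convert.
    acc_mod = acc_tracer.trace(model, inputs)
    # 2. Describe each input (shape, dtype, device) from the example tensors.
    input_specs = InputTensorSpec.from_tensors(inputs)
    # 3. Build the TensorRT engine; run() returns a TRTInterpreterResult.
    interp = TRTInterpreter(acc_mod, input_specs, explicit_batch_dimension=True)
    result = interp.run()
    # 4. Wrap the engine so it can be called like an nn.Module.
    return TRTModule(result.engine, result.input_names, result.output_names)
```

compile performs this workflow for you and additionally splits the model so that unsupported parts stay in PyTorch. The manual route is mainly useful when you want to inspect or reuse the TRTInterpreterResult, for example its serialized_cache.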
