torch_tensorrt

Functions

torch_tensorrt.compile(module: Any, ir: str = 'default', inputs: Optional[Sequence[Input | torch.Tensor | InputTensorSpec]] = None, arg_inputs: Optional[Sequence[Sequence[Any]]] = None, kwarg_inputs: Optional[dict[Any, Any]] = None, enabled_precisions: Optional[Set[Union[dtype, dtype]]] = None, **kwargs: Any) → Union[Module, ScriptModule, GraphModule, Callable[[...], Any]][source]

Compile a PyTorch module for NVIDIA GPUs using TensorRT

Takes a existing PyTorch module and a set of settings to configure the compiler and using the path specified in ir lower and compile the module to TensorRT returning a PyTorch Module back

Converts specifically the forward method of a Module

Parameters

module (Union(torch.nn.Module,torch.jit.ScriptModule) – Source module

Keyword Arguments

inputs (List[Union(Input, torch.Tensor)]) –

Required List of specifications of input shape, dtype and memory layout for inputs to the module. This argument is required. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes and you can use either torch devices or the torch_tensorrt device type enum to select device type.

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32
        format=torch.channel_last
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

arg_inputs (Tuple[Any, ...]) – Same as inputs. Alias for better understanding with kwarg_inputs.
kwarg_inputs (dict[Any, ...]) – Optional, kwarg inputs to the module forward function.
enabled_precision (Set(Union(torch.dpython:type, torch_tensorrt.dpython:type))) – The set of datatypes that TensorRT can use when selecting kernels
ir (str) – The requested strategy to compile. (Options: default - Let Torch-TensorRT decide, ts - TorchScript with scripting path)
**kwargs – Additional settings for the specific requested strategy (See submodules for more info)

Returns

Compiled Module, when run it will execute via TensorRT

Return type

torch.nn.Module

torch_tensorrt.convert_method_to_trt_engine(module: Any, method_name: str = 'forward', inputs: Optional[Sequence[Input | torch.Tensor | InputTensorSpec]] = None, arg_inputs: Optional[Sequence[Sequence[Any]]] = None, kwarg_inputs: Optional[dict[Any, Any]] = None, ir: str = 'default', enabled_precisions: Optional[Set[Union[dtype, dtype]]] = None, **kwargs: Any) → bytes[source]

Convert a TorchScript module method to a serialized TensorRT engine

Converts a specified method of a module to a serialized TensorRT engine given a dictionary of conversion settings

Parameters

module (Union(torch.nn.Module,torch.jit.ScriptModule) – Source module

Keyword Arguments

inputs (List[Union(Input, torch.Tensor)]) –

Required List of specifications of input shape, dtype and memory layout for inputs to the module. This argument is required. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes and you can use either torch devices or the torch_tensorrt device type enum to select device type.

input=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32
        format=torch.channel_last
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

arg_inputs (Tuple[Any, ...]) – Same as inputs. Alias for better understanding with kwarg_inputs.
kwarg_inputs (dict[Any, ...]) – Optional, kwarg inputs to the module forward function.
enabled_precision (Set(Union(torch.dpython:type, torch_tensorrt.dpython:type))) – The set of datatypes that TensorRT can use when selecting kernels
ir (str) – The requested strategy to compile. (Options: default - Let Torch-TensorRT decide, ts - TorchScript with scripting path)
**kwargs – Additional settings for the specific requested strategy (See submodules for more info)

Returns

Serialized TensorRT engine, can either be saved to a file or deserialized via TensorRT APIs

Return type

bytes

torch_tensorrt.cross_compile_for_windows(module: Module, file_path: str, inputs: Optional[Sequence[Input | torch.Tensor]] = None, arg_inputs: Optional[Sequence[Sequence[Any]]] = None, kwarg_inputs: Optional[dict[Any, Any]] = None, enabled_precisions: Optional[Set[Union[dtype, dtype]]] = None, **kwargs: Any) → None[source]

Compile a PyTorch module using TensorRT in Linux for Inference in Windows

Takes an existing PyTorch module and a set of settings to configure the compiler and it will convert methods to AOT graphs which call equivalent TensorRT serialized engine info into the disk in the specified file_path user provided. It will then allow user to load the deserialized model from the disk in Windows. Note: the model cross compiled for windows in Linux environmen can only be loaded in Windows.

Argument:: module (torch.nn.Module): Source module file_path (str): the file path to store the serialized module into the disk

Keyword Arguments

inputs (List[Union(Input, torch.Tensor)]) –

Required List of specifications of input shape, dtype and memory layout for inputs to the module. This argument is required. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes and you can use either torch devices or the torch_tensorrt device type enum to select device type.

inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32
        format=torch.channel_last
    ), # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

arg_inputs (Tuple[Any, ...]) – Same as inputs. Alias for better understanding with kwarg_inputs.
kwarg_inputs (dict[Any, ...]) – Optional, kwarg inputs to the module forward function.
enabled_precision (Set(Union(torch.dpython:type, torch_tensorrt.dpython:type))) – The set of datatypes that TensorRT can use when selecting kernels
**kwargs – Additional settings for the specific requested strategy (See submodules for more info)

torch_tensorrt.load_cross_compiled_exported_program(file_path: str = '') → Any[source]

Load an ExportedProgram file in Windows which was previously cross compiled in Linux

Parameters: file_path (str) – Path to file on the disk
Raises: ValueError – If the api is not called in windows or there is no file or the file is not a valid ExportedProgram file

torch_tensorrt.save(module: Any, file_path: str = '', *, output_format: str = 'exported_program', inputs: Optional[Sequence[Tensor]] = None, arg_inputs: Optional[Sequence[Tensor]] = None, kwarg_inputs: Optional[dict[str, Any]] = None, retrace: bool = False) → None[source]

Save the model to disk in the specified output format.

Parameters

module (Optional(torch.jit.ScriptModule | torch.export.ExportedProgram | torch.fx.GraphModule | CudaGraphsTorchTensorRTModule)) – Compiled Torch-TensorRT module
inputs (torch.Tensor) – Torch input tensors
arg_inputs (Tuple[Any, ...]) – Same as inputs. Alias for better understanding with kwarg_inputs.
kwarg_inputs (dict[Any, ...]) – Optional, kwarg inputs to the module forward function.
output_format (str) – Format to save the model. Options include exported_program | torchscript.
retrace (bool) – When the module type is a fx.GraphModule, this option re-exports the graph using torch.export.export(strict=False) to save it. This flag is experimental for now.

torch_tensorrt.load(file_path: str = '') → Any[source]

Load either a Torchscript model or ExportedProgram.

Loads a TorchScript or ExportedProgram file from disk. File type will be detect the type using try, except.

Parameters: file_path (str) – Path to file on the disk
Raises: ValueError – If there is no file or the file is not either a TorchScript file or ExportedProgram file

Classes

class torch_tensorrt.MutableTorchTensorRTModule(pytorch_model: Module, *, device: Optional[Union[Device, device, str]] = None, disable_tf32: bool = False, assume_dynamic_shape_support: bool = False, sparse_weights: bool = False, enabled_precisions: Set[Union[dtype, dtype]] = {dtype.f32}, engine_capability: EngineCapability = EngineCapability.STANDARD, immutable_weights: bool = False, debug: bool = False, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_double: bool = False, require_full_compilation: bool = False, min_block_size: int = 5, torch_executed_ops: Optional[Collection[Union[Callable[[...], Any], str]]] = None, torch_executed_modules: Optional[List[str]] = None, pass_through_build_failures: bool = False, max_aux_streams: Optional[int] = None, version_compatible: bool = False, optimization_level: Optional[int] = None, use_python_runtime: bool = False, use_fast_partitioner: bool = True, enable_experimental_decompositions: bool = False, dryrun: bool = False, hardware_compatible: bool = False, timing_cache_path: str = '/tmp/torch_tensorrt_engine_cache/timing_cache.bin', **kwargs: Any)[source]

Initialize a MutableTorchTensorRTModule to seamlessly manipulate it like a regular PyTorch module. All TensorRT compilation and refitting processes are handled automatically as you work with the module. Any changes to its attributes or loading a different state_dict will trigger refitting or recompilation, which will be managed during the next forward pass.

The MutableTorchTensorRTModule takes a PyTorch module and a set of configuration settings for the compiler. Once compilation is complete, the module maintains the connection between the TensorRT graph module and the original PyTorch module. Any modifications made to the MutableTorchTensorRTModule will be reflected in both the TensorRT graph module and the original PyTorch module.

__init__(pytorch_model: Module, *, device: Optional[Union[Device, device, str]] = None, disable_tf32: bool = False, assume_dynamic_shape_support: bool = False, sparse_weights: bool = False, enabled_precisions: Set[Union[dtype, dtype]] = {dtype.f32}, engine_capability: EngineCapability = EngineCapability.STANDARD, immutable_weights: bool = False, debug: bool = False, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_double: bool = False, require_full_compilation: bool = False, min_block_size: int = 5, torch_executed_ops: Optional[Collection[Union[Callable[[...], Any], str]]] = None, torch_executed_modules: Optional[List[str]] = None, pass_through_build_failures: bool = False, max_aux_streams: Optional[int] = None, version_compatible: bool = False, optimization_level: Optional[int] = None, use_python_runtime: bool = False, use_fast_partitioner: bool = True, enable_experimental_decompositions: bool = False, dryrun: bool = False, hardware_compatible: bool = False, timing_cache_path: str = '/tmp/torch_tensorrt_engine_cache/timing_cache.bin', **kwargs: Any) → None[source]

Parameters

pytorch_model (torch.nn.module) – Source module that needs to be accelerated

Keyword Arguments

device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
```
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
```
disable_tf32 (bool) – Force FP32 layers to use traditional as FP32 format vs the default behavior of rounding the inputs to 10-bit mantissas before multiplying, but accumulates the sum using 23-bit mantissas
assume_dynamic_shape_support (bool) – Setting this to true enables the converters work for both dynamic and static shapes. Default: False
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers.
enabled_precision (Set(Union(torch.dpython:type, torch_tensorrt.dpython:type))) – The set of datatypes that TensorRT can use when selecting kernels
immutable_weights (bool) – Build non-refittable engines. This is useful for some layers that are not refittable.
debug (bool) – Enable debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe gpu kernels or safe dla kernels
num_avg_timing_iters (python:int) – Number of averaging timing iterations used to select kernels
workspace_size (python:int) – Maximum size of workspace given to TensorRT
dla_sram_size (python:int) – Fast software managed RAM used by DLA to communicate within a layer.
dla_local_dram_size (python:int) – Host RAM used by DLA to share intermediate tensor data across operations
dla_global_dram_size (python:int) – Host RAM used by DLA to store weights and metadata for execution
truncate_double (bool) – Truncate weights provided in double (float64) to float32
calibrator (Union(torch_tensorrt._C.IInt8Calibrator, tensorrt.IInt8Calibrator)) – Calibrator object which will provide data to the PTQ system for INT8 Calibration
require_full_compilation (bool) – Require modules to be compiled end to end or return an error as opposed to returning a hybrid graph where operations that cannot be run in TensorRT are run in PyTorch
min_block_size (python:int) – The minimum number of contiguous TensorRT convertible operations in order to run a set of operations in TensorRT
torch_executed_ops (Collection[Target]) – Set of aten operators that must be run in PyTorch. An error will be thrown if this set is not empty but require_full_compilation is True
torch_executed_modules (List[str]) – List of modules that must be run in PyTorch. An error will be thrown if this list is not empty but require_full_compilation is True
pass_through_build_failures (bool) – Error out if there are issues during compilation (only applicable to torch.compile workflows)
max_aux_stream (Optional[python:int]) – Maximum streams in the engine
version_compatible (bool) – Build the TensorRT engines compatible with future versions of TensorRT (Restrict to lean runtime operators to provide version forward compatibility for the engines)
optimization_level – (Optional[int]): Setting a higher optimization level allows TensorRT to spend longer engine building time searching for more optimization options. The resulting engine may have better performance compared to an engine built with a lower optimization level. The default optimization level is 3. Valid values include integers from 0 to the maximum optimization level, which is currently 5. Setting it to be greater than the maximum level results in identical behavior to the maximum level.
use_python_runtime – (bool): Return a graph using a pure Python runtime, reduces options for serialization
use_fast_partitioner – (bool): Use the adjacency based partitioning scheme instead of the global partitioner. Adjacency partitioning is faster but may not be optimal. Use the global paritioner (False) if looking for best performance
enable_experimental_decompositions (bool) – Use the full set of operator decompositions. These decompositions may not be tested but serve to make the graph easier to convert to TensorRT, potentially increasing the amount of graphs run in TensorRT.
dryrun (bool) – Toggle for “Dryrun” mode, running everything except conversion to TRT and logging outputs
hardware_compatible (bool) – Build the TensorRT engines compatible with GPU architectures other than that of the GPU on which the engine was built (currently works for NVIDIA Ampere and newer)
timing_cache_path (str) – Path to the timing cache if it exists (or) where it will be saved after compilation
lazy_engine_init (bool) – Defer setting up engines until the compilation of all engines is complete. Can allow larger models with multiple graph breaks to compile but can lead to oversubscription of GPU memory at runtime.
**kwargs – Any,

Returns

MutableTorchTensorRTModule

compile() → None[source]: (Re)compile the TRT graph module using the PyTorch module. This function should be called whenever the weight structure get changed (shape, more layers…) MutableTorchTensorRTModule automatically catches weight value updates and call this function to recompile. If it fails to catch the changes, please call this function manually to recompile the TRT graph module.

refit_gm() → None[source]: Refit the TRT graph module with any updates. This function should be called whenever the weight values get changed but the weight structure remains the same. MutableTorchTensorRTModule automatically catches weight value updates and call this function to refit the module. If it fails to catch the changes, please call this function manually to update the TRT graph module.

set_expected_dynamic_shape_range(args_dynamic_shape: tuple[dict[Any, Any]], kwargs_dynamic_shape: dict[str, Any]) → None[source]

Set the dynamic shape range. The shape hint should EXACTLY follow arg_inputs and kwarg_inputs passed to the forward function and should not omit any entries (except None in the kwarg_inputs). If there is a nested dict/list in the input, the dynamic shape for that entry should also be an nested dict/list. If the dynamic shape is not required for an input, an empty dictionary should be given as the shape hint for that input. Note that you should exclude keyword arguments with value None as those will be filtered out.

Example: def forward(a, b, c=0, d=0):

pass

seq_len = torch.export.Dim(“seq_len”, min=1, max=10) args_dynamic_shape = ({0: seq_len}, {}) # b does not have a dynamic shape kwargs_dynamic_shape = {‘c’: {0, seq_len}, ‘d’: {}} # d does not have a dynamic shape set_expected_dynamic_shape_range(args_dynamic_shape, kwargs_dynamic_shape) # Later when you call the function forward(*(a, b), **{c:…, d:…})

Reference: https://pytorch.org/docs/stable/export.html#expressing-dynamism :param args_dynamic_shape: Dynamic shape hint for the arg_inputs, :type args_dynamic_shape: tuple[dict[Any, Any]] :param kwargs_dynamic_shape: (dict[str, Any]): Dynamic shape hint for the kwarg_inputs

class torch_tensorrt.Input(*args: Any, **kwargs: Any)[source]

Defines an input to a module in terms of expected shape, data type and tensor format.

Variables

shape_mode (torch_tensorrt.Input._ShapeMode) – Is input statically or dynamically shaped
shape (Tuple or Dict) –
Either a single Tuple or a dict of tuples defining the input shape. Static shaped inputs will have a single tuple. Dynamic inputs will have a dict of the form
```
{"min_shape": Tuple, "opt_shape": Tuple, "max_shape": Tuple}
```
dtype (torch_tensorrt.dpython:type) – The expected data type of the input tensor (default: torch_tensorrt.dtype.float32)
format (torch_tensorrt.TensorFormat) – The expected format of the input tensor (default: torch_tensorrt.TensorFormat.NCHW)

__init__(*args: Any, **kwargs: Any) → None[source]

__init__ Method for torch_tensorrt.Input

Input accepts one of a few construction patterns

Parameters

shape (Tuple or List, optional) – Static shape of input tensor

Keyword Arguments

shape (Tuple or List, optional) – Static shape of input tensor
min_shape (Tuple or List, optional) – Min size of input tensor’s shape range Note: All three of min_shape, opt_shape, max_shape must be provided, there must be no positional arguments, shape must not be defined and implicitly this sets Input’s shape_mode to DYNAMIC
opt_shape (Tuple or List, optional) – Opt size of input tensor’s shape range Note: All three of min_shape, opt_shape, max_shape must be provided, there must be no positional arguments, shape must not be defined and implicitly this sets Input’s shape_mode to DYNAMIC
max_shape (Tuple or List, optional) – Max size of input tensor’s shape range Note: All three of min_shape, opt_shape, max_shape must be provided, there must be no positional arguments, shape must not be defined and implicitly this sets Input’s shape_mode to DYNAMIC
dtype (torch.dpython:type or torch_tensorrt.dpython:type) – Expected data type for input tensor (default: torch_tensorrt.dtype.float32)
format (torch.memory_format or torch_tensorrt.TensorFormat) – The expected format of the input tensor (default: torch_tensorrt.TensorFormat.NCHW)
tensor_domain (Tuple(python:float, python:float), optional) – The domain of allowed values for the tensor, as interval notation: [tensor_domain[0], tensor_domain[1]). Note: Entering “None” (or not specifying) will set the bound to [0, 2)
torch_tensor (torch.Tensor) – Holds a corresponding torch tensor with this Input.
name (str, optional) – Name of this input in the input nn.Module’s forward function. Used to specify dynamic shapes for the corresponding input in dynamo tracer.

Examples

Input([1,3,32,32], dtype=torch.float32, format=torch.channel_last)
Input(shape=(1,3,32,32), dtype=torch_tensorrt.dtype.int32, format=torch_tensorrt.TensorFormat.NCHW)
Input(min_shape=(1,3,32,32), opt_shape=[2,3,32,32], max_shape=(3,3,32,32)) #Implicitly dtype=torch_tensorrt.dtype.float32, format=torch_tensorrt.TensorFormat.NCHW

example_tensor(optimization_profile_field: Optional[str] = None) → Tensor[source]

Get an example tensor of the shape specified by the Input object

Parameters: optimization_profile_field (Optional(str)) – Name of the field to use for shape in the case the Input is dynamically shaped
Returns: A PyTorch Tensor

classmethod from_tensor(t: Tensor, disable_memory_format_check: bool = False) → Input[source]

Produce a Input which contains the information of the given PyTorch tensor.

Parameters

tensor (torch.Tensor) – A PyTorch tensor.
disable_memory_format_check (bool) – Whether to validate the memory formats of input tensors

Returns

A Input object.

classmethod from_tensors(ts: Sequence[Tensor], disable_memory_format_check: bool = False) → List[Input][source]

Produce a list of Inputs which contain the information of all the given PyTorch tensors.

Parameters

tensors (Iterable[torch.Tensor]) – A list of PyTorch tensors.
disable_memory_format_check (bool) – Whether to validate the memory formats of input tensors

Returns

A list of Inputs.

dtype: dtype = 1

torch_tensorrt.dtype.float32)

Type: The expected data type of the input tensor (default

format: memory_format = 1

torch_tensorrt.memory_format.linear)

Type: The expected format of the input tensor (default

class torch_tensorrt.Device(*args: Any, **kwargs: Any)[source]

Defines a device that can be used to specify target devices for engines

Variables

device_type (DeviceType) – Target device type (GPU or DLA). Set implicitly based on if dla_core is specified.
gpu_id (python:int) – Device ID for target GPU
dla_core (python:int) – Core ID for target DLA core
allow_gpu_fallback (bool) – Whether falling back to GPU if DLA cannot support an op should be allowed

__init__(*args: Any, **kwargs: Any)[source]

__init__ Method for torch_tensorrt.Device

Device accepts one of a few construction patterns

Parameters

spec (str) – String with device spec e.g. “dla:0” for dla, core_id 0

Keyword Arguments

gpu_id (python:int) – ID of target GPU (will get overridden if dla_core is specified to the GPU managing DLA). If specified, no positional arguments should be provided
dla_core (python:int) – ID of target DLA core. If specified, no positional arguments should be provided.
allow_gpu_fallback (bool) – Allow TensorRT to schedule operations on GPU if they are not supported on DLA (ignored if device type is not DLA)

Examples

Device(“gpu:1”)
Device(“cuda:1”)
Device(“dla:0”, allow_gpu_fallback=True)
Device(gpu_id=0, dla_core=0, allow_gpu_fallback=True)
Device(dla_core=0, allow_gpu_fallback=True)
Device(gpu_id=1)

device_type: DeviceType = 1: Target device type (GPU or DLA). Set implicitly based on if dla_core is specified.

dla_core: int = -1: Core ID for target DLA core

gpu_id: int = -1: Device ID for target GPU

Enums

class torch_tensorrt.dtype(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Enum to describe data types to Torch-TensorRT, has compatibility with torch, tensorrt and numpy dtypes

to(t: Union[Type[dtype], Type[DataType], Type[dtype], Type[dtype]], use_default: bool = False) → Union[dtype, DataType, dtype, dtype][source]

Convert dtype into the equivalent type in [torch, numpy, tensorrt]

Converts self into one of numpy, torch, and tensorrt equivalent dtypes. If self is not supported in the target library, then an exception will be raised. As such it is not recommended to use this method directly.

Alternatively use torch_tensorrt.dtype.try_to()

Parameters

t (Union(Type(torch.dpython:type), Type(tensorrt.DataType), Type(numpy.dpython:type), Type(dpython:type))) – Data type enum from another library to convert to
use_default (bool) – In some cases a catch all type (such as torch.float) is sufficient, so instead of throwing an exception, return default value.

Returns

dtype equivalent torch_tensorrt.dtype from library enum t

Return type

Union(torch.dtype, tensorrt.DataType, numpy.dtype, dtype)

Raises

TypeError – Unsupported data type or unknown target

Examples

# Succeeds
float_dtype = torch_tensorrt.dtype.f32.to(torch.dtype) # Returns torch.float

# Failure
float_dtype = torch_tensorrt.dtype.bf16.to(numpy.dtype) # Throws exception

classmethod try_from(t: Union[dtype, DataType, dtype, dtype], use_default: bool = False) → Optional[dtype][source]

Create a Torch-TensorRT dtype from another library’s dtype system.

Takes a dtype enum from one of numpy, torch, and tensorrt and create a torch_tensorrt.dtype. If the source dtype system is not supported or the type is not supported in Torch-TensorRT, then returns None.

Parameters

t (Union(torch.dpython:type, tensorrt.DataType, numpy.dpython:type, dpython:type)) – Data type enum from another library
use_default (bool) – In some cases a catch all type (such as torch_tensorrt.dtype.f32) is sufficient, so instead of throwing an exception, return default value.

Returns

Equivalent torch_tensorrt.dtype to t or None

Return type

Optional(dtype)

Examples

# Succeeds
float_dtype = torch_tensorrt.dtype.try_from(torch.float) # Returns torch_tensorrt.dtype.f32

# Unsupported type
float_dtype = torch_tensorrt.dtype.try_from(torch.complex128) # Returns None

try_to(t: Union[Type[dtype], Type[DataType], Type[dtype], Type[dtype]], use_default: bool) → Optional[Union[dtype, DataType, dtype, dtype]][source]

Convert dtype into the equivalent type in [torch, numpy, tensorrt]

Converts self into one of numpy, torch, and tensorrt equivalent dtypes. If self is not supported in the target library, then returns None.

Parameters

t (Union(Type(torch.dpython:type), Type(tensorrt.DataType), Type(numpy.dpython:type), Type(dpython:type))) – Data type enum from another library to convert to
use_default (bool) – In some cases a catch all type (such as torch.float) is sufficient, so instead of throwing an exception, return default value.

Returns

dtype equivalent torch_tensorrt.dtype from library enum t

Return type

Optional(Union(torch.dtype, tensorrt.DataType, numpy.dtype, dtype))

Examples

# Succeeds
float_dtype = torch_tensorrt.dtype.f32.to(torch.dtype) # Returns torch.float

# Failure
float_dtype = torch_tensorrt.dtype.bf16.to(numpy.dtype) # Returns None

b

Boolean value, equivalent to dtype.bool

bf16

16 bit “Brain” floating-point number, equivalent to dtype.bfloat16

f16

16 bit floating-point number, equivalent to dtype.half, dtype.fp16 and dtype.float16

f32

32 bit floating-point number, equivalent to dtype.float, dtype.fp32 and dtype.float32

f64

64 bit floating-point number, equivalent to dtype.double, dtype.fp64 and dtype.float64

f8

8 bit floating-point number, equivalent to dtype.fp8 and dtype.float8

i32

Signed 32 bit integer, equivalent to dtype.int32 and dtype.int

i64

Signed 64 bit integer, equivalent to dtype.int64 and dtype.long

i8

Signed 8 bit integer, equivalent to dtype.int8, when enabled as a kernel precision typically requires the model to support quantization

u8

Unsigned 8 bit integer, equivalent to dtype.uint8

unknown

Sentinel value

class torch_tensorrt.DeviceType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Type of device TensorRT will target

to(t: Union[Type[DeviceType], Type[DeviceType]], use_default: bool = False) → Union[DeviceType, DeviceType][source]

Convert DeviceType into the equivalent type in tensorrt

Converts self into one of torch or tensorrt equivalent device type. If self is not supported in the target library, then an exception will be raised. As such it is not recommended to use this method directly.

Alternatively use torch_tensorrt.DeviceType.try_to()

Parameters: t (Union(Type(tensorrt.DeviceType), Type(DeviceType))) – Device type enum from another library to convert to
Returns: Device type equivalent torch_tensorrt.DeviceType in enum t
Return type: Union(tensorrt.DeviceType, DeviceType)
Raises: TypeError – Unknown target type or unsupported device type

Examples

# Succeeds
trt_dla = torch_tensorrt.DeviceType.DLA.to(tensorrt.DeviceType) # Returns tensorrt.DeviceType.DLA

classmethod try_from(d: Union[DeviceType, DeviceType]) → Optional[DeviceType][source]

Create a Torch-TensorRT device type enum from a TensorRT device type enum.

Takes a device type enum from tensorrt and create a torch_tensorrt.DeviceType. If the source is not supported or the device type is not supported in Torch-TensorRT, then an exception will be raised. As such it is not recommended to use this method directly.

Alternatively use torch_tensorrt.DeviceType.try_from()

Parameters: d (Union(tensorrt.DeviceType, DeviceType)) – Device type enum from another library
Returns: Equivalent torch_tensorrt.DeviceType to d
Return type: DeviceType

Examples

torchtrt_dla = torch_tensorrt.DeviceType._from(tensorrt.DeviceType.DLA)

try_to(t: Union[Type[DeviceType], Type[DeviceType]], use_default: bool = False) → Optional[Union[DeviceType, DeviceType]][source]

Convert DeviceType into the equivalent type in tensorrt

Converts self into one of torch or tensorrt equivalent memory format. If self is not supported in the target library, then None will be returned.

Parameters: t (Union(Type(tensorrt.DeviceType), Type(DeviceType))) – Device type enum from another library to convert to
Returns: Device type equivalent torch_tensorrt.DeviceType in enum t
Return type: Optional(Union(tensorrt.DeviceType, DeviceType))

Examples

# Succeeds
trt_dla = torch_tensorrt.DeviceType.DLA.to(tensorrt.DeviceType) # Returns tensorrt.DeviceType.DLA

DLA

Target is a DLA core

GPU

Target is a GPU

UNKNOWN

Sentinel value

class torch_tensorrt.EngineCapability(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

EngineCapability determines the restrictions of a network during build time and what runtime it targets.

to(t: Union[Type[EngineCapability], Type[EngineCapability]]) → Union[EngineCapability, EngineCapability][source]

Convert EngineCapability into the equivalent type in tensorrt

Converts self into one of torch or tensorrt equivalent engine capability. If self is not supported in the target library, then an exception will be raised. As such it is not recommended to use this method directly.

Alternatively use torch_tensorrt.EngineCapability.try_to()

Parameters: t (Union(Type(tensorrt.EngineCapability), Type(EngineCapability))) – Engine capability enum from another library to convert to
Returns: Engine capability equivalent torch_tensorrt.EngineCapability in enum t
Return type: Union(tensorrt.EngineCapability, EngineCapability)
Raises: TypeError – Unknown target type or unsupported engine capability

Examples

# Succeeds
torchtrt_dla_ec = torch_tensorrt.EngineCapability.DLA_STANDALONE.to(tensorrt.EngineCapability) # Returns tensorrt.EngineCapability.DLA

classmethod try_from() → Optional[EngineCapability][source]

Create a Torch-TensorRT engine capability enum from a TensorRT engine capability enum.

Takes a device type enum from tensorrt and create a torch_tensorrt.EngineCapability. If the source is not supported or the engine capability level is not supported in Torch-TensorRT, then an exception will be raised. As such it is not recommended to use this method directly.

Alternatively use torch_tensorrt.EngineCapability.try_from()

Parameters: c (Union(tensorrt.EngineCapability, EngineCapability)) – Engine capability enum from another library
Returns: Equivalent torch_tensorrt.EngineCapability to c
Return type: EngineCapability

Examples

torchtrt_safety_ec = torch_tensorrt.EngineCapability._from(tensorrt.EngineCapability.SAEFTY)

try_to(t: Union[Type[EngineCapability], Type[EngineCapability]]) → Optional[Union[EngineCapability, EngineCapability]][source]

Convert EngineCapability into the equivalent type in tensorrt

Converts self into one of torch or tensorrt equivalent engine capability. If self is not supported in the target library, then None will be returned.

Parameters: t (Union(Type(tensorrt.EngineCapability), Type(EngineCapability))) – Engine capability enum from another library to convert to
Returns: Engine capability equivalent torch_tensorrt.EngineCapability in enum t
Return type: Optional(Union(tensorrt.EngineCapability, EngineCapability))

Examples

# Succeeds
trt_dla_ec = torch_tensorrt.EngineCapability.DLA.to(tensorrt.EngineCapability) # Returns tensorrt.EngineCapability.DLA_STANDALONE

DLA_STANDALONE

EngineCapability.DLA_STANDALONE provides a restricted subset of network operations that are DLA compatible and the resulting serialized engine can be executed using standalone DLA runtime APIs.

SAFETY

EngineCapability.SAFETY provides a restricted subset of network operations that are safety certified and the resulting serialized engine can be executed with TensorRT’s safe runtime APIs in the tensorrt.safe namespace.

STANDARD

EngineCapability.STANDARD does not provide any restrictions on functionality and the resulting serialized engine can be executed with TensorRT’s standard runtime APIs.

class torch_tensorrt.memory_format(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

to(t: Union[Type[memory_format], Type[TensorFormat], Type[memory_format]]) → Union[memory_format, TensorFormat, memory_format][source]

Convert memory_format into the equivalent type in torch or tensorrt

Converts self into one of torch or tensorrt equivalent memory format. If self is not supported in the target library, then an exception will be raised. As such it is not recommended to use this method directly.

Alternatively use torch_tensorrt.memory_format.try_to()

Parameters: t (Union(Type(torch.memory_format), Type(tensorrt.TensorFormat), Type(memory_format))) – Memory format type enum from another library to convert to
Returns: Memory format equivalent torch_tensorrt.memory_format in enum t
Return type: Union(torch.memory_format, tensorrt.TensorFormat, memory_format)
Raises: TypeError – Unknown target type or unsupported memory format

Examples

# Succeeds
tf = torch_tensorrt.memory_format.linear.to(torch.dtype) # Returns torch.contiguous

classmethod try_from(f: Union[memory_format, TensorFormat, memory_format]) → Optional[memory_format][source]

Create a Torch-TensorRT memory format enum from another library memory format enum.

Takes a memory format enum from one of torch, and tensorrt and create a torch_tensorrt.memory_format. If the source is not supported or the memory format is not supported in Torch-TensorRT, then None will be returned.

Parameters: f (Union(torch.memory_format, tensorrt.TensorFormat, memory_format)) – Memory format enum from another library
Returns: Equivalent torch_tensorrt.memory_format to f
Return type: Optional(memory_format)

Examples

torchtrt_linear = torch_tensorrt.memory_format.try_from(torch.contiguous)

try_to(t: Union[Type[memory_format], Type[TensorFormat], Type[memory_format]]) → Optional[Union[memory_format, TensorFormat, memory_format]][source]

Convert memory_format into the equivalent type in torch or tensorrt

Converts self into one of torch or tensorrt equivalent memory format. If self is not supported in the target library, then None will be returned

Parameters: t (Union(Type(torch.memory_format), Type(tensorrt.TensorFormat), Type(memory_format))) – Memory format type enum from another library to convert to
Returns: Memory format equivalent torch_tensorrt.memory_format in enum t
Return type: Optional(Union(torch.memory_format, tensorrt.TensorFormat, memory_format))

Examples

# Succeeds
tf = torch_tensorrt.memory_format.linear.to(torch.dtype) # Returns torch.contiguous

cdhw32

Thirty-two wide channel vectorized row major format with 3 spatial dimensions.

This format is bound to FP16 and INT8. It is only available for dimensions >= 4.

For a tensor with dimensions {N, C, D, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+31)/32][D][H][W][32], with the tensor coordinates (n, d, c, h, w) mapping to array subscript [n][c/32][d][h][w][c%32].

chw16

Sixteen wide channel vectorized row major format.

This format is bound to FP16. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+15)/16][H][W][16], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c/16][h][w][c%16].

chw2

Two wide channel vectorized row major format.

This format is bound to FP16 in TensorRT. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+1)/2][H][W][2], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c/2][h][w][c%2].

chw32

Thirty-two wide channel vectorized row major format.

This format is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+31)/32][H][W][32], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c/32][h][w][c%32].

chw4

Four wide channel vectorized row major format. This format is bound to INT8. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][(C+3)/4][H][W][4], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c/4][h][w][c%4].

dhwc

Non-vectorized channel-last format. This format is bound to FP32. It is only available for dimensions >= 4.

Equivient to memory_format.channels_last_3d

dhwc8

Eight channel format where C is padded to a multiple of 8.

This format is bound to FP16, and it is only available for dimensions >= 4.

For a tensor with dimensions {N, C, D, H, W}, the memory layout is equivalent to an array with dimensions [N][D][H][W][(C+7)/8*8], with the tensor coordinates (n, c, d, h, w) mapping to array subscript [n][d][h][w][c].

dla_hwc4

DLA image format. channel-last format. C can only be 1, 3, 4. If C == 3 it will be rounded to 4. The stride for stepping along the H axis is rounded up to 32 bytes.

This format is bound to FP16/Int8 and is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, with C’ is 1, 4, 4 when C is 1, 3, 4 respectively, the memory layout is equivalent to a C array with dimensions [N][H][roundUp(W, 32/C’/elementSize)][C’] where elementSize is 2 for FP16 and 1 for Int8, C’ is the rounded C. The tensor coordinates (n, c, h, w) maps to array subscript [n][h][w][c].

dla_linear

DLA planar format. Row major format. The stride for stepping along the H axis is rounded up to 64 bytes.

This format is bound to FP16/Int8 and is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to a C array with dimensions [N][C][H][roundUp(W, 64/elementSize)] where elementSize is 2 for FP16 and 1 for Int8, with the tensor coordinates (n, c, h, w) mapping to array subscript [n][c][h][w].

hwc

Non-vectorized channel-last format. This format is bound to FP32 and is only available for dimensions >= 3.

Equivient to memory_format.channels_last

hwc16

Sixteen channel format where C is padded to a multiple of 16. This format is bound to FP16. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to the array with dimensions [N][H][W][(C+15)/16*16], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][h][w][c].

hwc8

Eight channel format where C is padded to a multiple of 8.

This format is bound to FP16. It is only available for dimensions >= 3.

For a tensor with dimensions {N, C, H, W}, the memory layout is equivalent to the array with dimensions [N][H][W][(C+7)/8*8], with the tensor coordinates (n, c, h, w) mapping to array subscript [n][h][w][c].

linear

Row major linear format.

For a tensor with dimensions {N, C, H, W}, the W axis always has unit stride, and the stride of every other axis is at least the product of the next dimension times the next stride. the strides are the same as for a C array with dimensions [N][C][H][W].

Equivient to memory_format.contiguous

torch_tensorrt

Functions

Classes

Enums

Submodules

Docs

Tutorials

Resources