Here we provide examples of Torch-TensorRT compilation of popular computer vision and language models.

Dependencies

Install the following external dependencies. This assumes that compatible versions of the torch, torch_tensorrt and tensorrt libraries are already installed (see dependencies).

pip install -r requirements.txt
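Before installing the extra requirements, it can help to confirm which of the three prerequisite libraries are actually present in the current environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_versions` is hypothetical, and the distribution names passed to it are assumed to match the pip package names on your system.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust them if your environment uses different ones.
    for name, version in installed_versions(["torch", "torch_tensorrt", "tensorrt"]).items():
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

A value of None only indicates that the package metadata was not found; it does not check that the installed versions are compatible with each other.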

Model Zoo

Compiling ResNet with dynamic shapes using the torch.compile backend: Compiling a ResNet model using the torch.compile frontend for torch_tensorrt.compile
Compiling BERT using the torch.compile backend: Compiling a Transformer model using torch.compile
Compiling Stable Diffusion model using the torch.compile backend: Compiling a Stable Diffusion model using torch.compile
Compiling GPT2 using the Torch-TensorRT torch.compile frontend: Compiling a GPT2 model using torch.compile
Compiling GPT2 using the dynamo backend: Compiling a GPT2 model using the AOT workflow (ir=dynamo)
Compiling Llama2 using the dynamo backend: Compiling a Llama2 model using the AOT workflow (ir=dynamo)
Compiling SAM2 using the dynamo backend: Compiling a SAM2 model using the AOT workflow (ir=dynamo)
Compiling FLUX.1-dev model using the Torch-TensorRT dynamo backend: Compiling a FLUX.1-dev model using the AOT workflow (ir=dynamo)
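The entries above fall into two workflows: just-in-time compilation through the torch.compile backend, and ahead-of-time compilation with ir=dynamo. The sketch below contrasts the two on a toy model. It is not taken from any of the linked examples; the tiny convolutional model, the input shape and the helper names `can_run_tensorrt` and `compile_both_ways` are assumptions made for illustration, and the compilation step only runs when torch, torch_tensorrt and a CUDA device are available.

```python
import importlib.util


def can_run_tensorrt() -> bool:
    """True only if torch and torch_tensorrt are importable and a CUDA device is visible."""
    for module in ("torch", "torch_tensorrt"):
        if importlib.util.find_spec(module) is None:
            return False
    import torch

    return torch.cuda.is_available()


def compile_both_ways():
    """Compile one toy model with the JIT and AOT workflows and return both outputs."""
    import torch
    import torch_tensorrt  # importing this registers the "torch_tensorrt" backend

    # Toy stand-in for a real model such as ResNet or BERT (illustrative assumption).
    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval().cuda()
    example = torch.randn(1, 3, 32, 32, device="cuda")

    # JIT workflow: compilation is deferred until the first call.
    jit_model = torch.compile(model, backend="torch_tensorrt")

    # AOT workflow: the model is traced and compiled up front for the given inputs.
    aot_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[example])

    with torch.no_grad():
        return jit_model(example), aot_model(example)


if __name__ == "__main__":
    if can_run_tensorrt():
        jit_out, aot_out = compile_both_ways()
        print(jit_out.shape, aot_out.shape)
    else:
        print("torch, torch_tensorrt or CUDA not available; skipping compilation")
```

The linked examples cover the options each workflow accepts for real models, such as dynamic shapes and precision settings.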

Compiling Stable Diffusion model using the torch.compile backend

Cross runtime compilation for Windows

Refitting Torch-TensorRT Programs with New Weights

Compiling BERT using the torch.compile backend

Compiling GPT2 using the Torch-TensorRT torch.compile frontend

Torch Compile Advanced Usage

Torch Export with Cudagraphs

Engine Caching (BERT)

Pre-allocated output buffer

Compiling ResNet with dynamic shapes using the torch.compile backend

Compiling FLUX.1-dev model using the Torch-TensorRT dynamo backend

Compiling GPT2 using the dynamo backend

Compiling Llama2 using the dynamo backend

Automatically Generate a Converter for a Custom Kernel

Automatically Generate a Plugin for a Custom Kernel

Overloading Torch-TensorRT Converters with Custom Converters

Weight Streaming

Mutable Torch TensorRT Module

Compiling SAM2 using the dynamo backend

Deploy Quantized Models using Torch-TensorRT

Engine Caching

Using Custom Kernels within TensorRT Engines with Torch-TensorRT
