Torch-TensorRT
In-framework compilation of PyTorch inference code for NVIDIA GPUs
Torch-TensorRT is an inference compiler for PyTorch, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime.
It supports both just-in-time (JIT) compilation workflows via the torch.compile
interface and ahead-of-time (AOT) workflows.
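As a quick orientation, here is a minimal sketch of both workflows. It assumes a CUDA-capable GPU and an installed torch_tensorrt package; the toy model, input shapes, and output path are illustrative, and torch_tensorrt.save is assumed to be available as in recent releases:

```python
import torch
import torch_tensorrt

# Illustrative toy model and input; any traceable PyTorch module works.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3), torch.nn.ReLU()
).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

# JIT workflow: TensorRT engines are built lazily on the first call.
jit_model = torch.compile(model, backend="tensorrt")
jit_model(*inputs)

# AOT workflow: compile eagerly via the dynamo frontend, then
# serialize the compiled module for later deployment.
trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
torch_tensorrt.save(trt_model, "trt_model.ep", inputs=inputs)
```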
Torch-TensorRT integrates seamlessly into the PyTorch ecosystem, supporting hybrid execution of optimized TensorRT code alongside standard PyTorch code.
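Hybrid execution means operators that are unsupported (or deliberately excluded) run in eager PyTorch while the rest of the graph runs inside TensorRT engines. A sketch of requesting such a split explicitly, assuming the dynamo frontend's torch_executed_ops option; the op name and model are illustrative:

```python
import torch
import torch_tensorrt

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3), torch.nn.ReLU()
).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

# Ops listed in torch_executed_ops are partitioned out of the TensorRT
# engine and execute in standard PyTorch; the remaining ops are fused
# into TensorRT subgraphs within the same returned module.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    min_block_size=1,
    torch_executed_ops={"torch.ops.aten.relu.default"},
)
print(trt_model(*inputs).shape)
```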
More Information / System Architecture:
Getting Started
User Guide
Tutorials
Overloading Torch-TensorRT Converters with Custom Converters
Using Custom Kernels within TensorRT Engines with Torch-TensorRT
Dynamo Frontend
TorchScript Frontend
FX Frontend
Model Zoo
Compiling ResNet with dynamic shapes using the torch.compile backend (see the sketch after this list)
Compiling Stable Diffusion model using the torch.compile backend
Compiling GPT2 using the Torch-TensorRT torch.compile frontend
Compiling FLUX.1-dev model using the Torch-TensorRT dynamo backend
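The dynamic-shapes entry above relies on the pattern sketched here: marking an input dimension as dynamic under the torch.compile backend so a single engine serves a range of batch sizes. A minimal sketch, assuming a CUDA GPU; the model, shapes, and bounds are illustrative:

```python
import torch
import torch_tensorrt  # importing registers the "tensorrt" backend

model = torch.nn.Linear(64, 8).eval().cuda()
x = torch.randn(8, 64, device="cuda")

# Declare the batch dimension dynamic within a range so the compiled
# engine is reused across batch sizes instead of triggering recompiles.
torch._dynamo.mark_dynamic(x, 0, min=1, max=32)

compiled = torch.compile(model, backend="tensorrt")
compiled(x)
compiled(torch.randn(16, 64, device="cuda"))  # same engine, new batch size
```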