System Overview

Torch-TensorRT is primarily a C++ library with a Python API planned. We use Bazel as our build system and currently target Linux x86_64 and Linux aarch64 (native builds only). The compiler we use is GCC 7.5.0. The library is untested with older compilers, so you may see compilation errors if you try to use one.

The repository is structured into:

core: Main compiler source code
cpp: C++ API
tests: tests of the C++ API, the core and converters
py: Python API
notebooks: Example applications built with Torch-TensorRT
docs: Documentation
docsrc: Documentation Source
third_party: BUILD files for dependency libraries
toolchains: Toolchains for different platforms

The C++ API is unstable and subject to change until the library matures, though most work is done under the hood in the core.

The core has a couple of major parts. The top-level compiler interface coordinates ingesting a module, lowering it, converting it, and generating a new module that is returned to the user. Beneath that interface are the three main phases of the compiler: the lowering phase, the conversion phase, and the execution phase.

Compiler Phases

Lowering

Lowering Phase

The lowering phase is made up of a set of passes (some from PyTorch and some specific to Torch-TensorRT) that run over the graph IR to map the large PyTorch opset to a reduced opset that is easier to convert to TensorRT.
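To make the idea of a pass pipeline concrete, here is a minimal sketch in plain Python. It is a toy model, not the Torch-TensorRT implementation: the graph is a list of tuples rather than TorchScript IR, and the two passes (`remove_dropout`, `unpack_addmm`) and the `lower` helper are names chosen for illustration only.

```python
from typing import Callable, List, Tuple

# Toy graph: an ordered list of (op_name, input_names, output_name) nodes.
# This stands in for the real graph IR purely for illustration.
Node = Tuple[str, Tuple[str, ...], str]
Graph = List[Node]
Pass = Callable[[Graph], Graph]


def remove_dropout(graph: Graph) -> Graph:
    """Drop dropout nodes (no-ops at inference) and rewire their consumers."""
    alias = {}
    out = []
    for op, inputs, output in graph:
        inputs = tuple(alias.get(i, i) for i in inputs)
        if op == "aten::dropout":
            alias[output] = inputs[0]  # consumers now read the dropout's input
            continue
        out.append((op, inputs, output))
    return out


def unpack_addmm(graph: Graph) -> Graph:
    """Rewrite one fused op into two simpler ops from a smaller opset."""
    out = []
    for op, inputs, output in graph:
        if op == "aten::addmm":
            bias, a, b = inputs
            tmp = output + ".mm"
            out.append(("aten::matmul", (a, b), tmp))
            out.append(("aten::add", (tmp, bias), output))
        else:
            out.append((op, inputs, output))
    return out


def lower(graph: Graph, passes: List[Pass]) -> Graph:
    """Run each pass in order; every pass takes a graph and returns a graph."""
    for p in passes:
        graph = p(graph)
    return graph


graph = [
    ("aten::addmm", ("bias", "x", "w"), "y"),
    ("aten::dropout", ("y",), "z"),
    ("aten::relu", ("z",), "out"),
]
lowered = lower(graph, [remove_dropout, unpack_addmm])
```

After both passes the graph no longer contains `aten::dropout` or `aten::addmm`, so a later stage only needs to handle `aten::matmul`, `aten::add`, and `aten::relu`.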

Partitioning

Partitioning Phase

This phase is optional and is enabled by the user. It instructs the compiler to separate nodes into those that should run in PyTorch and those that should run in TensorRT. A node is assigned to PyTorch if it has no converter, if the user has explicitly set its operator to run in PyTorch, or if the module fallback passes have flagged the node to tell partitioning to run it in PyTorch.
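The three separation criteria can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the library's algorithm: the `Node` class, its `fallback_flag` field, the `SUPPORTED` set, and the grouping of contiguous nodes into segments are all hypothetical, and dependency analysis between nodes is ignored.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import FrozenSet, List, Tuple


@dataclass
class Node:
    op: str
    fallback_flag: bool = False  # hypothetical marker set by a module fallback pass


# Illustrative set of operators that have a converter.
SUPPORTED = {"aten::conv2d", "aten::relu", "aten::add"}


def target(node: Node, forced_torch_ops: FrozenSet[str]) -> str:
    """Apply the three criteria in turn; anything left over goes to TensorRT."""
    if node.op not in SUPPORTED:      # 1. no converter available
        return "torch"
    if node.op in forced_torch_ops:   # 2. user pinned this operator to PyTorch
        return "torch"
    if node.fallback_flag:            # 3. flagged by a module fallback pass
        return "torch"
    return "tensorrt"


def partition(nodes: List[Node],
              forced_torch_ops: FrozenSet[str] = frozenset()) -> List[Tuple[str, List[str]]]:
    """Group consecutive nodes that share a target into segments."""
    return [
        (tgt, [n.op for n in group])
        for tgt, group in groupby(nodes, key=lambda n: target(n, forced_torch_ops))
    ]


nodes = [
    Node("aten::conv2d"),
    Node("aten::relu"),
    Node("custom::my_op"),                   # no converter
    Node("aten::add"),                       # pinned to PyTorch below
    Node("aten::relu", fallback_flag=True),  # flagged for fallback
    Node("aten::conv2d"),
]
segments = partition(nodes, forced_torch_ops=frozenset({"aten::add"}))
```

Here the three middle nodes each fail a different criterion, so they end up together in one PyTorch segment between two TensorRT segments.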

Conversion

Conversion Phase

In the conversion phase we traverse the lowered graph and construct an equivalent TensorRT graph. The conversion phase is made up of three main components: a context that manages compile-time data, an evaluator library that executes operations which can be resolved at compile time, and a converter library that maps an op from JIT to TensorRT.
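The relationship between the three components can be sketched as follows. This is a toy model under stated assumptions: `ConversionCtx`, the `EVALUATORS` and `CONVERTERS` registries, the dictionary node format, and the `"activation_relu"` layer label are hypothetical stand-ins, and no TensorRT API is called.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ConversionCtx:
    """Holds compile-time data while the graph is traversed."""
    constants: Dict[str, Any] = field(default_factory=dict)  # values resolved at compile time
    layers: List[Tuple[str, Tuple[str, ...], str]] = field(default_factory=list)  # stand-in for the TensorRT network


# Evaluators compute a value at compile time and add nothing to the network.
def eval_constant(ctx: ConversionCtx, node: dict) -> Any:
    return node["value"]


def eval_mul_int(ctx: ConversionCtx, node: dict) -> Any:
    a, b = node["inputs"]
    return ctx.constants[a] * ctx.constants[b]


# Converters map one op to a layer in the network under construction.
def convert_relu(ctx: ConversionCtx, node: dict) -> None:
    ctx.layers.append(("activation_relu", node["inputs"], node["output"]))


EVALUATORS = {"prim::Constant": eval_constant, "aten::mul.int": eval_mul_int}
CONVERTERS = {"aten::relu": convert_relu}


def convert(graph: List[dict]) -> ConversionCtx:
    """Walk the lowered graph, dispatching each node to an evaluator or a converter."""
    ctx = ConversionCtx()
    for node in graph:
        op = node["op"]
        if op in EVALUATORS:
            ctx.constants[node["output"]] = EVALUATORS[op](ctx, node)
        elif op in CONVERTERS:
            CONVERTERS[op](ctx, node)
        else:
            raise NotImplementedError(f"no evaluator or converter for {op}")
    return ctx


graph = [
    {"op": "prim::Constant", "inputs": (), "output": "a", "value": 2},
    {"op": "prim::Constant", "inputs": (), "output": "b", "value": 3},
    {"op": "aten::mul.int", "inputs": ("a", "b"), "output": "c"},
    {"op": "aten::relu", "inputs": ("x",), "output": "y"},
]
ctx = convert(graph)
```

In this sketch the integer arithmetic is folded away by evaluators and never reaches the network, while only the tensor operation produces a layer.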

Compilation and Runtime

Deploying Torch-TensorRT Programs

The final compilation phase constructs a TorchScript program to run the converted TensorRT engine. It takes a serialized engine and instantiates it within an engine manager. The compiler then builds a JIT graph that references this engine and wraps it in a module to return to the user. When the user executes the module, the JIT program runs in the JIT runtime, extended by Torch-TensorRT, with the data provided by the user.
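The hand-off from serialized engine to callable module can be pictured with the sketch below. Everything in it is a hypothetical stand-in written for illustration: `FakeEngine` replaces a deserialized TensorRT engine (its `execute` simply doubles its inputs), and `EngineManager` and `CompiledModule` are not the library's classes.

```python
from typing import Dict, List


class FakeEngine:
    """Stand-in for a TensorRT engine instantiated from serialized bytes."""

    def __init__(self, serialized: bytes):
        self.name = serialized.decode()

    def execute(self, inputs: List[float]) -> List[float]:
        return [x * 2 for x in inputs]  # placeholder computation


class EngineManager:
    """Owns instantiated engines and hands out integer handles to them."""

    def __init__(self) -> None:
        self._engines: Dict[int, FakeEngine] = {}

    def register(self, serialized: bytes) -> int:
        handle = len(self._engines)
        self._engines[handle] = FakeEngine(serialized)
        return handle

    def run(self, handle: int, inputs: List[float]) -> List[float]:
        return self._engines[handle].execute(inputs)


class CompiledModule:
    """Wrapper returned to the user; it only references the engine by handle."""

    def __init__(self, manager: EngineManager, handle: int):
        self._manager = manager
        self._handle = handle

    def __call__(self, *inputs: float) -> List[float]:
        return self._manager.run(self._handle, list(inputs))


manager = EngineManager()
handle = manager.register(b"toy_engine")
module = CompiledModule(manager, handle)
result = module(1.0, 2.0, 3.0)
```

The point of the indirection is that the returned module stays small: it carries a reference to the engine, and the runtime resolves that reference when the user calls the module with data.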
