.. _torchtrtc:

torchtrtc
=================================

``torchtrtc`` is a CLI application for using the Torch-TensorRT compiler. It serves as an easy way to compile a
TorchScript Module with Torch-TensorRT from the command-line to quickly check support or as part of a deployment
pipeline. All basic features of the compiler are supported, including post-training quantization (though you must
already have a calibration cache file to use the PTQ feature). The compiler can output two formats: either a
TorchScript program with the TensorRT engine embedded, or the TensorRT engine itself as a PLAN file.

All that is required to run a compiled program is, in C++, linking against ``libtorchtrt.so`` or, in Python,
importing the ``torch_tensorrt`` package. All other aspects of using compiled modules are identical to standard
TorchScript: load with ``torch.jit.load()`` and run like you would run any other module.
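For instance, a compiled module can be used from Python with nothing more than a load and a call. The following is
a minimal sketch; the file name and input shape are placeholders matching the SSD example at the end of this page:

.. code-block:: python

    import torch
    import torch_tensorrt  # importing this registers the TensorRT runtime needed to deserialize the module

    # Load the TorchScript program produced by torchtrtc (placeholder file name)
    trt_module = torch.jit.load("ssd_trt.ts")

    # Run it like any other TorchScript module; inputs must live on the GPU
    x = torch.randn((1, 3, 300, 300), dtype=torch.half, device="cuda")
    out = trt_module(x)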
``torchtrtc`` accepts the following arguments and options:

.. code-block:: txt

    torchtrtc [input_file_path] [output_file_path]
        [input_specs...] {OPTIONS}

        torchtrtc is a compiler for TorchScript; it will compile and optimize
        TorchScript programs to run on NVIDIA GPUs using TensorRT

    OPTIONS:

        -h, --help                                Display this help menu
        Verbosity of the compiler
          -v, --verbose                           Dumps debugging information about the compilation process onto the console
          -w, --warnings                          Disables printing of warnings generated during compilation onto the console (warnings are on by default)
          --i, --info                             Dumps info messages generated during compilation onto the console
        --build-debuggable-engine                 Creates a debuggable engine
        --allow-gpu-fallback                      (Only used when targeting DLA (device-type)) Lets engine run layers on GPU if they are not supported on DLA
        --require-full-compilation                Require that the model should be fully compiled to TensorRT or throw an error
        --check-method-support=[method_name]      Check the support for end-to-end compilation of a specified method in the TorchScript module
        --disable-tf32                            Prevent Float32 layers from using the TF32 data format
        --sparse-weights                          Enable sparsity for weights of conv and FC layers
        -p[precision...],
        --enable-precision=[precision...]         (Repeatable) Enable an operating precision for kernels to use when building the engine (Int8 requires a calibration-cache argument) [ float | float32 | f32 | fp32 | half | float16 | f16 | fp16 | int8 | i8 | char ] (default: float)
        -d[type], --device-type=[type]            The type of device the engine should be built for [ gpu | dla ] (default: gpu)
        --gpu-id=[gpu_id]                         GPU id if running on a multi-GPU platform (defaults to 0)
        --dla-core=[dla_core]                     DLACore id if running on available DLA (defaults to 0)
        --engine-capability=[capability]          The capability the engine should be built with [ standard | safety | dla_standalone ]
        --calibration-cache-file=[file_path]      Path to calibration cache file to use for post-training quantization
        --teo=[op_name...],
        --torch-executed-op=[op_name...]          (Repeatable) Operator in the graph that should always be run in PyTorch (partial compilation must be enabled)
        --tem=[module_name...],
        --torch-executed-mod=[module_name...]     (Repeatable) Module that should always be run in PyTorch (partial compilation must be enabled)
        --mbs=[num_ops],
        --min-block-size=[num_ops]                Minimum number of contiguous TensorRT-supported ops to compile a subgraph to TensorRT
        --embed-engine                            Whether to treat the input file as a serialized TensorRT engine and embed it into a TorchScript module (device spec must be provided)
        --num-min-timing-iter=[num_iters]         Number of minimization timing iterations used to select kernels
        --num-avg-timing-iters=[num_iters]        Number of averaging timing iterations used to select kernels
        --workspace-size=[workspace_size]         Maximum size of workspace given to TensorRT
        -t[threshold], --threshold=[threshold]    Maximum acceptable numerical deviation from standard TorchScript output (default 2e-5)
        --no-threshold-check                      Skip checking threshold compliance
        --truncate-long-double,
        --truncate, --truncate-64bit              Truncate weights that are provided in 64bit to 32bit (Long, Double to Int, Float)
        --save-engine                             Instead of compiling a full TorchScript program, save the created engine to the path specified as the output path
        input_file_path                           Path to input TorchScript file
        output_file_path                          Path for compiled TorchScript (or TensorRT engine) file
        input_specs...                            Specs for inputs to engine; can either be a single size or a range defined by Min, Optimal, Max sizes, e.g. "(N,..,C,H,W)" or "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]". Data type and format can be specified by adding an "@" followed by dtype and "%" followed by format to the end of the shape spec, e.g. "(3,3,32,32)@f16%NHWC"

        "--" can be used to terminate flag options and force all following arguments
        to be treated as positional options

For example:

.. code-block:: shell

    torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts "[(1,3,300,300); (1,3,512,512); (1,3,1024,1024)]@f16%contiguous" -p f16
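The remaining flags compose in the same way. The following is an illustrative sketch, not output from a real
session; the file names, input shape, and calibration cache are placeholder assumptions:

.. code-block:: shell

    # Save a standalone TensorRT engine (PLAN file) instead of a TorchScript program
    torchtrtc model.jit.pt model.engine "(1,3,224,224)" -p f16 --save-engine

    # INT8 post-training quantization requires a pre-generated calibration cache
    torchtrtc model.jit.pt model_int8.ts "(1,3,224,224)" -p int8 --calibration-cache-file=cal.cache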