Using ExecuTorch with C++
To support a wide variety of devices, from high-end mobile phones down to tiny embedded systems, ExecuTorch provides an API surface with a high degree of customizability. The C++ APIs expose advanced configuration options, such as controlling memory allocation, placement, and data loading. To meet the needs of both application and embedded programming, ExecuTorch provides a low-level, highly-customizable core set of APIs and a set of high-level extensions that abstract away many of the low-level details that are not relevant for mobile application programming.
High-Level APIs
The C++ Module class provides the high-level interface to load and execute a model from C++. It is responsible for loading the .pte file, configuring memory allocation and placement, and running the model. The Module constructor takes a file path and provides a simplified forward() method to run the model.
In addition to the Module class, the tensor extension provides an encapsulated interface to define and manage tensor memory. It provides the TensorPtr class, which is a “fat” smart pointer that owns the tensor data and its metadata, such as sizes and strides. The make_tensor_ptr and from_blob functions, defined in tensor.h, provide owning and non-owning tensor creation APIs, respectively.
#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>
using namespace ::executorch::extension;
// Load the model.
Module module("/path/to/model.pte");
// Create an input tensor.
float input[1 * 3 * 256 * 256];
auto tensor = from_blob(input, {1, 3, 256, 256});
// Perform an inference.
const auto result = module.forward(tensor);
if (result.ok()) {
  // Retrieve the output data.
  const auto output = result->at(0).toTensor().const_data_ptr<float>();
}
For more information on the Module class, see Running an ExecuTorch Model Using the Module Extension in C++. For information on high-level tensor APIs, see Managing Tensor Memory in C++.
Low-Level APIs
Running a model using the low-level runtime APIs allows for a high degree of control over memory allocation, placement, and loading. This enables advanced use cases, such as placing allocations in specific memory banks or loading a model without a file system. For an end-to-end example using the low-level runtime APIs, see Running an ExecuTorch Model in C++ Tutorial.
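As a rough orientation before reading the tutorial, a condensed sketch of the low-level flow is shown below. The method name "forward" and the 4 KB scratch pool size are illustrative assumptions, and error handling is elided; the tutorial walks through each step in full.
#include <cstdint>
#include <vector>

#include <executorch/extension/data_loader/file_data_loader.h>
#include <executorch/runtime/core/hierarchical_allocator.h>
#include <executorch/runtime/core/memory_allocator.h>
#include <executorch/runtime/executor/program.h>
#include <executorch/runtime/platform/runtime.h>

using namespace ::executorch::runtime;

// Initialize the runtime once, before any other ExecuTorch calls.
runtime_init();

// Load the program. Any DataLoader implementation works here, which is
// what allows loading a model without a file system on embedded targets.
auto loader = executorch::extension::FileDataLoader::from("/path/to/model.pte");
auto program = Program::load(&loader.get());

// Size the memory-planned buffers exactly as the .pte file requests.
// Where these spans live (e.g. a specific memory bank) is up to the caller.
auto meta = program->method_meta("forward");
std::vector<std::vector<uint8_t>> buffers;
std::vector<Span<uint8_t>> spans;
for (size_t i = 0; i < meta->num_memory_planned_buffers(); ++i) {
  buffers.emplace_back(meta->memory_planned_buffer_size(i).get());
  spans.emplace_back(buffers.back().data(), buffers.back().size());
}
HierarchicalAllocator planned_memory({spans.data(), spans.size()});

// Scratch allocator for runtime bookkeeping; 4 KB is an arbitrary size
// chosen for this sketch.
static uint8_t pool[4 * 1024];
MemoryAllocator method_allocator(sizeof(pool), pool);

// Load the method and run it. Inputs would be bound with
// method->set_input(...) before calling execute().
MemoryManager memory_manager(&method_allocator, &planned_memory);
auto method = program->load_method("forward", &memory_manager);
method->execute();
Each Result returned along the way should be checked with ok() in real code; the sketch omits those checks for brevity.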
Building with CMake
ExecuTorch uses CMake as the primary build system. Inclusion of the module and tensor APIs is controlled by the EXECUTORCH_BUILD_EXTENSION_MODULE and EXECUTORCH_BUILD_EXTENSION_TENSOR CMake options. As these APIs may not be supported on embedded systems, they are disabled by default when building from source. The low-level API surface is always included. To link, add the executorch target as a CMake dependency, along with executorch_module_static and executorch_tensor, if desired.
# CMakeLists.txt
add_subdirectory("executorch")
...
target_link_libraries(
    my_target
    PRIVATE executorch
            executorch_module_static
            executorch_tensor
            optimized_native_cpu_ops_lib
            xnnpack_backend)
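Because the extension APIs are disabled by default, the corresponding options need to be enabled when configuring the ExecuTorch build. A configure step along these lines is one way to do it (the build directory name is a matter of preference):
# Configure ExecuTorch with the module and tensor extensions enabled.
cmake -B cmake-out \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON
When consuming ExecuTorch via add_subdirectory, the same options can instead be set in the parent CMakeLists.txt before the add_subdirectory call.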
See Building from Source for more information on the CMake build process.
Reference Runners
The ExecuTorch repository includes several reference runners, which are simple programs that load and execute a .pte file, typically with random inputs. These can be used to sanity check model execution on a development platform and as a code reference for runtime integration.
The executor_runner target is built by default when building with CMake. It can be invoked as follows:
./cmake-out/executor_runner --model_path path/to/model.pte
The runner source code can be found in the ExecuTorch repo under examples/portable/executor_runner.cpp. Some backends, such as CoreML, have dedicated runners to showcase backend and platform-specific functionality. See examples/apple/coreml and the examples directory for more information.
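To produce the runner from a fresh source checkout before invoking it, a build sequence along the following lines can be used (the build directory name and job count are illustrative):
# Configure the build tree and compile the default runner target.
cmake -B cmake-out .
cmake --build cmake-out --target executor_runner -j8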
Next Steps
Runtime API Reference for documentation on the available C++ runtime APIs.
Running an ExecuTorch Model Using the Module Extension in C++ for information on the high-level Module API.
Managing Tensor Memory in C++ for information on high-level tensor APIs.
Running an ExecuTorch Model in C++ Tutorial for information on the low-level runtime APIs.
Building from Source for information on CMake build integration.