Building and Running ExecuTorch with MPS Backend

This tutorial walks you through getting set up to build the MPS backend for ExecuTorch and running a simple model on it.

The MPS backend device maps machine learning computational graphs and primitives onto the MPS Graph framework and the tuned kernels provided by MPS.

What you will learn in this tutorial:

In this tutorial you will learn how to export a MobileNet V3 model to the MPS delegate.
You will also learn how to compile and deploy the ExecuTorch runtime with the MPS delegate on macOS and iOS.

Tutorials we recommend you complete before this:

Prerequisites (Hardware and Software)

To build and run a model using the MPS backend for ExecuTorch, you’ll need the following hardware and software components:

Hardware:

A Mac for tracing the model

Software:

Ahead of time tracing:
- macOS 12
Runtime:
- macOS >= 12.4
- iOS >= 15.4
- Xcode >= 14.1
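
The runtime requirements above can be checked from a terminal before building. The following is a minimal sketch, not part of the ExecuTorch tooling: `version_ge` is a hypothetical helper that compares dotted version numbers field by field, and it assumes every field is a plain integer. The macOS and Xcode checks are skipped on hosts where `sw_vers` or `xcodebuild` is not available.

```shell
# version_ge A B: succeed when dotted version A is >= B.
# Fields are compared numerically, so 12.10 counts as newer than 12.4.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        a_head=${a%%.*}
        b_head=${b%%.*}
        # A missing field is treated as 0, so "12" equals "12.0".
        [ "${a_head:-0}" -gt "${b_head:-0}" ] && return 0
        [ "${a_head:-0}" -lt "${b_head:-0}" ] && return 1
        # Drop the field just compared; an exhausted version becomes empty.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

if command -v sw_vers >/dev/null 2>&1; then
    macos_version=$(sw_vers -productVersion)
    if version_ge "$macos_version" 12.4; then
        echo "macOS $macos_version: OK for the MPS runtime"
    else
        echo "macOS $macos_version: the runtime needs 12.4 or newer"
    fi
else
    echo "sw_vers not found; skipping the macOS check"
fi

if command -v xcodebuild >/dev/null 2>&1; then
    # With a full Xcode install, the first output line reads like "Xcode 15.2".
    xcode_version=$(xcodebuild -version 2>/dev/null | awk 'NR == 1 { print $2 }')
    if [ -z "$xcode_version" ]; then
        echo "could not read an Xcode version (is a full Xcode selected?)"
    elif version_ge "$xcode_version" 14.1; then
        echo "Xcode $xcode_version: OK"
    else
        echo "Xcode $xcode_version: 14.1 or newer is required"
    fi
else
    echo "xcodebuild not found; skipping the Xcode check"
fi
```

A numeric comparison is used instead of a string comparison because a lexical test would rank 12.10 below 12.4.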

Setting up Developer Environment

Step 1. Complete the Setting up ExecuTorch tutorial.

Step 2. Install the dependencies needed to lower a model to the MPS delegate:

./backends/apple/mps/install_requirements.sh

Build

AOT (Ahead-of-time) Components

Compiling model for MPS delegate:

In this step, you will generate a simple ExecuTorch program that lowers the MobileNetV3 model to the MPS delegate. You’ll then pass this Program (the .pte file) to the runtime, which runs it using the MPS backend.

cd executorch
# Note: by default, the `mps_example` script uses the MPSPartitioner to handle ops that are not yet supported by the MPS delegate. To turn it off, pass `--no-use_partitioner`.
python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --bundled --use_fp16

# To see all options, run the following command:
python3 -m examples.apple.mps.scripts.mps_example --help
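
Before moving on to the runtime, it can help to confirm that the export actually wrote a program file. The sketch below assumes the output is named `mv3_mps_bundled_fp16.pte` (the name used by the run command later in this tutorial) and that it lands in the current directory. `check_pte` and the `PTE_FILE` variable are hypothetical names introduced for illustration.

```shell
# check_pte FILE: succeed when FILE is a regular, non-empty file.
check_pte() {
    if [ ! -f "$1" ]; then
        echo "missing: $1 (was the export run from this directory?)" >&2
        return 1
    fi
    if [ ! -s "$1" ]; then
        echo "empty: $1" >&2
        return 1
    fi
    # tr strips the padding that some wc implementations put before the count.
    echo "found: $1 ($(wc -c < "$1" | tr -d ' ') bytes)"
}

# PTE_FILE can be set to check a differently named export.
check_pte "${PTE_FILE:-mv3_mps_bundled_fp16.pte}" || echo "export output is not ready yet" >&2
```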

Runtime

Building the MPS executor runner:

# In this step, you'll build the `mps_executor_runner`, which can run MPS-lowered modules:
cd executorch
./examples/apple/mps/scripts/build_mps_executor_runner.sh

Run the generated mv3 model using the mps_executor_runner:

./cmake-out/examples/apple/mps/mps_executor_runner --model_path mv3_mps_bundled_fp16.pte --bundled_program

You should see the following results. Note that no output file will be generated in this example:

I 00:00:00.003290 executorch:mps_executor_runner.mm:286] Model file mv3_mps_bundled_fp16.pte is loaded.
I 00:00:00.003306 executorch:mps_executor_runner.mm:292] Program methods: 1
I 00:00:00.003308 executorch:mps_executor_runner.mm:294] Running method forward
I 00:00:00.003311 executorch:mps_executor_runner.mm:349] Setting up non-const buffer 1, size 606112.
I 00:00:00.003374 executorch:mps_executor_runner.mm:376] Setting up memory manager
I 00:00:00.003376 executorch:mps_executor_runner.mm:392] Loading method name from plan
I 00:00:00.018942 executorch:mps_executor_runner.mm:399] Method loaded.
I 00:00:00.018944 executorch:mps_executor_runner.mm:404] Loading bundled program...
I 00:00:00.018980 executorch:mps_executor_runner.mm:421] Inputs prepared.
I 00:00:00.118731 executorch:mps_executor_runner.mm:438] Model executed successfully.
I 00:00:00.122615 executorch:mps_executor_runner.mm:501] Model verified successfully.
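
When the runner is called from a script, the two success lines at the end of the log above can be checked automatically. This is a minimal sketch: `runner_verified` is a hypothetical helper, the marker strings are copied from the sample log, and redirecting with `2>&1` is an assumption made so the check works whether the runner logs to stdout or to stderr.

```shell
# runner_verified: read runner log text on stdin and succeed only when both
# success markers from the sample log are present.
runner_verified() {
    log=$(cat)
    case $log in
        *"Model executed successfully."*) ;;
        *) echo "execution marker not found" >&2; return 1 ;;
    esac
    case $log in
        *"Model verified successfully."*) ;;
        *) echo "verification marker not found" >&2; return 1 ;;
    esac
    echo "runner output looks good"
}

runner=./cmake-out/examples/apple/mps/mps_executor_runner
if [ -x "$runner" ]; then
    "$runner" --model_path mv3_mps_bundled_fp16.pte --bundled_program 2>&1 \
        | runner_verified || echo "runner check failed" >&2
else
    echo "runner not built yet: $runner"
fi
```

The verification marker is only expected for bundled programs, since it reflects a comparison against the reference outputs stored in the bundle.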

[Optional] Run the generated model directly using pybind

Make sure pybind MPS support was installed:

./install_requirements.sh --pybind mps

Run the mps_example script to trace the model and run it directly from python:

cd executorch
# Check correctness between PyTorch eager forward pass and ExecuTorch MPS delegate forward pass
python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --no-use_fp16 --check_correctness
# You should see the following output: `Results between ExecuTorch forward pass with MPS backend and PyTorch forward pass for mv3_mps are matching!`

# Check performance between PyTorch MPS forward pass and ExecuTorch MPS forward pass
python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --no-use_fp16 --bench_pytorch

Profiling:

[Optional] Generate an ETRecord while you’re exporting your model.

cd executorch
python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --generate_etrecord -b

Run your Program on the ExecuTorch runtime and generate an ETDump.

./cmake-out/examples/apple/mps/mps_executor_runner --model_path mv3_mps_bundled_fp16.pte --bundled_program --dump-outputs

Create an instance of the Inspector API by passing in the ETDump you obtained from the runtime, along with the ETRecord if you generated one during export.

python3 -m sdk.inspector.inspector_cli --etdump_path etdump.etdp --etrecord_path etrecord.bin
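
Because the ETRecord is optional, a wrapper script can add `--etrecord_path` only when that file exists. The sketch below rests on two assumptions: that `inspector_cli` accepts an ETDump without an ETRecord (as the "optional" wording above suggests), and that the file names match the command above. `inspector_args` is a hypothetical helper.

```shell
# inspector_args ETDUMP [ETRECORD]: print the inspector_cli arguments, adding
# --etrecord_path only when the optional ETRecord file is present.
inspector_args() {
    [ -f "$1" ] || { echo "ETDump not found: $1" >&2; return 1; }
    if [ -n "$2" ] && [ -f "$2" ]; then
        echo "--etdump_path $1 --etrecord_path $2"
    else
        echo "--etdump_path $1"
    fi
}

if args=$(inspector_args etdump.etdp etrecord.bin); then
    # Word splitting of $args is intentional; paths are assumed to contain no spaces.
    python3 -m sdk.inspector.inspector_cli $args
fi
```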

Deploying and Running on Device

Step 1. Create the ExecuTorch core and MPS delegate frameworks to link on iOS

cd executorch
./build/build_apple_frameworks.sh --mps

mps_delegate.xcframework will be in the cmake-out folder, along with executorch.xcframework and portable_delegate.xcframework:

cd cmake-out && ls
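
Instead of scanning the `ls` output by eye, the three expected frameworks can be checked by name. This is a rough sketch to run from the executorch repository root: `missing_frameworks` and the `FRAMEWORK_DIR` variable are hypothetical, and the default directory `cmake-out` is taken from the sentence above.

```shell
# missing_frameworks DIR: print each expected .xcframework that is absent from DIR.
# An .xcframework is a directory bundle, hence the -d test.
missing_frameworks() {
    for name in executorch portable_delegate mps_delegate; do
        [ -d "$1/$name.xcframework" ] || echo "$name.xcframework"
    done
}

framework_dir=${FRAMEWORK_DIR:-cmake-out}
missing=$(missing_frameworks "$framework_dir")
if [ -z "$missing" ]; then
    echo "all three frameworks are present in $framework_dir"
else
    echo "not found in $framework_dir:"
    echo "$missing"
fi
```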

Step 2. Link the frameworks into your Xcode project: go to the project target’s Build Phases > Link Binary With Libraries, click the + sign, and add the following frameworks, located in the Release folder:

executorch.xcframework
portable_delegate.xcframework
mps_delegate.xcframework

From the same page, include the needed libraries for the MPS delegate:

MetalPerformanceShaders.framework
MetalPerformanceShadersGraph.framework
Metal.framework

In this tutorial, you have learned how to lower a model to the MPS delegate, build the mps_executor_runner and run a lowered model through the MPS delegate, or directly on device using the MPS delegate static library.

Frequently encountered errors and resolution.

If you encounter any bugs or issues while following this tutorial, please file a bug/issue on the ExecuTorch repository with the hashtag #mps.