
Introducing ExecuTorch 1.0: Powering the next generation of edge AI

October 22, 2025

TLDR

  • ExecuTorch enables seamless, production-ready deployment of PyTorch models directly to edge devices (mobile, embedded, desktop) without the need for conversion or rewriting, supporting a wide range of hardware backends and model types.
  • The ExecuTorch 1.0 release delivers broader hardware support across CPU, GPU, and NPU, greater stability for production use, and robust model compatibility.
  • ExecuTorch is fully open source and already powering real-world applications across the community, accelerating innovation and adoption of on-device AI for billions of users.

On-device AI deployment brings machine learning capabilities directly to users’ devices, enabling fast, real-time responses without relying on the cloud. This approach also improves privacy by keeping data local and supports personalized experiences and features that work even without internet access. Traditional on-device AI examples include running computer vision algorithms on mobile devices for photo editing and processing. But, recently, there has been rapid growth in new use cases driven by advances in hardware and AI models, such as local agents powered by smart large language models (LLMs) and ambient AI applications in wearables and smart glasses.

However, when deploying these novel models to on-device production environments such as mobile, desktop, and embedded applications, models have traditionally had to be converted to other formats and runtimes. Today, these conversions are done either programmatically, using conversion scripts targeting formats like ONNX or TFLite, or by completely rewriting the model in a different language such as C++ (e.g., llama.cpp) or in MLX. These conversions are time-consuming for machine learning engineers and often become bottlenecks in the production deployment process due to issues such as numerical mismatches and loss of debug information during conversion. Rewriting models in languages like C++ also introduces its own challenges as models grow more complex. That approach was relatively feasible when LLM architectures were fairly standard text transformers, but today's models are increasingly complex with the emergence of multimodal and reasoning models. General-purpose solutions, and the flexibility to compose and modularize backbone components, are especially important today for multimodal LLMs, where different encoders for images, audio, and other modalities need to be easily swapped and integrated.

We are introducing ExecuTorch 1.0, a framework designed to help developers build these novel AI applications optimized for edge devices. ExecuTorch is completely open source and ready to power the next generation of on-device AI use cases, enabling smarter, faster, and more private experiences at the edge.

ExecuTorch is a general-purpose, fully open source, PyTorch-native solution designed for mobile, embedded, and desktop devices, supporting platforms such as iOS, Android, embedded systems, and laptop computers (aka AI PCs). It enables developers to take any PyTorch-based model from any domain, whether large language models (LLMs), vision-language models (VLMs), image segmentation, detection, audio, or something else, and deploy it directly onto edge devices without converting it to other formats or rewriting the model.

During deployment, ExecuTorch uses PyTorch's native export mechanism to convert models into a stable and compact representation for efficient on-device execution, ensuring minimal mismatch between the original model and the deployed model and preserving accuracy and performance. At the same time, it maintains flexibility and modularity, allowing developers to compose and customize backbone components as needed, which is especially important for multimodal LLMs. Moreover, developers can use the same ExecuTorch workflow to enable models on many form factors (e.g., microcontrollers, phones, laptops) across various backends (DSP, CPU, GPU, and NPU).
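
To make that workflow concrete, here is a minimal sketch of the PyTorch-native export path, using the XNNPACK CPU backend (ExecuTorch's reference CPU delegate) as the example. It follows the torch.export and to_edge_transform_and_lower flow from the ExecuTorch documentation; exact import paths can shift between releases, so verify them against the current docs.

```python
# Minimal sketch: PyTorch model -> .pte file, lowered to the XNNPACK CPU
# backend. Import paths follow the ExecuTorch docs and may vary by release.
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# 1) Capture the model as a stable, compact graph with PyTorch's native export.
exported = torch.export.export(model, example_inputs)

# 2) Lower to the edge dialect and delegate supported subgraphs to the backend.
et_program = to_edge_transform_and_lower(
    exported, partitioner=[XnnpackPartitioner()]
).to_executorch()

# 3) Serialize the program; the .pte file is what ships to the device.
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```

The same three steps apply whether the target is a microcontroller, a phone, or a laptop; swapping the partitioner is what retargets the program to a different backend.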

In short, ExecuTorch significantly reduces the time and complexity involved in production deployment, eliminating common bottlenecks and debugging challenges.

ExecuTorch 1.0 release

Today, PyTorch is widely used by researchers to develop new models. We announced the ExecuTorch beta release in October 2024, which brought stability to the developer APIs and runtime features. Since then, we've been working on improving reliability, stability, and performance. Here's what the 1.0 launch brings:

  • Accelerator Coverage: We've added new backends: Arm VGF, NXP Semiconductors' eIQ® Neutron NPU, Samsung Exynos NPU, Exynos GPU, and Intel's OpenVINO, further expanding the backend lineup. These join our existing alpha- and beta-status backends such as Cadence DSP, MediaTek NPU, and Apple Metal Performance Shaders (MPS).
  • Accelerator Stability: The following backends have been promoted from beta to production-ready status, reflecting improvements in stability and reliability, so you can start using them in production:
      • Vulkan GPU
  • Model Coverage: We have validated ExecuTorch (along with its backends) across traditional AI use cases (e.g., object detection, depth estimation, OCR, ASR, and segmentation) as well as on-device text LLMs and multimodal LLMs (e.g., Voxtral for audio-text input and Gemma3 for image-text input). We have also partnered with Hugging Face to make it easy to export and run a wide range of Hugging Face transformers models on ExecuTorch; a sketch follows this list.
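
To illustrate the Hugging Face path mentioned above, the sketch below uses the optimum-executorch integration to export a transformers model and run generation through the ExecuTorch runtime, entirely from Python. The class and method names (ExecuTorchModelForCausalLM, text_generation, the "xnnpack" recipe) are taken from that project's README at the time of writing and should be treated as assumptions to verify against the current optimum-executorch documentation.

```python
# Sketch: export a Hugging Face transformers model to ExecuTorch via
# optimum-executorch (pip install optimum-executorch). API names are based
# on the project's README and may change between versions.
from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # any edge-friendly causal LM

# from_pretrained exports the checkpoint with torch.export and lowers it to
# the XNNPACK CPU backend, returning a module backed by the ExecuTorch runtime.
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Token generation here runs on the ExecuTorch runtime, not eager PyTorch.
print(model.text_generation(
    tokenizer=tokenizer,
    prompt="Give a one-line summary of on-device AI.",
    max_seq_len=128,
))
```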

For a full list of ExecuTorch's features and capabilities, please refer to our 1.0 release notes, and see the backends overview for technical documentation covering all supported backends.

Lastly, for developers interested in exploring the latest features from our main branch, we're excited to share that ExecuTorch can now be embedded into native C++ desktop and laptop applications, leveraging the CPU, GPU, and NPU on the host machine. Though still early, this is especially exciting for developers building applications for consumer desktops and laptops, such as productivity tools and privacy-preserving personal assistants, thanks to ExecuTorch's flexibility and its broad support for models and hardware backends.
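
While the native desktop path centers on the C++ APIs, the same .pte file can be sanity-checked from Python before being embedded in an application. Below is a hedged sketch using ExecuTorch's Python runtime bindings; the executorch.runtime names follow the Python runtime API reference and are assumptions that may not hold on older releases.

```python
# Sketch: load and run a .pte from Python to validate it before embedding it
# in a native app. Names follow ExecuTorch's Python runtime API reference
# and should be verified against the release you are using.
import torch
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("tiny_model.pte")
method = program.load_method("forward")

# Outputs should closely match the eager PyTorch model on the same input.
outputs = method.execute([torch.randn(1, 16)])
print(outputs[0])
```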

ExecuTorch Success Stories

Today, ExecuTorch is used in production by multiple companies and is integrated with prominent OSS frameworks and ecosystems. A recent Meta blog post highlighted some of the on-device AI features, powered by ExecuTorch, that serve billions of people across Instagram, WhatsApp, Messenger, and Facebook. Check out these firsthand accounts of how ExecuTorch is delivering real value to the AI community today:

“Meta’s Reality Labs have adopted ExecuTorch in production in our Wearables lineup—including the latest Meta Ray-Ban Display glasses with EMG band. Advanced AI features like speech recognition, motion sensing, and computer vision run on these devices by leveraging cutting-edge hardware. ExecuTorch helps us push what’s possible and speeds up research-to-production through PyTorch standardization. Its open-source foundation accelerates innovation, making it easy to integrate contributions from hardware partners and the research community. As a result, we’re delivering smarter, faster, and more personalized experiences to millions of users.”

  • Anuj Kumar, Director, Reality Labs, Meta

“There are over 750k transformers models on Hugging Face today. This represents a huge opportunity to bring innovation to the edge and power the ambient intelligence revolution. We are excited that models hosted on Hugging Face transformers are now becoming easily exportable to ExecuTorch. Today, more than 80% of the most downloaded, edge-friendly large language models on Hugging Face run on ExecuTorch out of the box. For multimodal applications, support is rapidly expanding to include several popular vision models (such as Gemma3, SmolVLM, and LLaVA) and audio models (Voxtral, Granite, and others). This significantly broadens the possibilities for on-device innovation!” 

  • Lysandre Debut, Chief Open-Source Officer, Hugging Face

“Fine-tuning with and quantizing models directly in PyTorch, doing experimentation and evaluation in Python, and taking the exact quantized model to run on-device has been a game changer for developers and has allowed the Unsloth team to be more creative with techniques and schemes. For Unsloth users, ExecuTorch is a great path to edge deployment!”

  • Daniel Han, Co-founder, Unsloth AI

“Why do we use ExecuTorch in our LFM2 model series? Two reasons: 1) Flexibility, both in terms of supporting Liquid AI’s custom model architectures and pre/post-processing code, which allows our research team to experiment with new models freely without adding extra burden on the inference runtime. 2) Performance. The state tracking and caching of the PyTorch graph that ExecuTorch provides reduce token generation time at inference and decrease latency.”

  • Mathias Lechner, CTO, Liquid AI

“Voxtral unifies audio and text processing in a single multimodal architecture, and deploying it efficiently on-device requires precise control over execution and hardware utilization. ExecuTorch’s PyTorch-native runtime lets us run Voxtral directly—without format conversion—while efficiently targeting GPU, NPU, and CPU backends for low-latency, high-fidelity on-device inference.”

  • Timothée Lacroix, CTO, Mistral AI

For more examples of ecosystem integrations and adoption, please see the ExecuTorch Success Stories page.

Call To Action

We invite developers everywhere to dive into ExecuTorch, experiment with the code, and share your feedback to help shape its future! Get started with the ExecuTorch code and explore our comprehensive documentation.

Acknowledgements

ExecuTorch is the work of many hands, including (alphabetically):

[PyTorch Edge] Manuel Candales, Gregory Comer, Digant Desai, Tanvir Islam, Songhao Jia, Stephen Jia, Abhinay Kukkadapu, Chen Lai, Mengwei Liu, Mergen Nachin, Kimish Patel, Siddartha Pothapragada, Lucy Qiu, Scott Roy, Anthony Shoumikhin, Jacob Szwejbka, Scott Wolchok, Jack Zhang, Hansong Zhang

[Compiler] Yanan Cao, Avik Chaudhuri, Zhengxu Chen, Tugsuu Manlaibaatar, Angela Yi

[AO] Andrew Or, Supriya Rao, Jerry Zhang

[Dev-Infra] Huy Do, Eli Uriegas

[Others] Stefano Cadario, Andrew Caples, Matthias Cremon, Alban Desmaison, Chris Gottbrath, Min Guo, Svetlana Karslioglu, Ji Li, Dulin Riley, Nikita Shulga, Joe Spisak, Jake Stevens, Vivek Trivedi, Shen Xu

[Alums] Raziel Alvarez, Dave Bort, Xingying Cheng, Salil Desai, Michael Gschwind, Lunwen He, Tarun Karuturi, Jack Khuu, Ali Khosh, Olivia Liu, Max Ren, Orion Reblitz-Richardson, Varun Puri, Jathu Satkunarajah, Jesse White, Martin Yuan, Guang Yang, Chakri Uddaraju

[Partners]: Apple, Arm, Cadence, Intel, MediaTek, NXP, Qualcomm, Samsung