May 2026

Welcome to the May edition of the PyTorch Foundation Newsletter. This month, we reflect on the success of PyTorch Conference Europe 2026 by shining a spotlight on some of the most popular sessions from our members. We also look at the 2.12 release, upcoming events, webinars, and new technical blogs.

Announcements

PyTorch 2.12 Release

PyTorch 2.12 consists of 2,926 commits from 457 contributors since PyTorch 2.11. Thank you to everyone who contributed.

Highlights include:
Batched linalg.eigh on CUDA is up to 100x faster;
New torch.accelerator.Graph API unifies graph capture and replay across CUDA, XPU, and out-of-tree backends;
torch.export.save now supports Microscaling (MX) quantization formats.
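For readers new to the batched API, here is a minimal sketch of a single torch.linalg.eigh call over a whole batch of symmetric matrices, run on CUDA when a GPU is available and on CPU otherwise. The helper name batched_symmetric_eigh and the batch and matrix sizes are our own illustrative choices, and the snippet does not benchmark anything; the speedup figure comes from the release notes.

```python
import torch

def batched_symmetric_eigh(batch: int = 64, n: int = 32):
    """Hypothetical helper: decompose a batch of random symmetric matrices in one call."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(batch, n, n, dtype=torch.float64, device=device)
    # eigh reads only one triangle of each matrix, so symmetrize explicitly
    a = (x + x.mT) / 2
    # One call handles every matrix in the leading batch dimension;
    # eigenvalues come back in ascending order for each matrix
    eigvals, eigvecs = torch.linalg.eigh(a)
    # Reconstruct A = V diag(w) V^T to sanity-check the decomposition
    recon = eigvecs @ torch.diag_embed(eigvals) @ eigvecs.mT
    return a, eigvals, eigvecs, recon

if __name__ == "__main__":
    a, w, v, recon = batched_symmetric_eigh()
    print(w.shape, v.shape)  # per-matrix eigenvalues and eigenvectors
    print(torch.allclose(recon, a, atol=1e-8))
```

The same code runs unchanged on CPU and GPU; only the device selection differs.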

👉 Check out the full release blog.

Calling all PyTorch enthusiasts!

Nominations are open for the PyTorch Foundation Ambassador Program. Our ambassador program supports passionate local community leaders who educate, advocate for, and build with PyTorch. Ambassadors help grow PyTorch Foundation projects around the world by organizing events, creating educational content, mentoring contributors, and strengthening local communities. We especially encourage applications from contributors across Africa, Latin America, the Middle East, Oceania, Southeast Asia, and Eastern Europe as we continue expanding Ambassador representation across more regions and local communities. Interested in joining the program? Put yourself forward by June 18th:

🔥PyTorch Ambassador Program Application🔥

Upcoming Events

PyTorch 2.12 Release Live Q&A 🎬

PyTorch 2.12 includes major updates across compilation, distributed systems, export, graph capture, and accelerator support.

Join Andrey Talman, Alban Desmaison, Joe Spisak, and moderator Chris Gottbrath on Wednesday, May 20 at 10 AM PT for a technical overview of the release and a live Q&A. Register now >>


MLSys 2026

From May 18-22, visit PyTorch Foundation at booth #20 and attend Matt White’s lightning talk to learn how PyTorch is accelerating open source AI. The ExecuTorch team also presented their award-winning paper in Tuesday morning’s Best Paper Session. 🏆

PLDI 2026 AI Summit

At the PLDI 2026 AI Summit (June 15-19 in Boulder, CO), attendees of “Writing Performance-Portable Kernels Simplified with Helion” will write, autotune, and run real Helion kernels live.

KubeCon + CloudNativeCon + OpenInfra Summit + PyTorch Conference China 2026

PyTorch Conference China 2026 will take place September 8-9 in Shanghai as part of KubeCon + CloudNativeCon + OpenInfra Summit + PyTorch Conference China 2026.

The event brings together the OpenInfra, cloud native, and AI communities to advance open source infrastructure, machine learning, and next-generation AI across shared industry use cases. Registration is now open, and the full schedule will be announced in June.

PyTorch Conference North America 2026

PyTorch Conference North America 2026 comes to San Jose on October 20-21. Hosted by PyTorch Foundation, #PyTorchCon brings together researchers, developers, and engineers to explore PyTorch Foundation projects, including PyTorch, vLLM, DeepSpeed, and Ray, through technical talks, hands-on workshops, and keynotes spanning the full AI stack.

Submit a session proposal by 11:59 PM PDT on Sunday, June 7, and register by July 31 to secure early bird pricing.

Recent Events

PyTorch Docathon

For the past two weeks, we hosted the PyTorch Docathon, a hackathon-style community event focused on improving PyTorch documentation. We were thrilled by the turnout. Great docs are a force multiplier: they help new users ramp up faster, help experienced practitioners apply advanced features correctly, and shorten the path from research to production in machine learning. 👉Stay tuned for the winner announcements, and in the meantime, view the leaderboard.

Member Highlights from PyTorch Conference Europe 2026

We had some fantastic sessions at PyTorch Conference Europe 2026, a testament to PyTorch Foundation projects’ continued momentum. Check out some of the highlights from PyTorch members below, and 📹 view all sessions in the conference playlist.

Arm: Optimizing CPU LLM Inference in PyTorch: Lessons From vLLM – Crefeda Rodrigues & Fadi Arafeh
AMD: Brevitas Quantization Library – Pablo Monteagudo Lago
AWS: Lightning Talk: Cross-Region Model Serving: PyTorch Inference, Observability & LLMOps – Suraj Muraleedharan
Google: Lightning Talk: Bringing Google’s Colossus to PyTorch: Rapid Storage via fsspec to Keep GPUs Busy – Ankita Luthra & Trinadh Kotturu
Hugging Face: torch.compile and Diffusers: A Hands-On Guide to Peak Performance – Sayak Paul
IBM: Lightning Talk: TerraKit: Standardising AI-Ready Geospatial Data Preparation for the TorchGeo Ecosystem – Rosie Lickorish & Romeo Kienzler
Lightning AI: Model-Changing Transforms With Torch.compile – Thomas Viehmann
Meta: Enabling State-of-the-art Asynchronous Execution in Torch.compile With CUDA Streams – Michael Lazos
Microsoft: Tour De Force: LLM Inference Optimization From Simple To Sophisticated – Christin Pohl
NVIDIA: The Science and Practice of Open and Scalable LLM Evaluations – Grzegorz Chlebus

In the News

Latest Blogs

Meta: Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate

ExecuTorch’s new MLX delegate enables GPU-accelerated inference for PyTorch models on Apple Silicon Macs using Apple’s MLX framework. The experimental backend integrates with the PyTorch 2 export stack and delivers 3-6x higher throughput than existing ExecuTorch delegates on macOS. Read more >>

vLLM, Meta, Inferact: vLLM and PyTorch Work Together to Improve the Developer Experience on aarch64

Starting with PyTorch 2.11, pip install torch on aarch64 Linux installs CUDA-enabled wheels directly from PyPI. The change eliminates the custom indexes and workarounds previously required on GH200, GB200, and GB300 systems. Read here >>
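After installing from PyPI on an aarch64 machine, a quick check like the sketch below can confirm which build you got. The helper name describe_torch_build is hypothetical; it only reads platform.machine(), torch.__version__, torch.version.cuda, and torch.cuda.is_available(). Keep in mind that torch.version.cuda reports the CUDA version the wheel was built against (None for a CPU-only wheel), while is_available() also depends on a working driver and GPU.

```python
import platform
import torch

def describe_torch_build() -> dict:
    """Hypothetical helper: summarize the installed wheel. Values depend on the machine."""
    return {
        "machine": platform.machine(),            # e.g. "aarch64" on ARM64 Linux hosts
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,         # None for a CPU-only wheel
        "cuda_available": torch.cuda.is_available(),
    }

if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```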

Arm: Efficient Edge AI on Arm CPUs and NPUs: Understanding ExecuTorch through Practical Labs

The post and accompanying labs introduce CPU and NPU inference across Cortex-A and Cortex-M + Ethos-U platforms and show how to use Model Explorer adapters, developed by Arm, to gain visibility into model deployment with ExecuTorch. Read more >>

IBM: IBM Research uses vLLM at the heart of its RITS Platform

This blog looks at the Research Inference & Tuning Service (RITS) Platform, which provides centralized deployment of, and shared access to, Model Inferencing Endpoints and “Ancillary” Tuning Service Endpoints. Read more >>

SSAIL Lab, University of Illinois Urbana-Champaign, Anyscale, Snowflake: Introducing AutoSP

Learn how AutoSP automatically converts standard transformer training code into sequence-parallel code for long-context LLM training across multiple GPUs. Read more >>

Lightseek Foundation: SMG – The Case for Disaggregating CPU from GPU in LLM Serving

Read about how Shepherd Model Gateway (SMG) was designed around one principle: GPUs should do tensor math, and everything else belongs in a dedicated serving layer. Read more >>

Meta: In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference

Traditional RecSys inference explicitly replicates shared user embeddings/sequences for every candidate. In-Kernel Broadcast Optimization (IKBO) eliminates this overhead via a kernel-model-system co-design that fuses broadcast logic directly into user-candidate interaction kernels. Read here >>
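To make the replication overhead concrete, here is a tensor-level sketch of candidate-to-user target attention in plain PyTorch. It rests on our own assumptions (one request, one shared user sequence, dot-product attention) and is not IKBO itself: the post's optimization lives inside fused kernels, whereas this example only contrasts materializing an (N, L, d) copy of the user sequence with letting matrix multiplication reuse the single (L, d) copy. Both function names are hypothetical.

```python
import torch

def target_attention_replicated(user_seq: torch.Tensor, cands: torch.Tensor) -> torch.Tensor:
    """Naive path: copy the shared user sequence once per candidate.

    user_seq: (L, d) history shared by every candidate of one request
    cands:    (N, d) candidate item embeddings
    returns:  (N, d) user representation attended by each candidate
    """
    n = cands.shape[0]
    # .contiguous() forces a real (N, L, d) copy, mimicking explicit replication
    seq = user_seq.unsqueeze(0).expand(n, -1, -1).contiguous()
    scores = torch.einsum("nd,nld->nl", cands, seq)    # (N, L)
    weights = torch.softmax(scores, dim=-1)
    return torch.einsum("nl,nld->nd", weights, seq)    # (N, d)

def target_attention_broadcast(user_seq: torch.Tensor, cands: torch.Tensor) -> torch.Tensor:
    """Broadcast path: every candidate reads the single (L, d) copy."""
    scores = cands @ user_seq.T                        # (N, L), no (N, L, d) tensor
    weights = torch.softmax(scores, dim=-1)
    return weights @ user_seq                          # (N, d)
```

Both paths compute the same result; the replicated one allocates N copies of the user sequence, which is the kind of overhead the post says IKBO removes.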

Subscribe to the PyTorch Newsletter
Get updates directly to your inbox: https://pytorch.org/newsletter/
