
Welcome to the May edition of the PyTorch Foundation Newsletter. This month, we reflect on the success of PyTorch Conference Europe 2026 by shining a spotlight on some of the most popular sessions from our members. We also look at the 2.12 release, upcoming events, webinars, and new technical blogs.
Announcements
Calling all PyTorch enthusiasts!
Nominations are open for the PyTorch Foundation Ambassador Program. Our ambassador program supports passionate local community leaders who educate, advocate for, and build with PyTorch. Ambassadors help grow PyTorch Foundation projects around the world by organizing events, creating educational content, mentoring contributors, and strengthening local communities. We especially encourage applications from contributors across Africa, Latin America, the Middle East, Oceania, Southeast Asia, and Eastern Europe as we continue expanding Ambassador representation across more regions and local communities. Interested in joining the program? Put yourself forward by June 18th:
🔥PyTorch Ambassador Program Application🔥
Upcoming Events

PyTorch 2.12 Release Live Q&A 🎬
PyTorch 2.12 includes major updates across compilation, distributed systems, export, graph capture, and accelerator support.
Join Andrey Talman, Alban Desmaison, Joe Spisak, and moderator Chris Gottbrath on Wednesday, May 20 at 10 AM PT for a technical overview of the release and a live Q&A. Register now >>

May 18-22, visit PyTorch Foundation at booth #20 and attend Matt White’s lightning talk to learn how PyTorch is accelerating open source AI. The ExecuTorch team also presented their award-winning paper in Tuesday morning’s Best Paper Session. 🏆

In “Writing Performance-Portable Kernels Simplified with Helion,” attendees will write, autotune, and run real Helion kernels live. June 15-19 in Boulder, CO.

KubeCon + CloudNativeCon + OpenInfra Summit + PyTorch Conference China 2026
PyTorch Conference China 2026 will take place September 8-9 in Shanghai as part of KubeCon + CloudNativeCon + OpenInfra Summit + PyTorch Conference China 2026.
The event brings together the OpenInfra, cloud native, and AI communities to advance open source infrastructure, machine learning, and next-generation AI across shared industry use cases. Registration is now open, and the full schedule will be announced in June.

PyTorch Conference North America 2026
PyTorch Conference North America 2026 comes to San Jose on October 20-21. Hosted by PyTorch Foundation, #PyTorchCon brings together researchers, developers, and engineers to explore PyTorch Foundation projects, including PyTorch, vLLM, DeepSpeed, and Ray through technical talks, hands-on workshops, and keynotes spanning the full AI stack.
Submit a session proposal by 11:59 PM PDT on Sunday, June 7, and register by July 31 to secure early bird pricing.
Recent Events
For the past two weeks, we have hosted a hackathon-style community event focused on improving PyTorch documentation. We were pleased to see such a strong turnout, because great docs are a force multiplier: they help new users ramp up faster, help experienced practitioners apply advanced features correctly, and shorten the path from research to production in machine learning. 👉Stay tuned for winner announcements soon, and in the meantime, view the leaderboard.

Member Highlights from PyTorch Conference Europe 2026
We had some fantastic sessions at PyTorch Conference Europe 2026, a testament to PyTorch Foundation projects’ continued momentum. Check out some of the highlights from PyTorch members below, and 📹 view all sessions in the conference playlist.
- Arm: Optimizing CPU LLM Inference in PyTorch: Lessons From vLLM – Crefeda Rodrigues & Fadi Arafeh
- AMD: Brevitas Quantization Library – Pablo Monteagudo Lago
- AWS: Lightning Talk: Cross-Region Model Serving: PyTorch Inference, Observability & LLMOps – Suraj Muraleedharan
- Google: Lightning Talk: Bringing Google’s Colossus to PyTorch: Rapid Storage via fsspec to Keep GPUs Busy – Ankita Luthra & Trinadh Kotturu
- Hugging Face: torch.compile and Diffusers: A Hands-On Guide to Peak Performance – Sayak Paul
- IBM: Lightning Talk: TerraKit: Standardising AI-Ready Geospatial Data Preparation for the TorchGeo Ecosystem – Rosie Lickorish & Romeo Kienzler
- Lightning AI: Model-Changing Transforms With Torch.compile – Thomas Viehmann
- Meta: Enabling State-of-the-art Asynchronous Execution in Torch.compile With CUDA Streams – Michael Lazos
- Microsoft: Tour De Force: LLM Inference Optimization From Simple To Sophisticated – Christin Pohl
- NVIDIA: The Science and Practice of Open and Scalable LLM Evaluations – Grzegorz Chlebus
In the News
- There’s growing pressure on the PyTorch Foundation. New Executive Director Mark Collier tells us his priorities, The Stack
- PyTorch Foundation expands its AI stack with Safetensors, ExecuTorch, and Helion, The New Stack
Latest Blogs

Meta: Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate
ExecuTorch’s new MLX delegate enables GPU-accelerated inference for PyTorch models on Apple Silicon Macs using Apple’s MLX framework. The experimental backend integrates with the PyTorch 2 export stack and delivers 3-6x higher throughput than existing ExecuTorch delegates on macOS. Read more >>

vLLM, Meta, Inferact: vLLM and PyTorch Work Together to Improve the Developer Experience on aarch64
Starting with PyTorch 2.11, running pip install torch on aarch64 Linux installs CUDA-enabled wheels directly from PyPI. The change eliminates the custom indexes and workarounds previously required on GH200, GB200, and GB300 systems. Read here >>
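To confirm which kind of wheel a given environment ended up with, a quick check along these lines may help. This is a minimal sketch, not something taken from the blog post: the helper name describe_torch_install is hypothetical, and it relies only on torch.__version__, torch.version.cuda, and torch.cuda.is_available().

```python
import importlib.util
import platform


def describe_torch_install():
    """Report the machine architecture and, if torch is importable, its CUDA build info.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    info = {
        "machine": platform.machine(),  # e.g. "aarch64" or "x86_64"
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": None,
    }
    if info["torch_installed"]:
        import torch

        info["torch_version"] = torch.__version__
        # torch.version.cuda is None for CPU-only builds.
        info["cuda_build"] = torch.version.cuda
        # True only if a usable CUDA device and driver are present at runtime.
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_install().items():
        print(f"{key}: {value}")
```

On a CUDA-enabled wheel, cuda_build should be a version string rather than None. cuda_available additionally depends on the driver and hardware of the machine the check runs on.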

Arm: Efficient Edge AI on Arm CPUs and NPUs: Understanding ExecuTorch through Practical Labs
The post and labs introduce CPU and NPU inference across Cortex-A and Cortex-M + Ethos-U platforms, and showcase the use of Model Explorer adapters, developed by Arm, to gain visibility into model deployment with ExecuTorch. Read more >>

IBM: IBM Research uses vLLM at the heart of its RITS Platform
This blog looks at the Research Inference & Tuning Service (RITS) Platform, which provides centralized deployment of, and shared access to, Model Inferencing Endpoints and “Ancillary” Tuning Service Endpoints. Read more >>

SSAIL Lab, University of Illinois Urbana-Champaign, Anyscale, Snowflake: Introducing AutoSP
Learn how AutoSP automatically converts standard transformer training code into sequence-parallel code for long-context LLM training across multiple GPUs. Read more >>
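For readers new to the term, sequence parallelism means splitting the sequence (token) dimension of each input across devices. The sketch below shows only that sharding step in plain Python. It is not AutoSP's API or its transformation: shard_sequence is a hypothetical helper, and real sequence-parallel training also needs communication between ranks (for example inside attention), which is omitted here.

```python
def shard_sequence(tokens, world_size):
    """Split `tokens` into `world_size` contiguous shards, one per rank.

    Shard lengths differ by at most one; earlier ranks take the extra tokens.
    Hypothetical helper for illustration only.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    base, extra = divmod(len(tokens), world_size)
    shards = []
    start = 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        shards.append(tokens[start:start + size])
        start += size
    return shards


if __name__ == "__main__":
    # A 10-token sequence split across 4 hypothetical ranks.
    for rank, shard in enumerate(shard_sequence(list(range(10)), 4)):
        print(f"rank {rank}: {shard}")
```

Each rank then holds roughly 1/world_size of the sequence, which is what makes longer contexts fit in per-GPU memory.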

Lightseek Foundation: SMG – The Case for Disaggregating CPU from GPU in LLM Serving
Read about how Shepherd Model Gateway (SMG) was designed with one principle in mind: GPUs should do tensor math, and everything else belongs in a dedicated serving layer. Read more >>

Meta: In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference
Traditional RecSys inference explicitly replicates shared user embeddings/sequences for every candidate. In-Kernel Broadcast Optimization (IKBO) eliminates this overhead via a kernel-model-system co-design that fuses broadcast logic directly into user-candidate interaction kernels. Read here >>
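The difference between explicit replication and broadcasting can be illustrated with a toy example. This is a plain-Python analogy under stated assumptions, not the IKBO implementation, which fuses the logic into GPU kernels. The function names are hypothetical, and a dot product stands in for the user-candidate interaction.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_with_replication(user_embedding, candidates):
    """Materialize one copy of the user vector per candidate, then score each pair."""
    replicated = [list(user_embedding) for _ in candidates]  # N copies in memory
    return [dot(u, c) for u, c in zip(replicated, candidates)]


def score_with_broadcast(user_embedding, candidates):
    """Reuse the single user vector inside the interaction loop; no copies are made."""
    return [dot(user_embedding, c) for c in candidates]


if __name__ == "__main__":
    user = [1.0, 2.0]
    candidates = [[3.0, 4.0], [0.5, -1.0]]
    print(score_with_replication(user, candidates))
    print(score_with_broadcast(user, candidates))
```

Both functions return the same scores. The second avoids allocating and reading N copies of the shared user data, which is the overhead the post describes eliminating.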
Subscribe to the PyTorch Newsletter
Get updates directly to your inbox: https://pytorch.org/newsletter/

