Monarch: an API to your supercomputer | Blog | Getting distributed training jobs to run on huge clusters is hard! This is especially true… | The PyTorch Team at Meta | April 8, 2026
SOTA Normalization Performance with torch.compile | Blog | Introduction: Normalization methods (LayerNorm/RMSNorm) are foundational in deep learning and are used to normalize values… | Shunting Zhang, Paul Zhang, Elias Ellison, Markus Hoehnerbach, Jason Ansel, Natalia Gimelshein | April 8, 2026
ExecuTorch Becomes a Part of PyTorch Core to Expand On-Device Inference Capabilities | Announcements, Blog | Today, we’re excited to share that ExecuTorch is becoming a part of PyTorch Core. ExecuTorch… | PyTorch Foundation | April 7, 2026
PyTorch Foundation Welcomes Helion as a Foundation-Hosted Project to Standardize Open, Portable, and Accessible AI Kernel Authoring | Announcements, Blog, Press Release | Helion joins community of leading open source AI projects to simplify kernel development across the… | PyTorch Foundation | April 7, 2026
Generating State-of-the-Art GEMMs with TorchInductor’s CuteDSL backend | Blog | Introduction: TorchInductor currently supports three autotuning backends for matrix multiplications: Triton, CUTLASS (C++), and cuBLAS.… | Nikhil Patel, Michael Lazos, Driss Guessous, Elias Ellison, Meta | April 7, 2026
RSVP for the 2026 PyTorch Docathon | Announcements | We're excited to announce that the 2026 PyTorch Docathon will take place May 5-19! This… | Team PyTorch | April 3, 2026
Call for Proposals Open for PyTorch Conference North America 2026 | Announcements | Submit a Session Proposal or Register Now to secure Super Early Bird pricing for PyTorch… | PyTorch Foundation | April 2, 2026
PyTorch Ecosystem Landscape Welcomes PhysicsNeMo, Unsloth, ONNX, and KTransformers | Announcements, Blog | The PyTorch Ecosystem Working Group is happy to welcome several new projects to the PyTorch… | PyTorch Ecosystem Working Group | April 2, 2026
Flight Recorder: A New Lens for Understanding NCCL Watchdog Timeouts | Blog | If you’ve ever trained a large AI model and had it fail with an error… | Phillip Liu, Uttam Thakore, Junjie Wang, Justin Yang | March 25, 2026
Enabling Up to 41% Faster Pre-training: MXFP8 and DeepEP for DeepSeek-V3 on B200 with TorchTitan | Blog | TL;DR: In a joint effort between PyTorch and Nebius, we enabled training DeepSeek-V3 Mixture-of-Experts models… | PyTorch and Nebius (Hooman Ramezani) Teams | March 25, 2026
PyTorch 2.11 Release Blog | Blog | We are excited to announce the release of PyTorch® 2.11 (release notes)! The PyTorch 2.11… | PyTorch Foundation | March 23, 2026
PyTorch 2.10+TorchAO: Powering AIPC scenarios on Intel® Core™ Ultra Series 3 processors | Blog | Overview: We are excited to introduce the highlights of Intel® Core™ Ultra Series 3 processors… | Intel PyTorch and Client AI SW team | March 20, 2026
TorchSpec: Speculative Decoding Training at Scale | Blog | Introduction: Over the past year, large language models have rapidly expanded in both scale and… | TorchSpec team, Mooncake team | March 19, 2026
Generalized Dot-Product Attention: Tackling Real-World Challenges in GPU Training Kernels | Blog | In this blog post, we present the kernel design of Generalized Dot-Product Attention (GDPA), a… | Jackie (Jiaqi) Xu, Chao Chen, Shuqi Yang, Markus Hoehnerbach, Xiaoyi (Leo) Liu, Ted Zadouri‡, Dev (Devashish) Shankar, Jacky Zhou, Hongtao Yu, Manman Ren, Han Xu, Chunzhi Yang†, Jade Nie†, Haoyu Zhang, Huayu Li, Michael Shu, Musharaf Sultan, Max Leung, John Bocharov, Tri Dao‡ | March 18, 2026
Building Voice Agents with ExecuTorch: A Cross-Platform Foundation for On-Device Audio | Blog | TL;DR: Open source voice models are proliferating, but there's no unified native inference platform for… | Mergen Nachin, Manuel Candales, Mengwei Liu, Jacob Szwejbka, Young Han, Songhao Jia, Stephen Jia, Scott Roy, Alban Desmaison, Hansong Zhang from PyTorch Team at Meta; Yagil Burowski, Matt Clayton, Will Burford from LM Studio | March 15, 2026
MXFP8 Training for MoEs: 1.3x training speedup vs BF16 for Llama4 Scout on GB200 cluster using TorchAO and TorchTitan | Blog | TL;DR: We recently demonstrated a +30.2% training speedup for Llama4 Scout with equivalent convergence to… | Daniel Vega-Myhre, Matthias Reso, Vasiliy Kuznetsov, Driss Guessous, Simon Fan, Alireza Shamsoshoara, Chinmay Baikar | March 12, 2026
KubeCon + CloudNativeCon + OpenInfra Summit + PyTorch Conference China 2026 CFP & Registration Now Open | Announcements, Blog | Both the Call for Proposals and registration are now open for KubeCon + CloudNativeCon +… | PyTorch Foundation | March 11, 2026
PyTorch at NVIDIA GTC 2026: Join Us in San Jose! | Blog | We're excited to announce that PyTorch will have a strong presence at NVIDIA GTC 2026,… | Clement Anthonioz Blanc, Chris Gottbrath, PyTorch Team at Meta | March 9, 2026
KernelAgent: Hardware-Guided GPU Kernel Optimization via Multi-Agent Orchestration | Blog | Summary: Recently, the PyTorch team released KernelAgent, an open agentic system achieving 100% correctness across… | Kaiming Cheng, Laura Wang, Jack Khuu, Mark Saroufim, Wenyuan Chi, Jiannan Wang, and Joe Isaacson | March 6, 2026
FlexAttention + FlashAttention-4: Fast and Flexible | Blog | TL;DR: On Hopper and Blackwell GPUs, FlexAttention now has a FlashAttention-4 backend. We added support… | Driss Guessous, Reuben Stern, Markus Hoehnerbach, Fung Xie, Ted Zadouri, Jay Shah, Tri Dao | March 5, 2026