Blog

Blog

PyTorch 2.11 Release Blog

We are excited to announce the release of PyTorch® 2.11 (release notes)! The PyTorch 2.11…

PyTorch Foundation, March 23, 2026

Blog

PyTorch 2.10+TorchAO: Powering AIPC scenarios on Intel® Core™ Ultra Series 3 processors

Overview We are excited to introduce the highlights of Intel® Core™ Ultra Series 3 processors…

Intel PyTorch and Client AI SW team, March 20, 2026

Blog

TorchSpec: Speculative Decoding Training at Scale

Introduction Over the past year, large language models have rapidly expanded in both scale and…

TorchSpec team, Mooncake team, March 19, 2026

Blog

Generalized Dot-Product Attention: Tackling Real-World Challenges in GPU Training Kernels

In this blog post, we present the kernel design of Generalized Dot-Product Attention (GDPA), a…

Jackie (Jiaqi) Xu, Chao Chen, Shuqi Yang, Markus Hoehnerbach, Xiaoyi (Leo) Liu, Ted Zadouri‡, Dev (Devashish) Shankar, Jacky Zhou, Hongtao Yu, Manman Ren, Han Xu, Chunzhi Yang†, Jade Nie†, Haoyu Zhang, Huayu Li, Michael Shu, Musharaf Sultan, Max Leung, John Bocharov, Tri Dao‡, March 18, 2026

Blog

Building Voice Agents with ExecuTorch: A Cross-Platform Foundation for On-Device Audio

TL;DR Open source voice models are proliferating, but there's no unified native inference platform for…

Mergen Nachin, Manuel Candales, Mengwei Liu, Jacob Szwejbka, Young Han, Songhao Jia, Stephen Jia, Scott Roy, Alban Desmaison, Hansong Zhang from PyTorch Team at Meta; Yagil Burowski, Matt Clayton, Will Burford from LM Studio, March 15, 2026

Blog

MXFP8 Training for MoEs: 1.3x training speedup vs BF16 for Llama4 Scout on GB200 cluster using TorchAO and TorchTitan

TL;DR We recently demonstrated a +30.2% training speedup for Llama4 Scout with equivalent convergence to…

Daniel Vega-Myhre, Matthias Reso, Vasiliy Kuznetsov, Driss Guessous, Simon Fan, Alireza Shamsoshoara, Chinmay Baikar, March 12, 2026

Blog

PyTorch at NVIDIA GTC 2026: Join Us in San Jose!

We're excited to announce that PyTorch will have a strong presence at NVIDIA GTC 2026,…

Clement Anthonioz Blanc, Chris Gottbrath, PyTorch Team at Meta, March 9, 2026

Blog

KernelAgent: Hardware-Guided GPU Kernel Optimization via Multi-Agent Orchestration

Summary Recently, the PyTorch team released KernelAgent, an open agentic system achieving 100% correctness across…

Kaiming Cheng, Laura Wang, Jack Khuu, Mark Saroufim, Wenyuan Chi, Jiannan Wang, and Joe Isaacson, March 6, 2026

Blog

FlexAttention + FlashAttention-4: Fast and Flexible

TL;DR: On Hopper and Blackwell GPUs, FlexAttention now has a FlashAttention-4 backend. We added support…

Driss Guessous, Reuben Stern, Markus Hoehnerbach, Fung Xie, Ted Zadouri, Jay Shah, Tri Dao, March 5, 2026

Blog

Deploying PyTorch Models to the Micro-Edge with ExecuTorch and Arm

The world of AI is expanding beyond the cloud, reaching devices that fit in the…

Dominica Abena Oforiwaa Amanfo, March 5, 2026

Blog

Quantization-Aware Training in TorchAO (II)

In our previous Quantization-Aware Training (QAT) blog, we introduced the initial QAT flow in TorchAO…

Meta: Andrew Or, Lisa Jin, Scott Roy, Jerry Zhang, Mergen Nachin, Supriya Rao, Lin Xiao; Unsloth: Daniel Han; Axolotl: Salman Mohammadi, March 4, 2026

Blog

Enhancing Multimodal Training and Memory Efficiency with DeepSpeed

Overview This blog walks through two crucial DeepSpeed updates: (1) a PyTorch-identical backward API that…

Masahiro Tanaka (Anyscale) and Olatunji Ruwase (Snowflake), February 24, 2026

Blog

Accelerating Autotuning in Helion with Bayesian Optimization

Introduction As introduced in a previous blog post, Helion is a high-level DSL that empowers…

Ethan Che, Oguz Ulgen, Max Balandat, Jongsok Choi, Jason Ansel, February 24, 2026

Blog

Pyrefly Now Type Checks PyTorch

We’re excited to share that PyTorch now leverages Pyrefly to power type checking across our…

PyTorch and Pyrefly Teams at Meta, February 12, 2026

Blog

Accelerating Mamba2 with Kernel Fusion

Summary In this post, we discuss how we optimized the Mamba-2 State-Space Dual (SSD) module…

Rishi Astra, Tri Dao, Adnan Hoque, February 6, 2026

Blog

Some Matrix Multiplication Engines Are Not As Accurate As We Thought

What is an accumulator in an accelerator's GEMM engine and why does it matter? GPUs…

Chi-Chun (Charlie) Liu, Monodeep Kar, Naigang Wang, Raghu Kiran Ganti, Mudhakar Srivatsa, February 6, 2026

Blog

Building Highly Efficient Inference System for Recommenders Using PyTorch

Why Choose PyTorch for Recommendation System PyTorch has emerged as the de facto framework in…

Lu Fang, Shiyan Deng, Hongyi Jia, Huamin Li, Ilina Mitra, Sheng Qin, Zhengkai Zhang, Zhuoran Zhao, Zinnia Zheng, February 5, 2026

Blog

Portable Paged Attention in Helion

Recently, the PyTorch team released Helion, a new domain-specific and PyTorch-based language to make the…

Burkhard Ringlein (IBM Research) and the vLLM Team at IBM Research, February 3, 2026

Blog, Community

Unlock Reasoning in Llama 3.1-8B via Full Fine-Tuning on NVIDIA DGX Spark

What is the unsaid joy of local LLMs? The magic of downloading weights, running some…

Sanyam Bhutani (PyTorch Meta), Hamid Shojanazeri (PyTorch Meta), Clement Anthonioz Blanc (Meta), February 2, 2026

Blog

Accelerating On-Device ML Inference with ExecuTorch and Arm SME2

Interactive image segmentation has become a defining mobile experience across the world’s most popular apps.…

Jason Zhu, Tyler Mullenbach, Damien Dooley, and Gian Marco Idoice, Arm, January 29, 2026

