Blog

Blog

Using Muon Optimizer with DeepSpeed

TL;DR: DeepSpeed now supports Muon Optimizer! Muon Optimizer has gained great momentum with significant adoption…

Zhipeng Wang, Guokai Ma, Peng Du and Chi McIsaac, DeepSpeed team | June 3, 2026

Blog Case Studies

How LinkedIn Uses PyTorch to Solve Extreme-Scale Optimization Problems

TL;DR: This case study demonstrates how LinkedIn re-architected its distributed linear programming solver, DuaLip, by…

Aida Rahmattalabi, Sanjana Garg, Gregory Dexter, Zhipeng Wang, Ruby Tu, Yuan Gao, Yi Zhang | June 1, 2026

Blog

Why Is PyTorch Compile So Fast: Kernel Fusion

When you use PyTorch's compiler, your model runs faster, up to 10x faster. But what's…

Morrison Turnansky | May 27, 2026

Blog

Up to 580tps! New Speed Record of Qwen3.5-397B-A17B on GPU for Agentic Workloads with TokenSpeed

TL;DR: The TokenSpeed inference engine achieved a record-breaking 580 tps running the Qwen3.5-397B-A17B model on…

TokenSpeed Team, Qwen Team | May 27, 2026

Announcements Blog

Alibaba Cloud Joins the PyTorch Foundation as a Platinum Member

The PyTorch Foundation, a community-driven hub for open source AI under the Linux Foundation, is…

PyTorch Foundation | May 26, 2026

Blog

TLX Block Attention: A Warp-Specialized Blackwell Kernel for Fixed-Block Sparse Self-Attention

Code available at: https://github.com/facebookresearch/ads_model_kernel_library
In this post, we present the design of TLX Block Attention…

Jake Siso, Dev (Devashish) Shankar, Jackie (Jiaqi) Xu, Jacky Zhou, Darren Liu, Han Xu, Yasmine Badr, Dan Chanpuriya, Hongtao Yu, Huayu Li, Ernest Wang, Shuo Chang, Max Leung | May 26, 2026

Announcements Blog

Join the PyTorch Foundation Ambassador Program: A Global Network of Community Leaders

A little over a year ago, the PyTorch Foundation launched the Ambassador Program, an initiative…

PyTorch Foundation | May 22, 2026

Announcements Blog

PyTorch Docathon 2026 Results in 150+ Merged Pull Requests

Thank you to everyone who participated in the PyTorch Docathon 2026! Once again, the community…

PyTorch Foundation | May 20, 2026

Blog

vLLM and PyTorch Work Together to Improve the Developer Experience on aarch64

TL;DR: PyTorch 2.11 makes it possible to install CUDA-enabled PyTorch wheels on aarch64 Linux directly…

Kaichao You (Inferact) | May 18, 2026

Blog

Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate

TL;DR: Introducing the ExecuTorch MLX Delegate. The new MLX delegate enables optimized, GPU-accelerated inference for…

ExecuTorch Team | May 18, 2026

Blog

PyTorch 2.12 Release Blog

We are excited to announce the release of PyTorch® 2.12 (release notes)! The PyTorch 2.12…

PyTorch Foundation | May 13, 2026

Blog

Efficient Edge AI on Arm CPUs and NPUs: Understanding ExecuTorch through Practical Labs

TL;DR: ExecuTorch extends the PyTorch ecosystem to deliver local AI inference on constrained edge devices.…

Matt Cossins | May 12, 2026

Blog

In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference

TL;DR: Traditional RecSys inference explicitly replicates shared user embeddings/sequences for every candidate. In-Kernel Broadcast Optimization…

Jian Jiao, Boda Li, Hongtao Yu, Yuanwei (Kevin) Fang, Zhengkai Zhang, Zhuoran Zhao, Yuxin Chen, Sijia Chen†, Yang Chen†, Zijian Shen, Shuyao Bi, Ao Cai, Junhan Hu†, Shuqi Yang†, Wei Wei, Lu Fang, Rengan Xu, Manman Ren, Alex Zhong, Xiaohan Wei, Zeliang Chen, Ellie Wen, Wenlin Chen | May 5, 2026

Blog

SMG: The Case for Disaggregating CPU from GPU in LLM Serving

How It Started: Hitting the GIL Wall at Scale. We've been running production model serving…

Simo Lin, Chang Su, and Keyang Ru, members of LightSeek Foundation | April 30, 2026

Blog

Introducing AutoSP

¹ SSAIL Lab, University of Illinois Urbana-Champaign, ² Anyscale, ³ Snowflake TL;DR: AutoSP automatically converts…

Ahan Gupta¹, Zhihao Wang¹, Neel Dani¹, Masahiro Tanaka², Olatunji Ruwase³, Minjia Zhang¹ | April 29, 2026

Blog Case Studies

IBM Research uses vLLM at the heart of its RITS Platform

TL;DR: vLLM has been critical to democratizing our research community's access to the latest…

PyTorch Foundation | April 24, 2026

Blog

Optimizing Effective Training Time for Meta’s Internal Recommendation/Ranking Workloads

Motivation and Introduction: Across the industry, teams training and serving large AI models face aggressive…

Ruilin Chen, Yuzhen Huang, Hang Qi, Mingming Ding, Damian Reeves, Boris Sarana, Kevin Tang, Satendra Gera, Gagan Jain, Sahil Shah, Oguz Ulgen, Mayank Garg, Meet Vadakkanchery, James March, Sophie Lin, Wei Sun | April 17, 2026

Announcements Blog

PyTorch Conference Europe 2026: A Landmark Moment for Open Source AI in Paris

The first-ever PyTorch Conference Europe, held April 7-8, 2026, brought together more than 600 researchers, developers,…

PyTorch Foundation | April 15, 2026

Blog

Faster Diffusion on Blackwell: MXFP8 and NVFP4 with Diffusers and TorchAO

Diffusion models for image and video generation have been surging in popularity, delivering super-realistic visual…

Vasiliy Kuznetsov (Meta) and Sayak Paul (Hugging Face) | April 8, 2026

Announcements Blog Press Release

PyTorch Foundation Announces Safetensors as Newest Contributed Project to Secure AI Model Execution

Safetensors is welcomed into the PyTorch Foundation to secure model distribution and build trusted agentic…

PyTorch Foundation | April 8, 2026
