Accelerating 2K scale pre-training up to 1.28x with TorchAO, MXFP8 and TorchTitan on Crusoe B200 Cluster [Blog]. tldr: 1.22x-1.28x training acceleration with MXFP8, equivalent convergence compared to BF16. We recently… Less Wright, Vasiliy Kuznetsov, Daniel Vega-Myhre, Driss Guessous, Hamid Shojanazeri, Elias Ellison, Martin Cala, Ethan Petersen (September 3, 2025)
Startup Showcase Returns to the PyTorch Conference October 21 in San Francisco [Announcements]. The Startup Showcase returns to the PyTorch Conference on Tuesday, October 21, 2025, spotlighting the… PyTorch Foundation (August 28, 2025)
A Primer on LLM Post-Training [Blog]. Large Language Models (LLMs) have revolutionized how we write and consume documents. In the past… Davide Testuggine (August 26, 2025)
DRAMA Model Inference Efficiency Boosted by 1.7x-2.3x [Blog]. TL;DR: NJTs (Nested Jagged Tensors) boost DRAMA model inference efficiency by 1.7x-2.3x, making it more… Shreya Goyal (August 22, 2025)
ZenFlow: Stall-Free Offloading Engine for LLM Training [Blog]. ZenFlow is a new extension to DeepSpeed introduced in summer 2025, designed as a… Tingfeng Lan, Yusen Wu, Bin Ma, Zhaoyuan Su, Rui Yang, Tekin Bicer, Masahiro Tanaka, Olatunji Ruwase, Dong Li, Yue Cheng (August 20, 2025)
Accelerating MoE’s with a Triton Persistent Cache-Aware Grouped GEMM Kernel [Blog]. In this post, we present an optimized Triton BF16 Grouped GEMM kernel for running training… Less Wright, Adnan Hoque, Garrett Goon (August 18, 2025)
Open Source AI Week Heads to the San Francisco Bay Area in October 2025 [Announcements]. Mark your calendars! The inaugural Open Source AI Week is coming to the San Francisco… PyTorch Foundation (August 15, 2025)
PyTorch Wheel Variants, the Frontier of Python Packaging [Blog]. PyTorch is the leading machine learning framework for developing and… Eli Uriegas (August 13, 2025)
PyTorch Day China Recap [Blog, Community]. On June 7, 2025, PyTorch Day China was held in Beijing, co-hosted by PyTorch Foundation… PyTorch Foundation (August 12, 2025)
Introducing Mixed Precision Training in Opacus [Blog]. We integrate mixed and low-precision training with Opacus to unlock increased throughput and training… Iden Kalemaj, Huanyu Zhang (August 12, 2025)
Bringing Generative AI to the Masses with ExecuTorch and KleidiAI [Blog]. Key Takeaways: ExecuTorch 0.7 now enables KleidiAI by default, delivering automatic acceleration on Arm CPUs… Gian Marco Iodice (GenAI Engineering Lead, Arm); Mary Bennion (Director Ecosystem, Arm); Digant Desai (Software Engineer, Meta) (August 11, 2025)
vLLM Beijing Meetup: Advancing Large-scale LLM Deployment [Blog, Community]. On August 2, 2025, Tencent’s Beijing Headquarters hosted a major event in the field of… vLLM Team (August 7, 2025)
Advancing Low-Bit Operators in PyTorch and ExecuTorch: Dynamic Kernel Selection, KleidiAI, and Quantized Tied Embeddings [Blog]. TorchAO brings high-performance low-bit linear and embedding operators to Arm CPUs. In this update, we’re… Scott Roy, Digant Desai, Ed Miller, Gian Marco Iodice, Ronan Naughton (August 7, 2025)
PyTorch 2.8 Release Blog [Blog]. We are excited to announce the release of PyTorch® 2.8 (release notes)! This release features: … PyTorch Foundation (August 6, 2025)
The AI future Takes Center Stage: PyTorch Conference Keynote Speakers Announced [Announcements]. Get ready, San Francisco. The keynote lineup for PyTorch Conference is officially here and it's… PyTorch Foundation (August 6, 2025)
Nominations Open for the 2025 PyTorch Contributor Awards [Announcements, Blog]. Nominations are now open for the 2025 PyTorch Contributor Awards! These awards shine a spotlight… PyTorch Foundation (July 31, 2025)
PyTorch on Kubernetes: Kubeflow Trainer Joins the PyTorch Ecosystem [Blog, Ecosystem]. We’re thrilled to announce that the Kubeflow Trainer project has been integrated into the PyTorch… Andrey Velichkevich, Apple; Yuki Iwai, CyberAgent, Inc.; Yuan Tang, Red Hat; Antonin Stefanutti, Red Hat; Johnu George, Nutanix (July 28, 2025)
PyTorch Conference 2025 Schedule Announcement [Announcements, Blog]. First Look at the Future of AI. The #PyTorchConf Schedule Is Here! The wait is… PyTorch Foundation (July 23, 2025)
torch.compile and Diffusers: A Hands-On Guide to Peak Performance [Blog]. Diffusers is the go-to library that provides a unified interface to cutting-edge and open diffusion… Sayak Paul (Hugging Face), Animesh Jain (Meta), Benjamin Bossan (Hugging Face) (July 17, 2025)
Enabling Fully Sharded Data Parallel (FSDP2) in Opacus [Blog]. Opacus is making significant strides in supporting private training of large-scale models… Sai Aparna Aketi, Huanyu Zhang (July 7, 2025)