Blog

TorchRec and FBGEMM 1.0 Stable Release

We are happy to announce the stable release, 1.0, for TorchRec and FBGEMM. TorchRec is the PyTorch native…

Paul Zhang, Zain Huda, Sarunya Pumma, Shintaro Iwasaki, Supadchaya Puangpontip, Benson Ma · October 23, 2024

PyTorch 2.5 Release Blog

We are excited to announce the release of PyTorch® 2.5 (release note)! This release features…

PyTorch Foundation · October 17, 2024

The Path to Achieve PyTorch Performance Boost on Windows CPU

The challenge of PyTorch’s lower CPU performance on Windows compared to Linux has been a…

Intel Corporation · October 15, 2024

PyTorch Foundation Technical Advisory Council Elects New Leadership

We are pleased to announce the first-ever Chair and Vice Chair of the PyTorch Foundation’s…

PyTorch Foundation · October 8, 2024

PyTorch Conference 2024 Recap: On Fire 🔥

The 2024 PyTorch Conference in San Francisco gathered nearly 1,500 AI researchers, developers, and enthusiasts.…

Jennifer Bly, PyTorch Foundation · October 2, 2024

Challenges and Efforts in PyTorch Multi-Device Integration: Compatibility, Portability, and Integration Efficiencies

Introduction: As the demand for diverse hardware accelerators grows, the need for a robust and…

Zesheng Zong (Huawei), Jiawei Li (Huawei) | Co-authors: Jiong Gong (Intel), Bartosz Sochacki (Intel), Eikan Wang (Intel) · September 18, 2024

CUDA-Free Inference for LLMs

In this blog, we discuss the methods we used to achieve FP16 inference with popular…

Adnan Hoque, Less Wright, Raghu Ganti, and Mudhakar Srivatsa · September 4, 2024

Accelerate Your AI: PyTorch 2.4 Now Supports Intel GPUs for Faster Workloads

We have exciting news! PyTorch 2.4 now supports Intel® Data Center GPU Max Series and…

The PyTorch Team at Intel · August 29, 2024

Enabling Fast Gradient Clipping and Ghost Clipping in Opacus

Introduction and Context: Differentially Private Stochastic Gradient Descent (DP-SGD) is the canonical method for training machine…

Enayat Ullah, Huanyu Zhang, Will Bullock, Ilya Mironov · August 20, 2024

FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention

In theory, Attention is All You Need. In practice, however, we also need optimized attention…

Team PyTorch: Driss Guessous, Yanbo Liang, Joy Dong, Horace He · August 7, 2024

Quantization-Aware Training for Large Language Models with PyTorch

In this blog, we present an end-to-end Quantization-Aware Training (QAT) flow for large language models…

Andrew Or, Jerry Zhang, Evan Smothers, Kartikay Khandelwal, Supriya Rao · July 30, 2024

PyTorch 2.4 Release Blog

We are excited to announce the release of PyTorch® 2.4 (release note)! PyTorch 2.4 adds…

PyTorch Foundation · July 24, 2024

Deep Dive on the Hopper TMA Unit for FP8 GEMMs

Abstract: The Hopper (H100) GPU architecture, billed as the “first truly asynchronous GPU”, includes a…

Adnan Hoque, Less Wright, Chih-Chieh Yang · July 22, 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large…

Jay Shah and Ganesh Bikshandi (Colfax Research), Ying Zhang (Meta), Vijay Thakkar and Pradeep Ramani (NVIDIA), Tri Dao (TogetherAI and Princeton University) · July 11, 2024

Learn how to develop Android applications with ExecuTorch and Llama models

This blog is courtesy of the PyTorch team at Arm. More details can be found here.…

Arm · July 10, 2024

Accelerated PyTorch inference with torch.compile on AWS Graviton processors

Summary: Originally, PyTorch used an eager mode where each PyTorch operation that forms the model…

Sunita Nadampalli · July 9, 2024

Training MoEs at Scale with PyTorch

Over the past year, Mixture of Experts (MoE) models have surged in popularity, fueled by…

Brian Chu, Mihir Patel, Less Wright, Vitaliy Chiley, Evan Racah, Wanchao Liang, Iris Zhang, Andrew Gu · June 23, 2024

Accelerating Neural Network Training with Semi-Structured (2:4) Sparsity

Over the past year, we’ve added support for semi-structured (2:4) sparsity into PyTorch. With just…

Jesse Cai, Daniel Haziza, Supriya Rao · June 20, 2024

Reducing Model Checkpointing Times by Over 10x with PyTorch Distributed Asynchronous Checkpointing

Summary: With PyTorch distributed’s new asynchronous checkpointing feature, developed with feedback from IBM, we show how…

Meta: Lucas Pasqualin, Less Wright, Iris Zhang (PyTorch), Chien-Chin Huang; IBM Research: Swaminathan Sundararaman, Saransh Gupta, Raghu Ganti · June 12, 2024

INT4 Decoding GQA CUDA Optimizations for LLM Inference

An efficient decoding Grouped-Query Attention with low-precision KV cache. Introduction: Generative AI has taken the…

Sarunya Pumma, Jongsoo Park, Jianyu Huang, Amy Yang, Jaewon Lee, Daniel Haziza, Grigory Sizov, Jeremy Reizenstein, Jeff Johnson, Ying Zhang · June 6, 2024
