Accelerate Your AI: PyTorch 2.4 Now Supports Intel GPUs for Faster Workloads (Blog)
We have exciting news! PyTorch 2.4 now supports Intel® Data Center GPU Max Series and…
the PyTorch Team at Intel | August 29, 2024

Enabling Fast Gradient Clipping and Ghost Clipping in Opacus (Blog)
Introduction and Context: Differentially Private Stochastic Gradient Descent (DP-SGD) is the canonical method for training machine…
Enayat Ullah, Huanyu Zhang, Will Bullock, Ilya Mironov | August 20, 2024

FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention (Blog)
In theory, Attention is All You Need. In practice, however, we also need optimized attention…
Team PyTorch: Driss Guessous, Yanbo Liang, Joy Dong, Horace He | August 7, 2024

Quantization-Aware Training for Large Language Models with PyTorch (Blog)
In this blog, we present an end-to-end Quantization-Aware Training (QAT) flow for large language models…
Andrew Or, Jerry Zhang, Evan Smothers, Kartikay Khandelwal, Supriya Rao | July 30, 2024

PyTorch 2.4 Release Blog (Blog)
We are excited to announce the release of PyTorch® 2.4 (release note)! PyTorch 2.4 adds…
PyTorch Foundation | July 24, 2024

Deep Dive on the Hopper TMA Unit for FP8 GEMMs (Blog)
Abstract: The Hopper (H100) GPU architecture, billed as the “first truly asynchronous GPU”, includes a…
Adnan Hoque, Less Wright, Chih-Chieh Yang | July 22, 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (Blog)
Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large…
Jay Shah and Ganesh Bikshandi (Colfax Research); Ying Zhang (Meta); Vijay Thakkar and Pradeep Ramani (NVIDIA); Tri Dao (TogetherAI and Princeton University) | July 11, 2024

Learn how to develop Android applications with ExecuTorch and Llama models (Blog)
This blog is courtesy of the PyTorch team at Arm. More details can be found here.…
Arm | July 10, 2024
Accelerated PyTorch inference with torch.compile on AWS Graviton processors (Blog)
Summary: Originally, PyTorch used an eager mode where each PyTorch operation that forms the model…
Sunita Nadampalli | July 9, 2024
Training MoEs at Scale with PyTorch (Blog)
Over the past year, Mixture of Experts (MoE) models have surged in popularity, fueled by…
Brian Chu, Mihir Patel, Less Wright, Vitaliy Chiley, Evan Racah, Wanchao Liang, Iris Zhang, Andrew Gu | June 23, 2024

Accelerating Neural Network Training with Semi-Structured (2:4) Sparsity (Blog)
Over the past year, we’ve added support for semi-structured (2:4) sparsity into PyTorch. With just…
Jesse Cai, Daniel Haziza, Supriya Rao | June 20, 2024

Reducing Model Checkpointing Times by Over 10x with PyTorch Distributed Asynchronous Checkpointing (Blog)
Summary: With PyTorch distributed’s new asynchronous checkpointing feature, developed with feedback from IBM, we show how…
Meta: Lucas Pasqualin, Less Wright, Iris Zhang (PyTorch), Chien-Chin Huang; IBM Research: Swaminathan Sundararaman, Saransh Gupta, Raghu Ganti | June 12, 2024

INT4 Decoding GQA CUDA Optimizations for LLM Inference (Blog)
An efficient decoding Grouped-Query Attention with low-precision KV cache. Introduction: Generative AI has taken the…
Sarunya Pumma, Jongsoo Park, Jianyu Huang, Amy Yang, Jaewon Lee, Daniel Haziza, Grigory Sizov, Jeremy Reizenstein, Jeff Johnson, Ying Zhang | June 6, 2024

Maximizing Training Throughput Using PyTorch FSDP and Torch.compile (Blog)
Recently, we demonstrated how FSDP and selective activation checkpointing can be used to achieve 57% MFU…
Team PyTorch at IBM and Team PyTorch at Meta | May 21, 2024

Achieving Sustainability Goals with PyTorch and Intel AI (Blog)
This post was contributed by Intel AI in partnership with the PyTorch Foundation. In 2017,…
PyTorch Foundation | May 15, 2024
Speeding up ViTs using Block Sparsity (Blog)
TLDR: We show promising results of up to a 1.46x speedup with <2% drop in accuracy on float32…
Mostafa Elhoushi (FAIR at Meta); Syed Shakib Sarwar, Aaryan Kothapalli, Mia Kasperek, Barbara De Salvo (Sensors and Systems at Meta Reality Labs Research); Christian Puhrsch, Jesse Cai, Joe Isaacson (PyTorch at Meta); Andrew James, Pearu Peterson, Nikita Vedeneev (Quansight) | May 14, 2024
Introducing depyf: mastering torch.compile with ease (Community)
We are thrilled to introduce depyf, a new project to the PyTorch ecosystem designed to help…
Kaichao You | May 11, 2024

Deep Learning Energy Measurement and Optimization (Community)
This post is authored by Jae-Won Chung, a PhD student at the University of Michigan and…
Jae-Won Chung | May 11, 2024

A Hitchhiker’s Guide to Speculative Decoding (Blog)
Speculative decoding is an optimization technique for inference that makes educated guesses about future tokens…
Team PyTorch at IBM | May 2, 2024
Accelerating Llama3 FP8 Inference with Triton Kernels (Blog)
1.0 Summary: We present an optimized Triton FP8 GEMM (General Matrix-Matrix Multiply) kernel TK-GEMM, which…
Adnan Hoque, Less Wright, Chih-Chieh Yang | May 1, 2024