Deep Dive on the Hopper TMA Unit for FP8 GEMMs (Blog). Abstract: The Hopper (H100) GPU architecture, billed as the “first truly asynchronous GPU”, includes a… By Adnan Hoque, Less Wright, Chih-Chieh Yang, July 22, 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (Blog). Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large… By Jay Shah and Ganesh Bikshandi, Colfax Research; Ying Zhang, Meta; Vijay Thakkar and Pradeep Ramani, NVIDIA; Tri Dao, TogetherAI and Princeton University, July 11, 2024
Learn how to develop Android applications with ExecuTorch and Llama models (Blog). This blog is courtesy of the PyTorch team at Arm. More details can be found here.… By Arm, July 10, 2024
Accelerated PyTorch inference with torch.compile on AWS Graviton processors (Blog). Summary: Originally, PyTorch used an eager mode where each PyTorch operation that forms the model… By Sunita Nadampalli, July 9, 2024
Announcing Hacker Cup AI Track at NeurIPS 2024 (Announcements, Blog). The PyTorch team, in partnership with Meta Hacker Cup and Microsoft Research, is excited to… By PyTorch Foundation, July 3, 2024
Powering the AI Revolution: The PyTorch Documentary (Announcements). Now live: The official PyTorch Documentary! This film unveils the authentic narrative of PyTorch’s inception, attributing… By The PyTorch Foundation, June 25, 2024
Training MoEs at Scale with PyTorch (Blog). Over the past year, Mixture of Experts (MoE) models have surged in popularity, fueled by… By Brian Chu, Mihir Patel, Less Wright, Vitaliy Chiley, Evan Racah, Wanchao Liang, Iris Zhang, Andrew Gu, June 23, 2024
🎉 PyTorch Docathon H1 2024 Wrap-up 🎉 (Announcements). We are thrilled to announce the successful completion of the H1 2024 PyTorch Docathon! The… By PyTorch Foundation, June 20, 2024
Accelerating Neural Network Training with Semi-Structured (2:4) Sparsity (Blog). Over the past year, we’ve added support for semi-structured (2:4) sparsity into PyTorch. With just… By Jesse Cai, Daniel Haziza, Supriya Rao, June 20, 2024
Reducing Model Checkpointing Times by Over 10x with PyTorch Distributed Asynchronous Checkpointing (Blog). Summary: With PyTorch distributed’s new asynchronous checkpointing feature, developed with feedback from IBM, we show how… By Meta: Lucas Pasqualin, Less Wright, Iris Zhang (PyTorch), Chien-Chin Huang; IBM Research: Swaminathan Sundararaman, Saransh Gupta, Raghu Ganti, June 12, 2024
PyTorch Foundation Welcomes New Executive Director (Announcements). The PyTorch Foundation is excited to welcome Matt White, our new executive director. The PyTorch… By PyTorch Foundation, June 11, 2024
INT4 Decoding GQA CUDA Optimizations for LLM Inference (Blog). An efficient decoding Grouped-Query Attention with low-precision KV cache. Introduction: Generative AI has taken the… By Sarunya Pumma, Jongsoo Park, Jianyu Huang, Amy Yang, Jaewon Lee, Daniel Haziza, Grigory Sizov, Jeremy Reizenstein, Jeff Johnson, Ying Zhang, June 6, 2024
Ready, Set, Contribute: PyTorch Docathon Kickoff H1 2024 (Announcements). The PyTorch Docathon is now live! This event is dedicated to enhancing the quality of… By PyTorch Foundation, June 4, 2024
AI Helps Duolingo Personalize Language Learning (Case Studies). Learning a foreign language was probably one of your goals last year. And the year… By PyTorch Foundation, May 25, 2024
Maximizing Training Throughput Using PyTorch FSDP and Torch.compile (Blog). Recently, we demonstrated how FSDP and selective activation checkpointing can be used to achieve 57% MFU… By Team PyTorch at IBM and Team PyTorch at Meta, May 21, 2024
Achieving Sustainability Goals with PyTorch and Intel AI (Blog). This post was contributed by Intel AI in partnership with the PyTorch Foundation. In 2017,… By PyTorch Foundation, May 15, 2024
Speeding up ViTs using Block Sparsity (Blog). TLDR: We show promising results of up to a 1.46x speedup with <2% drop in accuracy on float32… By FAIR at Meta: Mostafa Elhoushi; Sensors and Systems at Meta Reality Labs Research: Syed Shakib Sarwar, Aaryan Kothapalli, Mia Kasperek, Barbara De Salvo; PyTorch at Meta: Christian Puhrsch, Jesse Cai, Joe Isaacson; Quansight: Andrew James, Pearu Peterson, Nikita Vedeneev, May 14, 2024
Introducing depyf: mastering torch.compile with ease (Community). We are thrilled to introduce depyf, a new project to the PyTorch ecosystem designed to help… By Kaichao You, May 11, 2024
Deep Learning Energy Measurement and Optimization (Community). This post is authored by Jae-Won Chung, a PhD student at the University of Michigan and… By Jae-Won Chung, May 11, 2024
Enhancing Deep Learning Workflows: PyTorch Ecosystem Tools (Announcements, Community). Welcome to the thriving PyTorch ecosystem, where a wealth of tools and libraries await, purpose-built… By PyTorch Foundation, May 11, 2024