Getting started with PyTorch, ExecuTorch, and Ethos-U85 in three easy steps. ExecuTorch support for Ethos-U85: In the rapidly evolving landscape of machine learning, PyTorch has emerged… By Robert Elliott, Fredrik Knutsson, and Mark Quartermain. October 28, 2024.
Intel GPU Support Now Available in PyTorch 2.5. Support for Intel GPUs is now available in PyTorch® 2.5, providing improved functionality and performance… By PyTorch Team at Intel. October 25, 2024.
ExecuTorch Beta: On-Device AI and LLMs, Stability, and Acceleration with Partners. ExecuTorch has achieved Beta status with the release of v0.4, providing stable APIs and runtime,… By PyTorch Foundation. October 24, 2024.
TorchRec and FBGEMM 1.0 Stable Release. We are happy to announce the stable release, 1.0, for TorchRec and FBGEMM. TorchRec is the PyTorch native… By Paul Zhang, Zain Huda, Sarunya Pumma, Shintaro Iwasaki, Supadchaya Puangpontip, Benson Ma. October 23, 2024.
PyTorch 2.5 Release Blog. We are excited to announce the release of PyTorch® 2.5 (release note)! This release features… By PyTorch Foundation. October 17, 2024.
The Path to Achieve PyTorch Performance Boost on Windows CPU. The challenge of PyTorch’s lower CPU performance on Windows compared to Linux has been a… By Intel Corporation. October 15, 2024.
PyTorch Foundation Technical Advisory Council Elects New Leadership. We are pleased to announce the first-ever Chair and Vice Chair of the PyTorch Foundation’s… By PyTorch Foundation. October 8, 2024.
Challenges and Efforts in PyTorch Multi-Device Integration: Compatibility, Portability, and Integration Efficiencies. Introduction: As the demand for diverse hardware accelerators grows, the need for a robust and… By Zesheng Zong (Huawei), Jiawei Li (Huawei) | Co-authors: Jiong Gong (Intel), Bartosz Sochacki (Intel), Eikan Wang (Intel). September 18, 2024.
CUDA-Free Inference for LLMs. In this blog, we discuss the methods we used to achieve FP16 inference with popular… By Adnan Hoque, Less Wright, Raghu Ganti and Mudhakar Srivatsa. September 4, 2024.
Accelerate Your AI: PyTorch 2.4 Now Supports Intel GPUs for Faster Workloads. We have exciting news! PyTorch 2.4 now supports Intel® Data Center GPU Max Series and… By the PyTorch Team at Intel. August 29, 2024.
Enabling Fast Gradient Clipping and Ghost Clipping in Opacus. Introduction and Context: Differentially Private Stochastic Gradient Descent (DP-SGD) is the canonical method for training machine… By Enayat Ullah, Huanyu Zhang, Will Bullock, Ilya Mironov. August 20, 2024.
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention. In theory, Attention is All You Need. In practice, however, we also need optimized attention… By Team PyTorch: Horace He, Driss Guessous, Yanbo Liang, Joy Dong. August 7, 2024.
Quantization-Aware Training for Large Language Models with PyTorch. In this blog, we present an end-to-end Quantization-Aware Training (QAT) flow for large language models… By Andrew Or, Jerry Zhang, Evan Smothers, Kartikay Khandelwal, Supriya Rao. July 30, 2024.
PyTorch 2.4 Release Blog. We are excited to announce the release of PyTorch® 2.4 (release note)! PyTorch 2.4 adds… By PyTorch Foundation. July 24, 2024.
Deep Dive on the Hopper TMA Unit for FP8 GEMMs. Abstract: The Hopper (H100) GPU architecture, billed as the “first truly asynchronous GPU”, includes a… By Adnan Hoque, Less Wright, Chih-Chieh Yang. July 22, 2024.
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision. Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large… By Jay Shah and Ganesh Bikshandi (Colfax Research); Ying Zhang (Meta); Vijay Thakkar and Pradeep Ramani (NVIDIA); Tri Dao (TogetherAI and Princeton University). July 11, 2024.
Learn how to develop Android applications with ExecuTorch and Llama models. This blog is courtesy of the PyTorch team at Arm. More details can be found here.… By Arm. July 10, 2024.
Accelerated PyTorch inference with torch.compile on AWS Graviton processors. Summary: Originally, PyTorch used an eager mode where each PyTorch operation that forms the model… By Sunita Nadampalli. July 9, 2024.
Training MoEs at Scale with PyTorch. Over the past year, Mixture of Experts (MoE) models have surged in popularity, fueled by… By Brian Chu, Mihir Patel, Less Wright, Vitaliy Chiley, Evan Racah, Wanchao Liang, Iris Zhang, Andrew Gu. June 23, 2024.
Accelerating Neural Network Training with Semi-Structured (2:4) Sparsity. Over the past year, we’ve added support for semi-structured (2:4) sparsity into PyTorch. With just… By Jesse Cai, Daniel Haziza, Supriya Rao. June 20, 2024.