Accelerating Generative AI with PyTorch: Segment Anything 2 – Fast and furious inference with low latency and fast cold starts (PyTorch Foundation, February 26, 2025)
This post is a follow-up to our first entry in the multi-series blog focused on how…
Unlocking the Latest Features in PyTorch 2.6 for Intel Platforms (the Intel PyTorch Team, February 11, 2025)
PyTorch* 2.6 has just been released with a set of exciting new features including torch.compile compatibility…
Enabling advanced GPU features in PyTorch – Warp Specialization (Meta and NVIDIA, February 5, 2025)
Meta: Hongtao Yu, Manman Ren, Bert Maher, Shane Nay; NVIDIA: Gustav Zhu, Shuhao Jiang. Over the…
PyTorch 2.6 Release Blog (PyTorch Foundation, January 29, 2025)
We are excited to announce the release of PyTorch® 2.6 (release notes)! This release features…
2025 Priorities for the PyTorch Technical Advisory Council (TAC) (Luca Antiga, PyTorch TAC Chair, January 28, 2025)
2024 has been a year of incredible growth for PyTorch. As that continues in 2025,…
How Intel Uses PyTorch to Empower Generative AI through Intel Arc GPUs (PyTorch Foundation, January 24, 2025)
Intel has long been at the forefront of technological innovation, and its recent venture into…
Accelerating LLM Inference with GemLite, TorchAO and SGLang (Teams at PyTorch, Mobius Labs and SGLang, January 21, 2025)
Large Language Models (LLMs) are typically very resource-intensive, requiring significant amounts of memory, compute and…
GenAI Acceleration for PyTorch 2.5 on Intel® Xeon® Processors (the Intel PyTorch Team, January 14, 2025)
This blog is the fifth in a series focused on accelerating generative AI models with…
Integrating Ascend Backend with Torchtune through PyTorch Multi-Device Support (Huawei PyTorch Team: Chenguang Li (Huawei), Mengqing Cao (Huawei), January 9, 2025)
In this blog, we will briefly introduce torchtune, the Ascend backend, and demonstrate how torchtune…
High-Performance Low-Bit Operators for PyTorch (Scott Roy, Digant Desai, Kimish Patel, January 6, 2025)
We are excited to announce the addition of embedding operators with low-bit weights (1-8 bit)…
Improve RAG performance with torch.compile on AWS Graviton Processors (Sunita Nadampalli (AWS), Ankith Gunapal (Meta), Hamid Shojanazeri (Meta), December 20, 2024)
Large Language Models (LLMs) are trained on vast volumes of data and use billions of…
torchcodec: Easy and Efficient Video Decoding for PyTorch (PyTorch Foundation, December 11, 2024)
We are pleased to officially announce torchcodec, a library for decoding videos into PyTorch tensors. It…
Accelerating 2D Dynamic Block Quantized Float8 GEMMs in Triton (Meta: Less Wright; IBM: Adnan Hoque, December 6, 2024)
2D block quantization for Float8 (FP8) holds the promise of improving the accuracy of Float8…
HadaCore: Tensor Core Accelerated Hadamard Transform Kernel (IBM and Meta, December 2, 2024)
IBM: Krish Agarwal, Rishi Astra, Adnan Hoque, Mudhakar Srivatsa, Raghu Ganti; Meta: Less Wright, Sijia Chen…
Supercharging Training using float8 and FSDP2 (IBM and Meta, November 25, 2024)
IBM: Tuan Hoang Trong, Alexei Karve, Yan Koyfman, Linsong Chu, Divya Kumari, Shweta Salaria, Robert…
Distilling Llama3.1 8B into 1B in torchtune (Linda Wang, Evan Smothers, Kartikay Khandelwal, November 18, 2024)
In this blog, we present a case study on distilling a Llama 3.1 8B model…
Deep Dive on CUTLASS Ping-Pong GEMM Kernel (Less Wright, Adnan Hoque, November 1, 2024)
In this post, we provide…
Deploying LLMs with TorchServe + vLLM (Matthias Reso, Ankith Gunapal, Simon Mo, Li Ning, Hamid Shojanazeri, October 31, 2024)
The vLLM engine is currently one of the top-performing ways to execute large language models…
Triton Kernel Compilation Stages (Sara Kokkila-Schumacher*, Brian Vaughan*, Raghu Ganti*, and Less Wright+ (*IBM Research, +Meta), October 30, 2024)
The Triton open-source programming language and compiler offers a high-level, Python-based approach to create efficient…
Unleashing the Power of AI on Mobile: LLM Inference for Llama 3.2 Quantized Models with ExecuTorch and KleidiAI (Gian Marco Iodice, Arm and Digant Desai, Meta, October 28, 2024)
At the recent PyTorch Conference, Arm highlighted the widespread impact of its technology, spanning from…