Blog

Community

Introducing depyf: mastering torch.compile with ease

We are thrilled to introduce depyf, a new project to the PyTorch ecosystem designed to help…

Kaichao You · May 11, 2024

Community

Deep Learning Energy Measurement and Optimization

This post is authored by Jae-Won Chung, a PhD student at the University of Michigan and…

Jae-Won Chung · May 11, 2024

Blog

A Hitchhiker’s Guide to Speculative Decoding

Speculative decoding is an optimization technique for inference that makes educated guesses about future tokens…

Team PyTorch at IBM · May 2, 2024

Blog

Accelerating Llama3 FP8 Inference with Triton Kernels

1.0 Summary: We present an optimized Triton FP8 GEMM (General Matrix-Matrix Multiply) kernel TK-GEMM, which…

Adnan Hoque, Less Wright, Chih Chieh Yang · May 1, 2024

Blog

ExecuTorch Alpha: Taking LLMs and AI to the Edge with Our Community and Partners

We are excited to announce the release of ExecuTorch alpha, focused on deploying large language models…

PyTorch Foundation · April 30, 2024

Blog

PyTorch 2.3 Release Blog

We are excited to announce the release of PyTorch® 2.3 (release note)! PyTorch 2.3 offers…

PyTorch Foundation · April 24, 2024

Blog

Accelerating MoE model inference with Locality-Aware Kernel Design

1.0 Summary: We show that by implementing column-major scheduling to improve data locality, we can…

Adnan Hoque, Less Wright, Antoni Virós Martin, Chih-Chieh Yang · April 4, 2024

Blog

Maximizing training throughput using PyTorch FSDP

In this blog, we demonstrate the scalability of FSDP with a pre-training exemplar, a 7B…

Team PyTorch at IBM and Team PyTorch at Meta · March 13, 2024

Community

Exploring scientific machine learning pipelines through the SimulAI toolkit

SciML, short for Scientific Machine Learning, encompasses work that merges quantitative sciences with machine learning.…

Joao Lucas de Sousa Almeida · February 15, 2024

Community

Colossal-LLaMA-2: Low Cost and High-quality Domain-specific LLM Solution Using LLaMA and Colossal-AI

The most prominent distinction between LLaMA-1 and LLaMA-2 lies in the incorporation of higher-quality corpora,…

Yang You · January 29, 2024

Community

3D rotations and spatial transformations made easy with RoMa

Struggling with quaternions, rotation vectors, right-hand rules and all that stuff? Try RoMa: an easy-to-use,…

Romain Brégier · January 25, 2024

Blog

Accelerating Generative AI with PyTorch IV: Seamless M4T, fast

This post is the fourth part of a multi-series blog focused on how to accelerate…

Yejin Lee, Carole-Jean Wu, Christian Puhrsch, Joel Schlosser, Driss Guessous, Jeffrey Wan, Joe Isaacson, Can Balioglu, Juan Pino · January 23, 2024

Blog

Accelerate PyTorch Models Using Quantization Techniques with Intel Extension for PyTorch

Overview: PyTorch is a Python-based framework for developing deep learning models. It is one of…

Intel · January 18, 2024

Blog

Accelerating Triton Dequantization Kernels for GPTQ

TL;DR: Leveraging a first-principles approach, we showcase a step-by-step process undertaken to…

Less Wright, Adnan Hoque (IBM) · January 16, 2024

Blog

Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem

We demonstrate how to finetune a 7B parameter model on a typical consumer GPU (NVIDIA…

Younes Belkada, Marc Sun, Titus von Köller, Sourab Mangrulkar, Benjamin Bossan, Lysandre Debut, Steven Liu · January 10, 2024

Blog

Accelerate AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe, saving up to 75% on inference costs

Multi-model endpoints (MMEs) are a powerful feature of Amazon SageMaker designed to simplify the deployment and operation…

James Wu, Ankith Gunapal, Li Ning, Subhash Talluri, and Saurabh Trikande · January 9, 2024

Community

torchdistill — a modular, configuration-driven framework for reproducible deep learning and knowledge distillation experiments

This article summarizes key features and concepts of torchdistill (v1.0.0). Refer to the official documentation…

Yoshitomo Matsubara · January 4, 2024

Blog

Accelerating Generative AI Part III: Diffusion, Fast

This post is the third part of a multi-series blog focused on how to accelerate…

Sayak Paul and Patrick von Platen (Hugging Face 🤗) · January 3, 2024

Blog

Understanding GPU Memory 2: Finding and Removing Reference Cycles

This is part 2 of the Understanding GPU Memory blog series. Our first post Understanding GPU…

Aaron Shi, Zachary DeVito · December 19, 2023

Blog

Training Production AI Models with PyTorch 2.0

1. Introduction: PyTorch 2.0 (abbreviated as PT2) can significantly improve the training and inference performance of…

CK Luk, Daohang Shi, Yuzhen Huang, Jackie (Jiaqi) Xu, Jade Nie, Zhou Wang, Lu Fang, Flavio Sales Truzzi, Devashish Shankar, Dima Ivashchenko, Chunzhi Yang, Nicolas Macchioni, David Berard, Yu Guo, Xiaodong Wang, Bert Maher, Yanbo Liang, Edward Yang, Brian Hirsh, Michael Voznesensky, Animesh Jain, Michael Anderson · December 18, 2023
