June 06, 2024
INT4 Decoding GQA CUDA Optimizations for LLM Inference
Efficient Grouped-Query Attention decoding with a low-precision KV cache
June 04, 2024
Ready, Set, Contribute: PyTorch Docathon Kickoff H1 2024
The PyTorch Docathon is now live! This event is dedicated to enhancing the quality of the PyTorch documentation with the invaluable assistance of our community. Our hope with this Docathon is to simplify the process for new users to get started with PyTorch, guide them in effectively utilizing its features, and ultimately expedite the transition from research to production in machine learning.
May 21, 2024
Maximizing Training Throughput Using PyTorch FSDP and Torch.compile
Recently, we demonstrated how FSDP and selective activation checkpointing can be used to achieve 57% MFU (Model FLOPS Utilization) when training a 7B model on A100 GPUs. We also demonstrated that this setup can train a high-quality model, which we open sourced as the Granite 7B base model on Hugging Face Hub under the Apache v2.0 license.
May 15, 2024
Achieving Sustainability Goals with PyTorch and Intel AI
This post was contributed by Intel AI in partnership with the PyTorch Foundation.
May 14, 2024
Speeding up ViTs using Block Sparsity
TLDR: We show promising results of up to a 1.46x speedup with a <2% drop in accuracy on float32 Vision Transformers on A100 GPUs by applying block sparsity to the MLP module’s weights. This approach can potentially be applied to other types of transformers, including large language models. Our implementation and the benchmarks needed to reproduce our results are available at https://github.com/pytorch-labs/superblock.
May 02, 2024
A Hitchhiker’s Guide to Speculative Decoding
Speculative decoding is an optimization technique for inference that makes educated guesses about future tokens while generating the current token, all within a single forward pass. It incorporates a verification mechanism to ensure the correctness of these speculated tokens, thereby guaranteeing that the overall output of speculative decoding is identical to that of vanilla decoding. Optimizing the cost of inference of large language models (LLMs) is arguably one of the most critical factor...
May 02, 2024
Announcing PyTorch Docathon June, 2024
We are thrilled to announce the upcoming PyTorch Docathon in June! The Docathon, akin to a hackathon, is an event dedicated to enhancing the quality of the PyTorch documentation with the invaluable assistance of our community. Documentation is a vital component of any technology. By refining it, we can simplify the process for new users to get started with PyTorch, guide them in effectively utilizing its features, and ultimately expedite the transition from research to production in machine l...