June 12, 2024
Reducing Model Checkpointing Times by Over 10x with PyTorch Distributed Asynchronous Checkpointing
Summary: With PyTorch Distributed’s new asynchronous checkpointing feature, developed with feedback from IBM, we show how the IBM Research team implemented it and reduced effective checkpointing time by a factor of 10-20x. Example: 7B model ‘down time’ for a checkpoint goes from an average of 148.8 seconds to 6.3 seconds, or 23.62x faster.
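To illustrate why asynchronous checkpointing shortens training ‘down time’, here is a minimal sketch of the general idea using only the Python standard library. It is not PyTorch's implementation: the function `async_checkpoint`, the dictionary standing in for model state, and the pickle file format are all assumptions made for illustration. Only the in-memory snapshot blocks the caller, while the slower write to disk runs in a background thread.

```python
import copy
import pickle
import tempfile
import threading
from pathlib import Path


def async_checkpoint(state, path):
    """Snapshot `state` in the caller's thread, then write it in the background.

    Hypothetical helper: only the in-memory copy blocks the caller; the disk
    write runs in a separate thread. Returns the thread so it can be joined.
    """
    snapshot = copy.deepcopy(state)  # blocking part: in-memory copy only

    def _write():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)

    writer = threading.Thread(target=_write)
    writer.start()
    return writer


# Hypothetical stand-in for model state; real training would hold tensors.
state = {"step": 100, "weights": [0.1, 0.2, 0.3]}
ckpt_path = Path(tempfile.mkdtemp()) / "ckpt.pkl"

writer = async_checkpoint(state, ckpt_path)
state["step"] = 101  # training continues and mutates the live state
writer.join()        # wait for the write before starting the next checkpoint

with open(ckpt_path, "rb") as f:
    restored = pickle.load(f)
```

Because the snapshot is taken before training resumes, the saved checkpoint reflects step 100 even though the live state has already moved on.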
June 11, 2024
PyTorch Foundation Welcomes New Executive Director
The PyTorch Foundation is excited to welcome Matt White, our new executive director. The PyTorch Foundation formed in 2022 with the goal to drive adoption of AI tooling by fostering and sustaining an ecosystem of open source, vendor-neutral projects with PyTorch. Over the past 2 years, we’ve seen excellent growth across the project, in both contributors and members.
June 06, 2024
INT4 Decoding GQA CUDA Optimizations for LLM Inference
An efficient decoding Grouped-Query Attention with low-precision KV cache
June 04, 2024
Ready, Set, Contribute: PyTorch Docathon Kickoff H1 2024
The PyTorch Docathon is now live! This event is dedicated to enhancing the quality of the PyTorch documentation with the invaluable assistance of our community. Our hope with this Docathon is to simplify the process for new users to get started with PyTorch, guide them in effectively utilizing its features, and ultimately expedite the transition from research to production in machine learning.
May 21, 2024
Maximizing Training Throughput Using PyTorch FSDP and Torch.compile
Recently, we demonstrated how FSDP and selective activation checkpointing can be used to achieve 57% MFU (Model FLOPs Utilization) for training a 7B model on A100 GPUs. We also demonstrated how this setup can train a high-quality model, which we open sourced as the Granite 7B base model on Hugging Face Hub under the Apache v2.0 license.
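As a rough guide to what an MFU figure means, the sketch below estimates it from training throughput. It assumes the common approximation of about 6 FLOPs per parameter per token for a forward and backward pass (attention FLOPs are ignored) and the A100's dense BF16 peak of 312 TFLOPS. The throughput value is hypothetical and is not taken from the post.

```python
def model_flops_utilization(params, tokens_per_sec_per_gpu, peak_flops_per_gpu):
    """Estimate MFU as achieved FLOPs/s divided by peak FLOPs/s.

    Assumes ~6 FLOPs per parameter per token (forward + backward),
    which ignores attention FLOPs.
    """
    achieved_flops_per_sec = 6 * params * tokens_per_sec_per_gpu
    return achieved_flops_per_sec / peak_flops_per_gpu


A100_PEAK_BF16 = 312e12  # dense BF16 peak of an A100, in FLOPs/s

# Hypothetical throughput for a 7B-parameter model, for illustration only.
mfu = model_flops_utilization(
    params=7e9,
    tokens_per_sec_per_gpu=4000,
    peak_flops_per_gpu=A100_PEAK_BF16,
)
print(f"Estimated MFU: {mfu:.1%}")
```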
May 15, 2024
Achieving Sustainability Goals with PyTorch and Intel AI
This post was contributed by Intel AI in partnership with the PyTorch Foundation.
May 14, 2024
Speeding up ViTs using Block Sparsity
TLDR: We show promising results of up to a 1.46x speedup with <2% drop in accuracy on float32 Vision Transformers on A100 GPUs by applying block sparsity to the weights of MLP modules. This approach can potentially be applied to other types of transformers including large language models. Our implementation and benchmarks to reproduce our results are available at https://github.com/pytorch-labs/superblock.
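For readers new to block sparsity, the following minimal sketch zeroes out whole square blocks of a weight matrix rather than individual entries. It selects blocks by magnitude (sum of absolute values), which is an assumption chosen for simplicity and not necessarily how the linked repository picks its blocks. The function name `block_sparsify` and the toy matrix are hypothetical.

```python
def block_sparsify(weight, block, keep_fraction):
    """Keep the highest-magnitude square blocks of `weight`; zero the rest.

    `weight` is a list of equal-length rows whose dimensions are
    multiples of `block`.
    """
    rows, cols = len(weight), len(weight[0])
    assert rows % block == 0 and cols % block == 0

    # Score each block by the sum of absolute values of its entries.
    scores = {}
    for bi in range(0, rows, block):
        for bj in range(0, cols, block):
            scores[(bi, bj)] = sum(
                abs(weight[i][j])
                for i in range(bi, bi + block)
                for j in range(bj, bj + block)
            )

    # Keep the top-scoring blocks, identified by their top-left corner.
    n_keep = max(1, round(keep_fraction * len(scores)))
    kept = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])

    return [
        [
            weight[i][j] if (i - i % block, j - j % block) in kept else 0.0
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Toy 4x4 weight matrix with two strong 2x2 blocks on the diagonal.
w = [
    [0.90, 0.80, 0.01, 0.02],
    [0.70, 0.60, 0.03, 0.01],
    [0.02, 0.01, 0.50, 0.40],
    [0.01, 0.03, 0.30, 0.20],
]
sparse = block_sparsify(w, block=2, keep_fraction=0.5)
```

Zeroing contiguous blocks, rather than scattered entries, is what lets sparse kernels skip entire tiles of the matrix multiply.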