October 04, 2023
High performance Llama 2 deployments with AWS Inferentia2 using TorchServe
Llama 2 was recently released and has attracted considerable interest from the machine learning community. Amazon EC2 Inf2 instances, powered by AWS Inferentia2, now support training and inference of Llama 2 models. In this post, we demonstrate low-latency, cost-effective inference of Llama 2 models on Amazon EC2 Inf2 instances using the latest AWS Neuron SDK release. We first show how to create, compile, and deploy the Llama 2 model and explain the optimization techniques introduced by AWS Neu...
October 03, 2023
How to Build an Interactive Chat-Generation Model using DialoGPT and PyTorch
Interest in interactive chat-generation (or conversational response-generation) models has grown considerably in the past several months. Conversational response-generation models such as ChatGPT and Google Bard have taken the AI world by storm. The purpose of interactive chat generation is to answer various questions posed by humans, and these AI-based models use natural language processing (NLP) to generate conversations almost indistinguishable from those generated by humans.
October 02, 2023
Announcing PyTorch Docathon H2 2023
We are excited to announce that we will be holding a Docathon for PyTorch on November 1, 2023! This event is an opportunity for our community to come together and improve the quality of our documentation.
September 25, 2023
Inside the Matrix: Visualizing Matrix Multiplication, Attention and Beyond
Use 3D to visualize matrix multiplication expressions, attention heads with real weights, and more.
September 13, 2023
Accelerated CPU Inference with PyTorch Inductor using torch.compile
Story at a Glance
September 12, 2023
One Year of PyTorch Foundation
It’s been one year since we announced the formation of the PyTorch Foundation! 🎉