October 17, 2023

Compiling NumPy code into C++ or CUDA via torch.compile

Quansight engineers have implemented support for tracing through NumPy code via torch.compile in PyTorch 2.1. This feature leverages PyTorch’s compiler to generate efficient, fused, vectorized code without requiring any changes to the original NumPy code. It also lets you execute NumPy code on CUDA simply by running the compiled function under torch.device("cuda")!
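As a rough illustration of the workflow described above, the following sketch wraps an ordinary NumPy function with torch.compile. The names weighted_row_sums and run_compiled are hypothetical and chosen only for this example. It assumes PyTorch 2.1 or later is installed, plus a working CUDA setup if use_cuda=True is passed. torch is imported lazily so the NumPy function can still be used on its own.

```python
import numpy as np


def weighted_row_sums(x, w):
    """Plain NumPy: a few elementwise ops followed by a reduction.

    Chains like this are the kind of code a compiler can fuse into one loop.
    """
    return np.sum(np.exp(-x * x) * w, axis=-1)


def run_compiled(fn, *args, use_cuda=False):
    """Hypothetical helper: run a NumPy function through torch.compile.

    Assumes PyTorch >= 2.1. The import is lazy so that this module still
    loads when only NumPy is available.
    """
    import torch

    compiled = torch.compile(fn)  # the NumPy code itself is left unchanged
    if use_cuda:
        # Assumption: a CUDA-capable device is present. Inputs and outputs
        # remain NumPy arrays; the computation is executed on the GPU.
        with torch.device("cuda"):
            return compiled(*args)
    return compiled(*args)
```

Calling run_compiled(weighted_row_sums, x, w) with two NumPy arrays should return a NumPy array matching weighted_row_sums(x, w) up to floating-point tolerance. Passing use_cuda=True is only meaningful on a machine with a GPU.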


October 11, 2023

ML Model Server Resource Saving - Transition From High-Cost GPUs to Intel CPUs and oneAPI powered Software with performance

Reviewers: Yunsang Ju (Naver GplaceAI Leader), Min Jean Cho (Intel), Jing Xu (Intel), Mark Saroufim (Meta)


October 10, 2023

Real-time Audio-visual Speech Recognition

Audio-Visual Speech Recognition (AV-ASR, or AVSR) is the task of transcribing text from audio and visual streams. It has recently attracted a lot of research attention because of its robustness to noise. The vast majority of work to date has focused on developing AV-ASR models for non-streaming recognition; studies on streaming AV-ASR are very limited.


October 04, 2023

PyTorch 2.1: automatic dynamic shape compilation, distributed checkpointing

We are excited to announce the release of PyTorch® 2.1 (release note)! PyTorch 2.1 offers automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API.


October 04, 2023

High performance Llama 2 deployments with AWS Inferentia2 using TorchServe

Recently, Llama 2 was released and has attracted a lot of interest from the machine learning community. Amazon EC2 Inf2 instances, powered by AWS Inferentia2, now support training and inference of Llama 2 models. In this post, we show low-latency and cost-effective inference of Llama-2 models on Amazon EC2 Inf2 instances using the latest AWS Neuron SDK release. We first introduce how to create, compile and deploy the Llama-2 model and explain the optimization techniques introduced by AWS Neu...
