October 30, 2024

Triton Kernel Compilation Stages

The Triton open-source programming language and compiler offers a high-level, Python-based approach to creating efficient GPU code. In this blog, we highlight the underlying details of how a Triton program is compiled and the intermediate representations it passes through along the way. For an introduction to Triton, we refer readers to this blog.
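
To give a sense of what the compiler consumes, here is a minimal vector-add kernel (a standard introductory sketch, not code from the linked post) using Triton's public API; the comment about inspecting the intermediate representations is an assumption, as the exact handle varies by Triton version:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous BLOCK_SIZE chunk.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the tensor
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
compiled = add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
# On recent Triton versions, the returned handle exposes the compilation
# stages, e.g. compiled.asm["ttir"], compiled.asm["ttgir"],
# compiled.asm["llir"], compiled.asm["ptx"] (version-dependent).
```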

Read More

October 25, 2024

Intel GPU Support Now Available in PyTorch 2.5

Support for Intel GPUs is now available in PyTorch® 2.5, providing improved functionality and performance for Intel GPUs, including Intel® Arc™ discrete graphics, Intel® Core™ Ultra processors with built-in Intel® Arc™ graphics, and the Intel® Data Center GPU Max Series. This integration brings Intel GPUs and the SYCL* software stack into the official PyTorch stack, ensuring a consistent user experience and enabling more extensive AI application scenarios, particularly in the AI PC domain.
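
For readers who want to try it, a minimal sketch of using the new XPU device (assuming a PyTorch 2.5 build with Intel GPU support installed) might look like this:

```python
import torch

# Fall back to CPU when no Intel GPU (XPU) device is present.
device = torch.device("xpu" if torch.xpu.is_available() else "cpu")

x = torch.randn(128, 128, device=device)
y = torch.randn(128, 128, device=device)
z = (x @ y).cpu()  # tensors move between XPU and CPU via the usual APIs
print(z.shape, device)
```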

Read More

October 24, 2024

ExecuTorch Beta: On-Device AI and LLMs, Stability, and Acceleration with Partners

ExecuTorch has achieved Beta status with the release of v0.4, providing stable APIs and runtime, as well as extensive kernel coverage. ExecuTorch is the recommended on-device inference engine for Llama 3.2 1B/3B models, offering enhanced performance and memory efficiency for both original and quantized models. There has been a significant increase in adoption and ecosystem growth for ExecuTorch, and the focus is now on improving reliability, performance, and coverage for non-CPU backends...
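
As a rough illustration of the export path (a sketch based on the ExecuTorch v0.4 API surface; the TinyModel module is hypothetical and details may differ across versions):

```python
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

model = TinyModel().eval()
example_inputs = (torch.randn(1, 8),)

# Capture the graph with torch.export, lower it to the Edge dialect,
# then serialize an ExecuTorch program for the on-device runtime.
exported = torch.export.export(model, example_inputs)
program = to_edge(exported).to_executorch()
with open("tiny_model.pte", "wb") as f:
    f.write(program.buffer)
```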

Read More

October 23, 2024

TorchRec and FBGEMM 1.0 Stable Release

We are happy to announce the stable release, 1.0, for TorchRec and FBGEMM. TorchRec is the PyTorch native recommendation systems library, powered by FBGEMM’s (Facebook GEneral Matrix Multiplication) efficient, low-level kernels.
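
To give a flavor of the now-stable API, here is a minimal pooled-embedding lookup with TorchRec (a sketch; the table and feature names are made up for illustration):

```python
import torch
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig, KeyedJaggedTensor

# One embedding table serving one sparse feature.
ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="product_table",
            embedding_dim=16,
            num_embeddings=1000,
            feature_names=["product_ids"],
        )
    ],
    device=torch.device("cpu"),
)

# Two samples with a variable number of ids: [101, 202] and [303].
features = KeyedJaggedTensor.from_lengths_sync(
    keys=["product_ids"],
    values=torch.tensor([101, 202, 303]),
    lengths=torch.tensor([2, 1]),
)

pooled = ebc(features)               # KeyedTensor of pooled embeddings
print(pooled["product_ids"].shape)   # torch.Size([2, 16])
```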

Read More

October 17, 2024

PyTorch 2.5 Release Blog

We are excited to announce the release of PyTorch® 2.5 (release note)! This release features a new CuDNN backend for SDPA, enabling speedups by default for SDPA users on H100 or newer GPUs. In addition, regional compilation of torch.compile offers a way to reduce the cold start-up time for torch.compile by allowing users to compile a repeated nn.Module (e.g., a transformer layer in an LLM) without recompilations. Finally, the TorchInductor CPP backend offers solid performance speedups with numerous enhancements...
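
As a quick illustration of the regional compilation idea, a minimal sketch (the Block and Model modules are hypothetical) compiles the repeated layer rather than the full model, so structurally identical layers reuse the compiled artifact instead of triggering fresh compilations:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))

class Model(nn.Module):
    def __init__(self, dim, depth):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        # Compile each repeated block instead of the whole model; the
        # compiler cache is shared across identical block structures.
        for block in self.blocks:
            block.compile()

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

model = Model(dim=64, depth=12)
out = model(torch.randn(2, 64))
```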

Read More