Note
This tutorial describes a prototype feature. Prototype features are typically not available as part of binary distributions like PyPI or Conda, except sometimes behind run-time flags, and are at an early stage for feedback and testing.
How to use TorchInductor on Windows CPU
Created On: Oct 01, 2024 | Last Updated: Oct 22, 2024 | Last Verified: Oct 01, 2024
Author: Zhaoqiong Zheng, Xu, Han
TorchInductor is a compiler backend that transforms FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels. This tutorial will guide you through the process of using TorchInductor on a Windows CPU.
What you will learn
How to compile and execute a Python function with PyTorch, optimized for Windows CPU
Basics of TorchInductor’s optimization using C++/Triton kernels.
Prerequisites
PyTorch v2.5 or later
Microsoft Visual C++ (MSVC)
Miniforge for Windows
Install the Required Software
First, let’s install the required software. A C++ compiler is required for TorchInductor optimization. We will use Microsoft Visual C++ (MSVC) for this example.
Download and install MSVC.
During the installation, choose Desktop Development with C++ in the Desktop & Mobile section of the Workloads table, then install the software.
Note
For better performance, we recommend the Clang or Intel C++ compiler. See the section Using an Alternative Compiler for Better Performance.
Download and install Miniforge3-Windows-x86_64.exe.
Set Up the Environment
Open the command line environment via cmd.exe.
Activate MSVC with the following command:
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
Activate conda with the following command:
"C:/ProgramData/miniforge3/Scripts/activate.bat"
Create and activate a custom conda environment:
conda create -n inductor_cpu_windows python=3.10 -y
conda activate inductor_cpu_windows
Install PyTorch 2.5 or later.
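Before moving on, it can help to confirm that the shell is set up as described in the steps above. The following is a minimal sketch using only the Python standard library. It assumes that vcvars64.bat puts the MSVC compiler driver cl on PATH and that conda exposes the active environment name through the CONDA_DEFAULT_ENV variable. The helper name check_environment is hypothetical and is not part of PyTorch.

```python
import os
import shutil
import sys


def check_environment(expected_env="inductor_cpu_windows",
                      which=shutil.which, environ=None):
    """Return a dict describing whether the shell looks ready for TorchInductor.

    `which` and `environ` are injectable so the check can be exercised
    without a real MSVC or conda installation.
    """
    environ = os.environ if environ is None else environ
    active_env = environ.get("CONDA_DEFAULT_ENV")
    return {
        "is_windows": sys.platform == "win32",
        # None means `cl` was not found on PATH (vcvars64.bat likely not run).
        "msvc_cl_path": which("cl"),
        "conda_env": active_env,
        "conda_env_matches": active_env == expected_env,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If msvc_cl_path is reported as None, rerun the vcvars64.bat command in the same cmd.exe session before starting Python.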
Using TorchInductor on Windows CPU
Here’s a simple example to demonstrate how to use TorchInductor:
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
Here is sample output that this code might return. The values differ from run to run because the inputs are random:
tensor([[-3.9074e-02, 1.3994e+00, 1.3894e+00, 3.2630e-01, 8.3060e-01,
1.1833e+00, 1.4016e+00, 7.1905e-01, 9.0637e-01, -1.3648e+00],
[ 1.3728e+00, 7.2863e-01, 8.6888e-01, -6.5442e-01, 5.6790e-01,
5.2025e-01, -1.2647e+00, 1.2684e+00, -1.2483e+00, -7.2845e-01],
[-6.7747e-01, 1.2028e+00, 1.1431e+00, 2.7196e-02, 5.5304e-01,
6.1945e-01, 4.6654e-01, -3.7376e-01, 9.3644e-01, 1.3600e+00],
[-1.0157e-01, 7.7200e-02, 1.0146e+00, 8.8175e-02, -1.4057e+00,
8.8119e-01, 6.2853e-01, 3.2773e-01, 8.5082e-01, 8.4615e-01],
[ 1.4140e+00, 1.2130e+00, -2.0762e-01, 3.3914e-01, 4.1122e-01,
8.6895e-01, 5.8852e-01, 9.3310e-01, 1.4101e+00, 9.8318e-01],
[ 1.2355e+00, 7.9290e-02, 1.3707e+00, 1.3754e+00, 1.3768e+00,
9.8970e-01, 1.1171e+00, -5.9944e-01, 1.2553e+00, 1.3394e+00],
[-1.3428e+00, 1.8400e-01, 1.1756e+00, -3.0654e-01, 9.7973e-01,
1.4019e+00, 1.1886e+00, -1.9194e-01, 1.3632e+00, 1.1811e+00],
[-7.1615e-01, 4.6622e-01, 1.2089e+00, 9.2011e-01, 1.0659e+00,
9.0892e-01, 1.1932e+00, 1.3888e+00, 1.3898e+00, 1.3218e+00],
[ 1.4139e+00, -1.4000e-01, 9.1192e-01, 3.0175e-01, -9.6432e-01,
-1.0498e+00, 1.4115e+00, -9.3212e-01, -9.0964e-01, 1.0127e+00],
[ 5.7244e-04, 1.2799e+00, 1.3595e+00, 1.0907e+00, 3.7191e-01,
1.4062e+00, 1.3672e+00, 6.8502e-02, 8.5216e-01, 8.6046e-01]])
Using an Alternative Compiler for Better Performance
To improve TorchInductor performance on Windows, you can use the Intel Compiler or the LLVM Compiler. Both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so your first step should be to install MSVC.
Intel Compiler
Download and install the Windows version of the Intel Compiler.
Set the compiler used by Inductor on Windows with the CXX environment variable:
set CXX=icx-cl
Intel also provides a comprehensive step-by-step guide, complete with performance data. Please check Intel® oneAPI DPC++/C++ Compiler Boosts PyTorch* Inductor Performance on Windows* for CPU Devices.
LLVM Compiler
Download and install the LLVM Compiler, choosing the win64 version.
Set the compiler used by Inductor on Windows with the CXX environment variable:
set CXX=clang-cl
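The CXX variable can also be chosen from Python rather than typed into the shell. The sketch below picks the first compiler driver found on PATH from a preference list and exports it as CXX. The helper name select_cxx and the preference order are illustrative assumptions, not part of PyTorch. The sketch also assumes, conservatively, that CXX should be set before torch is imported so that Inductor sees it.

```python
import os
import shutil

# Compiler drivers named in this tutorial; the ordering is an assumption.
PREFERRED_COMPILERS = ("icx-cl", "clang-cl", "cl")


def select_cxx(candidates=PREFERRED_COMPILERS, which=shutil.which, environ=None):
    """Set CXX to the first candidate found on PATH and return its name.

    Returns None and leaves the environment untouched if none is found.
    `which` and `environ` are injectable to allow testing without compilers.
    """
    environ = os.environ if environ is None else environ
    for name in candidates:
        if which(name) is not None:
            environ["CXX"] = name
            return name
    return None


if __name__ == "__main__":
    chosen = select_cxx()
    print(f"CXX set to: {chosen}")
    # Import torch only after CXX is set, then call torch.compile as usual.
```

Whichever compiler is selected, MSVC still has to be installed, since both alternatives depend on its runtime libraries.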