How to use TorchInductor on Windows CPU¶
Author: Zhaoqiong Zheng, Xu, Han
TorchInductor is a compiler backend that transforms FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels. This tutorial will guide you through the process of using TorchInductor on a Windows CPU.
What you will learn:
How to compile and execute a Python function with PyTorch, optimized for Windows CPU
Basics of TorchInductor’s optimization using C++/Triton kernels
Prerequisites:
PyTorch v2.5 or later
Microsoft Visual C++ (MSVC)
Miniforge for Windows
Install the Required Software¶
First, let’s install the required software. TorchInductor needs a C++ compiler to build its optimized kernels. We will use Microsoft Visual C++ (MSVC) for this example.
Download and install MSVC.
During the installation, select Desktop Development with C++ under the Desktop & Mobile section of the Workloads tab, then complete the installation.
Note
For better performance, we recommend the Clang or Intel C++ compilers. See Using an Alternative Compiler for Better Performance.
Download and install Miniforge3-Windows-x86_64.exe.
Set Up the Environment¶
Open the command line environment via cmd.exe.
Activate MSVC with the following command:
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
Activate conda with the following command:
"C:/ProgramData/miniforge3/Scripts/activate.bat"
Create and activate a custom conda environment:
conda create -n inductor_cpu_windows python=3.10 -y
conda activate inductor_cpu_windows
Install PyTorch 2.5 or later.
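Before compiling anything, it can help to confirm that the pieces above are actually visible from the activated environment. The following is a minimal sketch using only the Python standard library. The helper names `find_compilers` and `installed_torch_version` are illustrative and not part of PyTorch, and the check assumes the compilers are discoverable on PATH (which is what running vcvars64.bat is meant to arrange for MSVC).

```python
import os
import shutil
import sys
from importlib import metadata

# Compiler executables named in this tutorial; "cl" is the MSVC compiler driver.
CANDIDATES = ("cl", "icx-cl", "clang-cl")

def find_compilers(candidates=CANDIDATES):
    """Map each compiler name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in candidates}

def installed_torch_version():
    """Return the installed PyTorch version string, or None if it is missing."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None

found = find_compilers()
torch_version = installed_torch_version()

print("Python :", sys.version.split()[0])
print("PyTorch:", torch_version or "not installed")
print("CXX    :", os.environ.get("CXX", "(not set)"))
for name, path in found.items():
    print(f"{name:9}: {path or 'not found on PATH'}")
```

If `cl` is reported as not found, the MSVC activation step was most likely skipped in the current cmd.exe session.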
Using TorchInductor on Windows CPU¶
Here’s a simple example to demonstrate how to use TorchInductor:
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
Here is sample output that this code might print. The exact values differ from run to run because the inputs are random:
tensor([[-3.9074e-02, 1.3994e+00, 1.3894e+00, 3.2630e-01, 8.3060e-01,
1.1833e+00, 1.4016e+00, 7.1905e-01, 9.0637e-01, -1.3648e+00],
[ 1.3728e+00, 7.2863e-01, 8.6888e-01, -6.5442e-01, 5.6790e-01,
5.2025e-01, -1.2647e+00, 1.2684e+00, -1.2483e+00, -7.2845e-01],
[-6.7747e-01, 1.2028e+00, 1.1431e+00, 2.7196e-02, 5.5304e-01,
6.1945e-01, 4.6654e-01, -3.7376e-01, 9.3644e-01, 1.3600e+00],
[-1.0157e-01, 7.7200e-02, 1.0146e+00, 8.8175e-02, -1.4057e+00,
8.8119e-01, 6.2853e-01, 3.2773e-01, 8.5082e-01, 8.4615e-01],
[ 1.4140e+00, 1.2130e+00, -2.0762e-01, 3.3914e-01, 4.1122e-01,
8.6895e-01, 5.8852e-01, 9.3310e-01, 1.4101e+00, 9.8318e-01],
[ 1.2355e+00, 7.9290e-02, 1.3707e+00, 1.3754e+00, 1.3768e+00,
9.8970e-01, 1.1171e+00, -5.9944e-01, 1.2553e+00, 1.3394e+00],
[-1.3428e+00, 1.8400e-01, 1.1756e+00, -3.0654e-01, 9.7973e-01,
1.4019e+00, 1.1886e+00, -1.9194e-01, 1.3632e+00, 1.1811e+00],
[-7.1615e-01, 4.6622e-01, 1.2089e+00, 9.2011e-01, 1.0659e+00,
9.0892e-01, 1.1932e+00, 1.3888e+00, 1.3898e+00, 1.3218e+00],
[ 1.4139e+00, -1.4000e-01, 9.1192e-01, 3.0175e-01, -9.6432e-01,
-1.0498e+00, 1.4115e+00, -9.3212e-01, -9.0964e-01, 1.0127e+00],
[ 5.7244e-04, 1.2799e+00, 1.3595e+00, 1.0907e+00, 3.7191e-01,
1.4062e+00, 1.3672e+00, 6.8502e-02, 8.5216e-01, 8.6046e-01]])
Using an Alternative Compiler for Better Performance¶
To improve Inductor performance on Windows, you can use the Intel Compiler or the LLVM Compiler. Both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so your first step should be to install MSVC.
Intel Compiler¶
Download and install the Windows version of the Intel Compiler.
Select it as the compiler for Inductor on Windows by setting the CXX environment variable:
set CXX=icx-cl
Intel also provides a comprehensive step-by-step guide, complete with performance data. Please check Intel® oneAPI DPC++/C++ Compiler Boosts PyTorch* Inductor Performance on Windows* for CPU Devices.
LLVM Compiler¶
Download and install the LLVM Compiler, choosing the win64 version.
Select it as the compiler for Inductor on Windows by setting the CXX environment variable:
set CXX=clang-cl
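The CXX variable can also be set from inside a Python script, before the first call to `torch.compile`, instead of typing `set` in every shell. The sketch below is one way this could look. The helper name `pick_compiler` and the preference order are illustrative assumptions, not a PyTorch API, and the approach assumes Inductor reads CXX from the process environment when it compiles, as the `set CXX=...` commands above suggest.

```python
import os
import shutil

def pick_compiler(preferred=("icx-cl", "clang-cl", "cl")):
    """Return the first compiler in `preferred` that is on PATH, or None."""
    for name in preferred:
        if shutil.which(name) is not None:
            return name
    return None

compiler = pick_compiler()

# Respect a CXX value the user has already chosen; otherwise use what was found.
if "CXX" not in os.environ and compiler is not None:
    os.environ["CXX"] = compiler

print("CXX =", os.environ.get("CXX", "(not set)"))
```

Leaving an existing CXX value untouched keeps the shell-level setting authoritative, so the script only fills in a default when nothing has been chosen.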
Conclusion¶
In this tutorial, we have learned how to use Inductor on Windows CPU with PyTorch. In addition, we discussed further performance improvements with Intel Compiler and LLVM Compiler.