
How to use TorchInductor on Windows CPU

Created On: Oct 01, 2024 | Last Updated: Oct 22, 2024 | Last Verified: Oct 01, 2024

Authors: Zhaoqiong Zheng, Xu Han

TorchInductor is a compiler backend that transforms FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels. This tutorial will guide you through the process of using TorchInductor on a Windows CPU.

What you will learn
  • How to compile and execute a Python function with PyTorch, optimized for Windows CPU

  • Basics of TorchInductor’s optimization using C++/Triton kernels.

Prerequisites
  • PyTorch v2.5 or later

  • Microsoft Visual C++ (MSVC)

  • Miniforge for Windows

Install the Required Software

First, let’s install the required software. A C++ compiler is required for TorchInductor optimization. We will use Microsoft Visual C++ (MSVC) in this example.

  1. Download and install MSVC.

  2. During the installation, select Desktop development with C++ in the Desktop & Mobile section of the Workloads tab, then proceed with the installation.

Note

For better performance, we recommend using the Clang or Intel compiler as the C++ compiler. See Using an Alternative Compiler for Better Performance below.

  3. Download and install Miniforge3-Windows-x86_64.exe.

Set Up the Environment

  1. Open the command line environment via cmd.exe.

  2. Activate MSVC with the following command:

    "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
    
  3. Activate conda with the following command:

    "C:/ProgramData/miniforge3/Scripts/activate.bat"
    
  4. Create and activate a custom conda environment:

    conda create -n inductor_cpu_windows python=3.10 -y
    conda activate inductor_cpu_windows
    
  5. Install PyTorch 2.5 or later.
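     For example, a CPU-only build can typically be installed with pip; verify the exact command against the official PyTorch installation instructions, as it may change between releases:

    pip install torch --index-url https://download.pytorch.org/whl/cpu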

Using TorchInductor on Windows CPU

Here’s a simple example to demonstrate how to use TorchInductor:

import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

# torch.compile uses the TorchInductor backend by default.
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))

Here is sample output from this code; the exact values will differ between runs because the inputs are random:

tensor([[-3.9074e-02,  1.3994e+00,  1.3894e+00,  3.2630e-01,  8.3060e-01,
        1.1833e+00,  1.4016e+00,  7.1905e-01,  9.0637e-01, -1.3648e+00],
        [ 1.3728e+00,  7.2863e-01,  8.6888e-01, -6.5442e-01,  5.6790e-01,
        5.2025e-01, -1.2647e+00,  1.2684e+00, -1.2483e+00, -7.2845e-01],
        [-6.7747e-01,  1.2028e+00,  1.1431e+00,  2.7196e-02,  5.5304e-01,
        6.1945e-01,  4.6654e-01, -3.7376e-01,  9.3644e-01,  1.3600e+00],
        [-1.0157e-01,  7.7200e-02,  1.0146e+00,  8.8175e-02, -1.4057e+00,
        8.8119e-01,  6.2853e-01,  3.2773e-01,  8.5082e-01,  8.4615e-01],
        [ 1.4140e+00,  1.2130e+00, -2.0762e-01,  3.3914e-01,  4.1122e-01,
        8.6895e-01,  5.8852e-01,  9.3310e-01,  1.4101e+00,  9.8318e-01],
        [ 1.2355e+00,  7.9290e-02,  1.3707e+00,  1.3754e+00,  1.3768e+00,
        9.8970e-01,  1.1171e+00, -5.9944e-01,  1.2553e+00,  1.3394e+00],
        [-1.3428e+00,  1.8400e-01,  1.1756e+00, -3.0654e-01,  9.7973e-01,
        1.4019e+00,  1.1886e+00, -1.9194e-01,  1.3632e+00,  1.1811e+00],
        [-7.1615e-01,  4.6622e-01,  1.2089e+00,  9.2011e-01,  1.0659e+00,
        9.0892e-01,  1.1932e+00,  1.3888e+00,  1.3898e+00,  1.3218e+00],
        [ 1.4139e+00, -1.4000e-01,  9.1192e-01,  3.0175e-01, -9.6432e-01,
        -1.0498e+00,  1.4115e+00, -9.3212e-01, -9.0964e-01,  1.0127e+00],
        [ 5.7244e-04,  1.2799e+00,  1.3595e+00,  1.0907e+00,  3.7191e-01,
        1.4062e+00,  1.3672e+00,  6.8502e-02,  8.5216e-01,  8.6046e-01]])
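The torch.compile call above uses the TorchInductor backend by default. If you want to make that choice explicit, the backend can be named directly; here is a minimal sketch reusing the same foo function:

import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

# "inductor" is the default backend for torch.compile; naming it explicitly
# makes the choice visible in the code.
opt_foo2 = torch.compile(foo, backend="inductor")
print(opt_foo2(torch.randn(10, 10), torch.randn(10, 10)))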

Using an Alternative Compiler for Better Performance

To enhance TorchInductor performance on Windows, you can use the Intel compiler or the LLVM (Clang) compiler. However, both rely on the Microsoft Visual C++ (MSVC) runtime libraries, so your first step should still be to install MSVC.

Intel Compiler

  1. Download and install the Windows version of the Intel Compiler.

  2. Set the Windows Inductor compiler by defining the CXX environment variable: set CXX=icx-cl.

Intel also provides a comprehensive step-by-step guide, complete with performance data. Please check Intel® oneAPI DPC++/C++ Compiler Boosts PyTorch* Inductor Performance on Windows* for CPU Devices.
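Putting the pieces together, a complete session for the Intel compiler might look like the sketch below. It reuses the MSVC and conda activation commands from the setup steps above, assumes icx-cl is on your PATH after installing the Intel compiler, and uses my_model.py as a placeholder for your own script:

    REM Activate MSVC and the conda environment as in the setup steps above
    "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
    "C:/ProgramData/miniforge3/Scripts/activate.bat"
    conda activate inductor_cpu_windows

    REM Select the Intel compiler for TorchInductor and run your script
    set CXX=icx-cl
    python my_model.py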

LLVM Compiler

  1. Download and install the LLVM Compiler, choosing the win64 version.

  2. Set the Windows Inductor compiler by defining the CXX environment variable: set CXX=clang-cl.
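Alternatively, the CXX variable can be set from within Python rather than in the shell. The sketch below assumes that TorchInductor reads CXX when compilation is first triggered, so the variable is set before any torch.compile call; setting it in the shell as shown above remains the documented approach:

import os

# Assumption: CXX must be set before the first compilation is triggered.
# Use "clang-cl" for LLVM or "icx-cl" for the Intel compiler.
os.environ["CXX"] = "clang-cl"

import torch

def foo(x, y):
    return torch.sin(x) + torch.cos(y)

opt_foo = torch.compile(foo)
print(opt_foo(torch.randn(10, 10), torch.randn(10, 10)))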

Conclusion

In this tutorial, we learned how to use TorchInductor on a Windows CPU with PyTorch. In addition, we discussed how to further improve performance with the Intel and LLVM compilers.
