torchao.dtypes

Layouts and Tensor Subclasses

NF4Tensor

NF4Tensor class for converting a weight to the QLoRA NF4 format

AffineQuantizedTensor

Affine quantized tensor subclass.

Layout

The Layout class serves as a base class for defining different data layouts for tensors.

PlainLayout

PlainLayout is the most basic layout class, inheriting from the Layout base class.

SemiSparseLayout

SemiSparseLayout is a layout class for handling semi-structured sparse matrices in affine quantized tensors.
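To make the idea of semi-structured sparsity concrete, the sketch below prunes a row to a 2:4 pattern, keeping the two largest-magnitude values in every group of four. It assumes that "semi-structured" refers to the 2:4 pattern. The helper `prune_2_of_4` is hypothetical, is written in plain Python, and does not call torchao or reflect how SemiSparseLayout stores its data.

```python
def prune_2_of_4(row):
    """Zero out the two smallest-magnitude values in each group of four.

    Hypothetical helper for illustration only; not a torchao API.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


row = [0.1, -0.9, 0.4, 0.05, 2.0, 0.0, -3.0, 1.0]
sparse_row = prune_2_of_4(row)
```

Each group of four ends up with exactly two zeros, which is the kind of regularity that lets sparse kernels skip half of the multiplications.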

TensorCoreTiledLayout

TensorCoreTiledLayout is a layout class for handling tensor core tiled layouts in affine quantized tensors.

Float8Layout

Represents the layout configuration for Float8 affine quantized tensors.

MarlinSparseLayout

MarlinSparseLayout is a layout class for handling sparse tensor formats specifically designed for the Marlin sparse kernel.

BlockSparseLayout

BlockSparseLayout is a data class that represents the layout of a block sparse matrix.

UintxLayout

A layout class for Uintx tensors, which are tensors with elements packed into smaller bit-widths than the standard 8-bit byte.
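As a rough illustration of sub-byte packing, the sketch below stores two 4-bit values in each byte. The function names and the high-nibble-first ordering are assumptions made for this example; the actual bit layout used by UintxLayout is not described here.

```python
def pack_uint4(values):
    """Pack pairs of 4-bit integers into bytes, first value in the high nibble.

    Hypothetical helper; the nibble order is an assumption, not torchao's.
    """
    if len(values) % 2 != 0:
        raise ValueError("need an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits (0..15)")
    packed = bytearray()
    for i in range(0, len(values), 2):
        packed.append((values[i] << 4) | values[i + 1])
    return bytes(packed)


def unpack_uint4(packed):
    """Inverse of pack_uint4: split each byte back into two 4-bit values."""
    out = []
    for byte in packed:
        out.append(byte >> 4)    # high nibble
        out.append(byte & 0x0F)  # low nibble
    return out


values = [1, 15, 0, 7]
packed = pack_uint4(values)
restored = unpack_uint4(packed)
```

Four 4-bit values occupy two bytes instead of four, and unpacking recovers the original values exactly.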

MarlinQQQTensor

MarlinQQQ quantized tensor subclass that inherits from the AffineQuantizedTensor class.

MarlinQQQLayout

MarlinQQQLayout is a layout class for Marlin QQQ quantization.

Int4CPULayout

Layout class for the int4 CPU layout of affine quantized tensors, used by the tinygemm kernel _weight_int4pack_mm_for_cpu.

CutlassInt4PackedLayout

Layout class for the int4 packed layout of affine quantized tensors, used with the CUTLASS kernel.

Quantization techniques

to_affine_quantized_intx

Convert a high precision tensor to an integer affine quantized tensor.
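The arithmetic behind integer affine quantization can be sketched in plain Python. The example assumes an asymmetric int8 range and min/max-based parameter selection; the helper names are hypothetical, and the code does not call to_affine_quantized_intx or mirror its arguments.

```python
def choose_qparams(values, quant_min=-128, quant_max=127):
    """Derive a scale and zero point from the data range (hypothetical helper)."""
    lo = min(min(values), 0.0)  # include 0.0 so it maps exactly to zero_point
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (quant_max - quant_min)
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = quant_min - round(lo / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, quant_min=-128, quant_max=127):
    """q = clamp(round(x / scale) + zero_point, quant_min, quant_max)"""
    return [max(quant_min, min(quant_max, round(v / scale) + zero_point))
            for v in values]


def dequantize(q_values, scale, zero_point):
    """x_hat = (q - zero_point) * scale"""
    return [(q - zero_point) * scale for q in q_values]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = choose_qparams(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
```

Dequantizing recovers each weight to within about one quantization step, and the value 0.0 round-trips exactly because it maps to the zero point.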

to_affine_quantized_intx_static

Create an integer AffineQuantizedTensor from a high precision tensor using static parameters.
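With static parameters, the scale and zero point are fixed ahead of time (for example from calibration) rather than derived from each input. The plain-Python sketch below shows the consequence: the same parameters are reused across inputs, and values outside the representable range saturate. `quantize_static` and the constants are hypothetical and are not the torchao function or its signature.

```python
def quantize_static(values, scale, zero_point, quant_min=-128, quant_max=127):
    """Quantize with caller-supplied parameters (hypothetical helper)."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        out.append(max(quant_min, min(quant_max, q)))  # saturate to int8 range
    return out


# Parameters chosen once, e.g. from a calibration pass (values are made up).
SCALE = 0.5
ZERO_POINT = 10

batch_a = quantize_static([0.0, 1.0, -2.0], SCALE, ZERO_POINT)
batch_b = quantize_static([100.0, 0.5], SCALE, ZERO_POINT)  # 100.0 saturates
```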

to_affine_quantized_fpx

Create a floatx AffineQuantizedTensor from a high precision tensor.

to_affine_quantized_floatx

Convert a high precision tensor to a float8 quantized tensor.

to_affine_quantized_floatx_static

Create a float8 AffineQuantizedTensor from a high precision tensor using static parameters.

to_marlinqqq_quantized_intx

Converts a floating point tensor to a Marlin QQQ quantized tensor.

to_nf4

Convert a given tensor to a 4-bit NormalFloat (NF4) tensor.
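The general shape of a 4-bit codebook format can be sketched as follows: each block is scaled by its absolute maximum, and every value is replaced by the index of the nearest entry in a 16-value table. The codebook below is an evenly spaced placeholder, not the actual NF4 table, and the helper names are hypothetical; the sketch does not call to_nf4.

```python
# Placeholder 16-entry codebook on [-1, 1]. This is NOT the real NF4 table;
# it only stands in for one so the lookup logic can be shown.
CODEBOOK = [-1.0 + 2.0 * i / 15 for i in range(16)]


def quantize_block(block):
    """Return 4-bit codes plus the per-block absmax scale (hypothetical helper)."""
    absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
    codes = []
    for v in block:
        normalized = v / absmax  # now in [-1, 1]
        nearest = min(range(16), key=lambda i: abs(CODEBOOK[i] - normalized))
        codes.append(nearest)
    return codes, absmax


def dequantize_block(codes, absmax):
    """Look each code up in the codebook and undo the absmax scaling."""
    return [CODEBOOK[c] * absmax for c in codes]


block = [0.02, -0.5, 0.25, 0.5]
codes, absmax = quantize_block(block)
approx = dequantize_block(codes, absmax)
```

Only the 4-bit codes and one scale per block need to be stored; the quality of the approximation depends on how well the codebook matches the distribution of the weights.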
