torchao.sparsity
| API | Description |
| --- | --- |
| `sparsify_` | Convert the weight of linear modules in the model with the given `apply_tensor_subclass` function. |
| `semi_sparse_weight` | Convert the weight of linear modules to semi-structured (2:4) sparsity. |
| `int8_dynamic_activation_int8_semi_sparse_weight` | Applies int8 dynamic symmetric per-token activation quantization and int8 per-channel weight quantization, plus 2:4 sparsity, to linear layers. |
| `apply_fake_sparsity` | Simulates 2:4 sparsity on all linear layers in a model. |
| `WandaSparsifier` | Wanda sparsifier: prunes weights based on the product of the input activation norm and the weight magnitude. |
| `PerChannelNormObserver` | A custom observer that computes the L2 norm of each channel and stores it in a buffer. |
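The typical 2:4 flow combines several of the entries above. The sketch below is a minimal example, assuming a CUDA device with semi-structured sparse kernel support and that all names are importable from `torchao.sparsity` as listed in this reference: `apply_fake_sparsity` first zeroes weights into a valid 2:4 pattern, then `sparsify_` swaps each weight for the accelerated sparse representation.

```python
import torch
from torchao.sparsity import apply_fake_sparsity, semi_sparse_weight, sparsify_

# Semi-structured (2:4) kernels target fp16/bf16 CUDA tensors; dimensions
# that are multiples of 64 keep the kernels' shape constraints satisfied.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).half().cuda()

# Simulate 2:4 sparsity: zero out 2 of every 4 weight elements so each
# weight has a pattern the sparse kernels can accept.
apply_fake_sparsity(model)

# Replace each linear weight with a semi-structured sparse tensor subclass.
sparsify_(model, semi_sparse_weight())

x = torch.randn(128, 1024, dtype=torch.float16, device="cuda")
out = model(x)  # now dispatches to 2:4 sparse matmul kernels
```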
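The quantization-plus-sparsity entry composes the same way. A sketch under the same assumptions, applied to a freshly pruned model rather than one already converted above:

```python
import torch
from torchao.sparsity import (
    apply_fake_sparsity,
    int8_dynamic_activation_int8_semi_sparse_weight,
    sparsify_,
)

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).half().cuda()
apply_fake_sparsity(model)

# int8 dynamic symmetric per-token activation quantization and int8
# per-channel weight quantization, with the weight stored 2:4 sparse.
sparsify_(model, int8_dynamic_activation_int8_semi_sparse_weight())
```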
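`WandaSparsifier` follows the `torch.ao.pruning.BaseSparsifier` prepare/step/squash_mask flow, and `PerChannelNormObserver` is what `prepare` attaches in order to record the per-channel activation L2 norms that scale the weight magnitudes in the Wanda pruning score. A sketch; the `sparsity_level` argument and `config=None` (prune all supported layers) are assumptions about the constructor and config format:

```python
import torch
from torchao.sparsity import WandaSparsifier

model = torch.nn.Sequential(torch.nn.Linear(128, 128))

# Assumed constructor argument: target 50% of weights pruned per layer.
sparsifier = WandaSparsifier(sparsity_level=0.5)

# prepare() attaches PerChannelNormObserver instances to the linear
# layers; config=None is assumed to target every supported layer.
sparsifier.prepare(model, config=None)

# Calibration passes: the observers accumulate activation L2 norms.
for _ in range(8):
    model(torch.randn(16, 128))

sparsifier.step()         # build masks from |weight| * ||activation|| scores
sparsifier.squash_mask()  # fold the masks into the weights permanently
```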