PerToken

class torch.ao.quantization.observer.PerToken

Represents per-token granularity in quantization.

This granularity type calculates a separate set of quantization parameters for each token, where a token is a slice along the last dimension of the tensor.

For example, if the input tensor has shape [2, 3, 4], then there are 6 tokens with 4 elements each, and we will calculate 6 sets of quantization parameters, one for each token.
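The token count in this example is the product of all dimensions except the last. The sketch below reproduces that arithmetic in plain Python. The helper name `num_param_sets` is hypothetical and is not part of the `torch.ao.quantization` API.

```python
import math


def num_param_sets(shape):
    """Count the per-token quantization parameter sets for a tensor shape.

    Hypothetical helper for illustration only. Every dimension except the
    last indexes tokens, so the count is the product of the leading dims.
    """
    return math.prod(shape[:-1])


print(num_param_sets([2, 3, 4]))  # 6 tokens, each with 4 elements
```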

If the input tensor has only two dimensions, e.g., [8, 16], then per-token granularity is equivalent to PerAxis(axis=0) and yields 8 sets of quantization parameters.
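For the two-dimensional case, each row is one token, so reducing over the last dimension gives one parameter per index along axis 0. The following sketch shows this with nested lists rather than tensors. It assumes a symmetric absmax scale with `qmax=127`, which this page does not specify, and the function name `absmax_scale_per_token` is hypothetical.

```python
def absmax_scale_per_token(matrix, qmax=127):
    """Return one scale per row (token) of a 2-D nested list.

    Assumption for illustration: symmetric quantization, where
    scale = max(|x|) / qmax over the elements of each token.
    """
    return [max(abs(v) for v in row) / qmax for row in matrix]


# Shape [2, 4]: two tokens with four elements each.
x = [[0.5, -1.0, 0.25, 2.0],
     [3.0, 0.0, -6.0, 1.5]]

scales = absmax_scale_per_token(x)
print(len(scales))  # 2, one scale per index along axis 0
```

A tensor of shape [8, 16] would give 8 scales in the same way, matching the per-axis count for axis 0.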