get_unmasked_sequence_lengths

torchtune.training.get_unmasked_sequence_lengths(mask: Tensor) → Tensor

Returns, for each batch element, the position of the last unmasked token. As the example below shows, this is the unmasked sequence length minus one.

Parameters:

mask (torch.Tensor) – Boolean mask with shape [b x s], where True indicates a value to be masked out. This is usually a padding mask, where True marks a padding token.

Returns:

Index of the last unmasked token in each sequence, with shape [b]

Return type:

Tensor

Shape notation:
  • b = batch size

  • s = sequence length
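Judging from the output in the Example section, the returned values behave like zero-based indices of the last unmasked token. The following is a minimal sketch of that behavior in plain PyTorch, assuming right-aligned padding. The helper name `last_unmasked_index` is hypothetical and this is not the torchtune implementation.

```python
import torch


def last_unmasked_index(mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: index of the last False entry in each row of a [b x s] mask."""
    # Count the unmasked (False) positions in every row.
    # Assumes padding is right-aligned, so all True values follow the False values.
    lengths = (~mask).sum(dim=-1)
    # Step back by one for a zero-based index; clamp so a fully masked row gives 0, not -1.
    return (lengths - 1).clamp(min=0)


mask = torch.tensor([
    [False, False, True, True],
    [False, False, False, True],
    [False, False, False, False],
])
print(last_unmasked_index(mask))  # tensor([1, 2, 3])
```

The clamp for fully masked rows is a choice made for this sketch only.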

Example

>>> import torch
>>> from torchtune.training import get_unmasked_sequence_lengths
>>> input_ids = torch.tensor([
...        [2, 4, 0, 0],
...        [2, 4, 6, 0],
...        [2, 4, 6, 9]
...    ])
>>> mask = input_ids == 0
>>> mask
tensor([[False, False,  True,  True],
        [False, False, False,  True],
        [False, False, False, False]])
>>> get_unmasked_sequence_lengths(mask)
tensor([1, 2, 3])
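Indices of this kind are often used to pick out the value at the last non-padding position of each sequence. The sketch below illustrates that pattern under stated assumptions: it hard-codes the indices from the example above instead of calling torchtune, and the `scores` tensor is made-up illustrative data.

```python
import torch

# Indices as returned in the example above (last unmasked position per row).
last_idx = torch.tensor([1, 2, 3])

# Hypothetical per-token scores with shape [b x s] = [3 x 4].
scores = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# Advanced indexing selects scores[i, last_idx[i]] for every batch element i.
batch_idx = torch.arange(scores.shape[0])
last_scores = scores[batch_idx, last_idx]
print(last_scores)  # values 1., 6., 11.
```

The same indexing pattern extends to a [b x s x d] tensor, where it returns one d-dimensional vector per batch element.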
