get_unmasked_sequence_lengths
- torchtune.training.get_unmasked_sequence_lengths(mask: Tensor) → Tensor
Returns the sequence lengths for each batch element, excluding masked tokens.
- Parameters:
mask (torch.Tensor) – Boolean mask with shape [b x s], where True indicates a value to be masked out. This is usually a padding mask, where True marks a padding token.
- Returns:
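Tensor of shape [b] with one value per batch element. In the example below, each value is the 0-based index of the last unmasked token in its row.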
- Return type:
Tensor
- Shape notation:
b = batch size
s = sequence length
Example
>>> input_ids = torch.tensor([
...     [2, 4, 0, 0],
...     [2, 4, 6, 0],
...     [2, 4, 6, 9]
... ])
>>> mask = input_ids == 0
>>> mask
tensor([[False, False,  True,  True],
        [False, False, False,  True],
        [False, False, False, False]])
>>> get_unmasked_sequence_lengths(mask)
tensor([1, 2, 3])
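The example output is one less than the number of unmasked tokens in each row (2, 3, and 4), so it reads as the index of the last unmasked token. The sketch below reproduces those values in plain PyTorch. It is not torchtune's implementation: `last_unmasked_index` is a hypothetical helper, and it assumes right-padded inputs, where masked tokens appear only at the end of each row.

```python
import torch


def last_unmasked_index(mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper, not part of torchtune.

    Assumes `mask` is a boolean tensor of shape [b, s] where True marks a
    masked (padding) token, and that padding sits at the end of each row.
    """
    # Count the unmasked tokens in each row.
    lengths = (~mask).sum(dim=-1)
    # Convert the count to a 0-based index; clamp so a fully masked row maps to 0.
    return (lengths - 1).clamp(min=0)


input_ids = torch.tensor([
    [2, 4, 0, 0],
    [2, 4, 6, 0],
    [2, 4, 6, 9],
])
mask = input_ids == 0

lengths = (~mask).sum(dim=-1)         # unmasked token counts: 2, 3, 4
last_idx = last_unmasked_index(mask)  # last unmasked positions: 1, 2, 3
print(lengths, last_idx)
```

An index in this form can be used to gather the final non-padding position of each row, for example `hidden[torch.arange(hidden.shape[0]), last_idx]` for a hypothetical `hidden` tensor of shape [b, s, d].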