Shortcuts

padded_collate

torchtune.utils.padded_collate(batch: List[Tuple[List[int], List[int]]], padding_idx: int = 0, ignore_idx: int = - 100) Tuple[Tensor, Tensor][source]

Pad a batch of sequences to the longest sequence length in the batch, and convert integer lists to tensors.

Parameters:
  • batch (List[TokenPair]) – A list of tuples containing input, label pairs.

  • padding_idx (int) – Padding index for input ids. Defaults to 0.

  • ignore_idx (int) – Padding index for labels. Defaults to -100.

Returns:

Collated input and label tensors.

Example

>>> token_pairs = [
>>>    ([1, 2, 3], [4, 5, 6]),
>>>    ([7,], [10,],),
>>> ]
>>> inputs, labels = padded_collate(
>>>    batch=token_pairs,
>>>    padding_idx=padding_idx,
>>>    ignore_idx=ignore_idx,
>>> )
>>> inputs
>>> tensor([[1, 2, 3], [7, 0, 0]])
>>> labels
>>> tensor([[4,5,6], [10,-100,-100]])

Docs

Access comprehensive developer documentation for PyTorch

View Docs

Tutorials

Get in-depth tutorials for beginners and advanced developers

View Tutorials

Resources

Find development resources and get your questions answered

View Resources