stack_exchanged_paired_dataset

torchtune.datasets.stack_exchanged_paired_dataset(tokenizer: ModelTokenizer, *, source: str = 'lvwerra/stack-exchange-paired', max_seq_len: int = 1024) → PreferenceDataset

Family of preference datasets similar to StackExchangePaired data.

Parameters:
  • tokenizer (ModelTokenizer) – Tokenizer used by the model that implements the tokenize_messages method.

  • source (str) – Path string of the dataset; anything supported by Hugging Face’s load_dataset. Default is 'lvwerra/stack-exchange-paired'.

  • max_seq_len (int) – Maximum number of tokens in the returned input and label token id lists. Default is 1024.

Returns:

The preference dataset built from source paired data.

Return type:

PreferenceDataset
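
The following is a minimal sketch of what a preference sample built from paired data might look like, and of how max_seq_len caps the returned input and label lists. It does not call torchtune. ToyTokenizer, build_preference_sample, and IGNORE_INDEX are hypothetical names introduced for illustration, and the column names (question, response_j, response_k), the output keys, and the prompt masking are assumptions about the data layout rather than details confirmed by this page.

```python
from typing import Dict, List

# Assumption: label value used to mask prompt positions out of the loss.
IGNORE_INDEX = -100


class ToyTokenizer:
    """Hypothetical stand-in for a ModelTokenizer: one integer id per word."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def encode(self, text: str) -> List[int]:
        # Assign the next free id to each previously unseen word.
        return [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]


def build_preference_sample(
    row: Dict[str, str], tokenizer: ToyTokenizer, max_seq_len: int = 1024
) -> Dict[str, List[int]]:
    """Turn one paired row into chosen/rejected token ids and labels."""
    prompt_ids = tokenizer.encode(row["question"])
    sample: Dict[str, List[int]] = {}
    # Assumption: response_j holds the preferred answer, response_k the other.
    for name, column in (("chosen", "response_j"), ("rejected", "response_k")):
        # Truncate prompt + response to at most max_seq_len tokens.
        ids = (prompt_ids + tokenizer.encode(row[column]))[:max_seq_len]
        # Mask the prompt so only response tokens carry a label.
        n_prompt = min(len(prompt_ids), len(ids))
        sample[f"{name}_input_ids"] = ids
        sample[f"{name}_labels"] = [IGNORE_INDEX] * n_prompt + ids[n_prompt:]
    return sample


row = {
    "question": "how do I sort a list",
    "response_j": "use sorted on it",
    "response_k": "no idea",
}
sample = build_preference_sample(row, ToyTokenizer(), max_seq_len=8)
print(sample)
```

With torchtune installed, the documented entry point accepts the same limit as a keyword argument, for example stack_exchanged_paired_dataset(tokenizer, max_seq_len=512), where tokenizer is any ModelTokenizer.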
