stack_exchanged_paired_dataset
- torchtune.datasets.stack_exchanged_paired_dataset(tokenizer: ModelTokenizer, *, source: str = 'lvwerra/stack-exchange-paired', max_seq_len: int = 1024) → PreferenceDataset
Family of preference datasets similar to StackExchangePaired data.
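To make the paired format concrete, here is a minimal sketch of how one row with a question, a preferred answer, and a rejected answer could be turned into a preference sample capped at `max_seq_len` tokens. It is not torchtune's implementation. The `toy_tokenize` and `build_pair` helpers are hypothetical, and the column names (`question`, `response_j`, `response_k`), the output keys, and the `-100` prompt mask are assumptions made for illustration.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # assumed label value for positions excluded from the loss


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in for a real tokenizer: one id per whitespace-separated word."""
    return [len(word) for word in text.split()]


def build_pair(row: Dict[str, str], max_seq_len: int = 1024) -> Dict[str, List[int]]:
    """Build chosen/rejected token id and label lists, each truncated to max_seq_len."""
    prompt = toy_tokenize(row["question"])
    sample: Dict[str, List[int]] = {}
    # Assumed column layout: response_j is the preferred answer, response_k the rejected one.
    for name, column in (("chosen", "response_j"), ("rejected", "response_k")):
        answer = toy_tokenize(row[column])
        # Inputs are the prompt followed by the answer, cut off at max_seq_len.
        sample[f"{name}_input_ids"] = (prompt + answer)[:max_seq_len]
        # Labels mask the prompt so only answer tokens would count toward a loss.
        sample[f"{name}_labels"] = ([IGNORE_INDEX] * len(prompt) + answer)[:max_seq_len]
    return sample


row = {
    "question": "How do I reverse a list?",
    "response_j": "Use the reversed builtin or slicing.",
    "response_k": "You cannot.",
}
sample = build_pair(row, max_seq_len=8)
```

With `max_seq_len=8`, the chosen sequence (six prompt tokens plus six answer tokens) is cut to eight tokens, while the rejected sequence fits exactly. In both cases the input and label lists stay the same length.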
- Parameters:
tokenizer (ModelTokenizer) – Tokenizer used by the model that implements the tokenize_messages method.
source (str) – Path string of the dataset; anything supported by Hugging Face’s load_dataset.
max_seq_len (int) – Maximum number of tokens in the returned input and label token id lists. Default is 1024.
- Returns:
The preference dataset built from source paired data.
- Return type: