AlpacaToMessages

class torchtune.data.AlpacaToMessages(train_on_input: Optional[bool] = None, column_map: Optional[Dict[str, str]] = None, masking_strategy: Optional[str] = 'train_on_all')

Message transform class for Alpaca-style datasets with “instruction”, “input”, and “output” columns (or equivalent fields specified in column_map). User messages are formed from the instruction and input columns, and assistant messages are formed from the output column. Because prompt templating depends on whether the “input” column is present, it is handled directly in this transform class rather than in a dedicated PromptTemplate class.
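The conditional templating described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name `build_user_prompt` is hypothetical, and the template strings are assumed to follow the widely circulated Alpaca wording rather than being taken from this page.

```python
from typing import Mapping

# Assumption: the commonly used Alpaca prompt wording, not copied from this page.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_WITHOUT_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_user_prompt(sample: Mapping[str, str]) -> str:
    """Hypothetical helper: pick a template depending on whether "input" is set."""
    if sample.get("input"):  # present and non-empty
        return PROMPT_WITH_INPUT.format(
            instruction=sample["instruction"], input=sample["input"]
        )
    return PROMPT_WITHOUT_INPUT.format(instruction=sample["instruction"])


with_input = build_user_prompt(
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
)
without_input = build_user_prompt(
    {"instruction": "Name a primary color.", "output": "Red"}
)
```

In this sketch the "### Input:" section appears only when the sample carries a non-empty input; the output column is not part of the user prompt and would form the assistant message.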

Parameters:

train_on_input (Optional[bool]) – whether the model is trained on the user prompt. Deprecated; will be removed in a future release. Default is None.
column_map (Optional[Dict[str, str]]) – a mapping to change the expected “instruction”, “input”, and “output” column names to the actual column names in the dataset. Default is None, keeping the default column names.
masking_strategy (Optional[str]) – masking strategy to use for model training. Must be one of: train_on_all, train_on_assistant, train_on_last. Default is “train_on_all”.
- train_on_all: both user and assistant messages are unmasked
- train_on_assistant: user messages are masked, only assistant messages are unmasked
- train_on_last: only the last assistant message is unmasked
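To make the three masking strategies concrete, here is a minimal sketch that maps a list of message roles to per-message mask flags (True means the message is masked out of the loss). The function `mask_flags` is a hypothetical helper written for illustration; it is not part of torchtune, and the real class may apply masking differently.

```python
from typing import List

VALID_STRATEGIES = ("train_on_all", "train_on_assistant", "train_on_last")


def mask_flags(roles: List[str], masking_strategy: str = "train_on_all") -> List[bool]:
    """Hypothetical helper: return one flag per message, True = masked."""
    if masking_strategy not in VALID_STRATEGIES:
        raise ValueError(f"masking_strategy must be one of {VALID_STRATEGIES}")
    if masking_strategy == "train_on_all":
        # Nothing is masked: user and assistant messages both contribute.
        return [False] * len(roles)
    if masking_strategy == "train_on_assistant":
        # Every non-assistant message is masked.
        return [role != "assistant" for role in roles]
    # train_on_last: only the final assistant message stays unmasked.
    last = max((i for i, role in enumerate(roles) if role == "assistant"), default=-1)
    return [i != last for i in range(len(roles))]


alpaca_roles = ["user", "assistant"]  # one user prompt, one assistant reply
multi_turn = ["user", "assistant", "user", "assistant"]

all_flags = mask_flags(alpaca_roles, "train_on_all")
assistant_flags = mask_flags(multi_turn, "train_on_assistant")
last_flags = mask_flags(multi_turn, "train_on_last")
```

For a sample with a single user message followed by a single assistant message, train_on_assistant and train_on_last produce the same flags in this sketch; they differ only on multi-turn conversations.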

Raises:

ValueError – If column_map is provided and does not contain the “instruction” key or the “output” key.
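The column_map check can be illustrated with a small stand-alone sketch. `resolve_column_map` is a hypothetical helper, and the dataset column names "prompt" and "completion" are made up for the example; the sketch only mirrors the documented rule that “instruction” and “output” must be present when a mapping is supplied, while “input” is treated as optional.

```python
from typing import Dict, Optional

DEFAULT_COLUMNS = {"instruction": "instruction", "input": "input", "output": "output"}


def resolve_column_map(column_map: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: validate a column_map or fall back to the defaults."""
    if column_map is None:
        return dict(DEFAULT_COLUMNS)
    for required in ("instruction", "output"):
        if required not in column_map:
            raise ValueError(
                f"Expected a key of '{required}' in column_map but found "
                f"{list(column_map.keys())}."
            )
    return dict(column_map)


# A dataset whose columns are named "prompt" and "completion" (assumed names).
columns = resolve_column_map({"instruction": "prompt", "output": "completion"})
row = {"prompt": "Name a primary color.", "completion": "Red"}

instruction_text = row[columns["instruction"]]
output_text = row[columns["output"]]
```

Passing a mapping such as `{"instruction": "prompt"}` without an “output” entry raises ValueError in this sketch, matching the condition listed above.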
