AlpacaToMessages
- class torchtune.data.AlpacaToMessages(train_on_input: Optional[bool] = None, column_map: Optional[Dict[str, str]] = None, masking_strategy: Optional[str] = 'train_on_all')[source]
Message transform class for Alpaca-style datasets with “instruction”, “input”, and “output” (or equivalent fields specified in column_map) columns. User messages are formed from the instruction + input columns and assistant messages are formed from the output column. Prompt templating is conditional on the presence of the “input” column, and thus is handled directly in this transform class instead of a dedicated
PromptTemplate
class due to this custom logic.- Parameters:
train_on_input (Optional[bool]) – whether the model is trained on the user prompt or not. Deprecated parameter and will be removed in a future release. Default is None.
column_map (Optional[Dict[str, str]]) – a mapping to change the expected “instruction”, “input”, and “output” column names to the actual column names in the dataset. Default is None, keeping the default column names.
masking_strategy (Optional[str]) –
masking strategy to use for model training. Must be one of: train_on_all, train_on_assistant, train_on_last. Default is “train_on_all”.
train_on_all
: both user and assistant messages are unmaskedtrain_on_assistant
: user messages are masked, only assistant messages are unmaskedtrain_on_last
: only the last assistant message is unmasked
- Raises:
ValueError – If
column_map
is provided andinstruction
not incolumn_map
, oroutput
not incolumn_map