OpenAIToMessages
- class torchtune.data.OpenAIToMessages(train_on_input: Optional[bool] = None, column_map: Optional[Dict[str, str]] = None, new_system_prompt: Optional[str] = None, masking_strategy: Optional[str] = 'train_on_assistant')[source]
Convert a single chat sample adhering to the OpenAI chat completion JSON structure to torchtune’s
Message
structure. This supports both text and image messages.A single sample typically consists of a single optional system prompt and one or multiple turns of user and assistant messages.
For example:
{ "messages": [ { "role": <system|user|assistant>, "content": [ { "type": "text", "text": "What'''s in this image?", }, { "type": "image_url", "image_url": { "url": <url>, }, }, }, ... ] }
Message
follows:[ { "role": <system|user|assistant>, "content": [ { "type": "text", "content": "What'''s in this image?", }, { "type": "image", "content": <PIL.Image.Image>, }, ], }, ... ]
- Parameters:
train_on_input (Optional[bool]) – whether the model is trained on the user prompt or not. Deprecated parameter and will be removed in a future release. Default is None.
column_map (Optional[Dict[str, str]]) – a mapping from the expected columns (“messages”) to the new column names in the dataset. Key should be “messages” and value should be the new column name. If None, keep the default “messages”. Default is None.
new_system_prompt (Optional[str]) – if specified, prepend a system message. This can serve as instructions to guide the model response. Setting this will OVERRIDE any system messages already present in the dataset. Default is None.
masking_strategy (Optional[str]) –
masking strategy to use for model training. Must be one of: train_on_all, train_on_assistant, train_on_last. Default is “train_on_assistant”.
train_on_all
: both user and assistant messages are unmaskedtrain_on_assistant
: user messages are masked, only assistant messages are unmaskedtrain_on_last
: only the last assistant message is unmasked
Note: Multimodal user messages are always masked.
- Raises:
ValueError – If
column_map
is provided andmessages
not incolumn_map
.