Tokenizer
- class torchrl.envs.transforms.Tokenizer(in_keys: Sequence[NestedKey], out_keys: Sequence[NestedKey], in_keys_inv: Sequence[NestedKey] | None = None, out_keys_inv: Sequence[NestedKey] | None = None, *, tokenizer: transformers.PretrainedTokenizerBase = None, use_raw_nontensor: bool = False, additional_tokens: List[str] | None = None, skip_special_tokens: bool = True, add_special_tokens: bool = False, padding: bool = True, max_length: int | None = None)
Applies a tokenization operation on the specified inputs.
- Parameters:
in_keys (sequence of NestedKey) – the keys of inputs to the tokenization operation.
out_keys (sequence of NestedKey) – the keys of the outputs of the tokenization operation.
in_keys_inv (sequence of NestedKey, optional) – the keys of the inputs to the tokenization operation during the inverse call.
out_keys_inv (sequence of NestedKey, optional) – the keys of the outputs of the tokenization operation during the inverse call.
- Keyword Arguments:
tokenizer (transformers.PretrainedTokenizerBase or str, optional) – the tokenizer to use. If None, “bert-base-uncased” will be used by default. If a string is provided, it should be the name of a pre-trained tokenizer.
use_raw_nontensor (bool, optional) – if False, data is extracted from NonTensorData/NonTensorStack inputs before the tokenization function is called on them. If True, the raw NonTensorData/NonTensorStack inputs are given directly to the tokenization function, which must support those inputs. Default is False.
additional_tokens (List[str], optional) – list of additional tokens to add to the tokenizer’s vocabulary.
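To make the use_raw_nontensor flag concrete, here is a small plain-Python sketch of the two behaviours described above. FakeNonTensorData is a hypothetical stand-in for tensordict's NonTensorData (NonTensorStack is not modelled), and char_codes and wrapper_aware are toy tokenizers invented for illustration; this is not TorchRL's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class FakeNonTensorData:
    """Hypothetical stand-in for tensordict's NonTensorData: wraps one raw Python value."""
    data: Any


def call_tokenizer(
    value: Any,
    tokenize: Callable[[Any], List[int]],
    use_raw_nontensor: bool = False,
) -> List[int]:
    # use_raw_nontensor=False: unwrap the payload before tokenizing.
    # use_raw_nontensor=True: pass the wrapper through untouched, so the
    # tokenization function itself must know how to handle it.
    if not use_raw_nontensor and isinstance(value, FakeNonTensorData):
        value = value.data
    return tokenize(value)


def char_codes(text: str) -> List[int]:
    """Toy tokenizer: one token id per character (its Unicode code point)."""
    return [ord(c) for c in text]


def wrapper_aware(value: Any) -> List[int]:
    """Toy tokenizer that accepts the wrapper itself, as use_raw_nontensor=True requires."""
    if isinstance(value, FakeNonTensorData):
        value = value.data
    return char_codes(value)


wrapped = FakeNonTensorData("hi")
print(call_tokenizer(wrapped, char_codes))  # [104, 105]
print(call_tokenizer(wrapped, wrapper_aware, use_raw_nontensor=True))  # [104, 105]
```

With use_raw_nontensor=True, passing the wrapper to char_codes instead fails with a TypeError, because that toy tokenizer only understands plain strings.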
Note
This transform can be used both to turn output strings into tokens and to turn tokenized actions or states back into strings. If the environment has a string state spec, the transformed version will have a tokenized state spec. If the environment has a string action spec, the transformed version will have a tokenized action spec.
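The two directions in the note can be sketched without TorchRL or transformers: strings read from one key are encoded into padded token ids under another key, and token ids are decoded back into strings with special tokens dropped. The whitespace tokenizer, the toy vocabulary, the plain dict standing in for a TensorDict, and the helper names (encode_batch, decode_batch, forward, inverse) are all hypothetical. Only the defaults padding=True, add_special_tokens=False and skip_special_tokens=True come from the signature, and their behaviour here is assumed to follow the usual Hugging Face tokenizer conventions.

```python
from typing import Dict, List

# Hypothetical toy vocabulary; ids 0-2 are special tokens.
SPECIAL = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2}
VOCAB = {**SPECIAL, "go": 3, "left": 4, "right": 5, "stop": 6}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}


def encode_batch(
    texts: List[str], add_special_tokens: bool = False, padding: bool = True
) -> List[List[int]]:
    """Whitespace-split each string and map words to ids; optionally right-pad to the longest row."""
    rows = []
    for text in texts:
        ids = [VOCAB[word] for word in text.split()]
        if add_special_tokens:
            ids = [SPECIAL["[CLS]"], *ids, SPECIAL["[SEP]"]]
        rows.append(ids)
    if padding:
        width = max(len(row) for row in rows)
        rows = [row + [SPECIAL["[PAD]"]] * (width - len(row)) for row in rows]
    return rows


def decode_batch(rows: List[List[int]], skip_special_tokens: bool = True) -> List[str]:
    """Map ids back to words, optionally dropping [PAD]/[CLS]/[SEP]."""
    special_ids = set(SPECIAL.values())
    return [
        " ".join(ID_TO_TOKEN[i] for i in row if not (skip_special_tokens and i in special_ids))
        for row in rows
    ]


def forward(td: Dict[str, list], in_key: str, out_key: str) -> Dict[str, list]:
    # strings -> tokens (e.g. a string observation becomes a token observation)
    td[out_key] = encode_batch(td[in_key])
    return td


def inverse(td: Dict[str, list], in_key: str, out_key: str) -> Dict[str, list]:
    # tokens -> strings (e.g. a tokenized action is turned back into text)
    td[out_key] = decode_batch(td[in_key])
    return td


data = {"observation": ["go left", "stop"]}
forward(data, "observation", "tokens")
print(data["tokens"])  # [[3, 4], [6, 0]]

actions = {"action_tokens": [[1, 5, 2, 0]]}
inverse(actions, "action_tokens", "action")
print(actions["action"])  # ['right']
```

Padding is what lets a batch of strings of different lengths become a rectangular block of ids; skipping special tokens on decode keeps padding and markers out of the recovered text.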
- transform_input_spec(input_spec: Composite) → Composite
Transforms the input spec such that the resulting spec matches the transform mapping.
- Parameters:
input_spec (TensorSpec) – spec before the transform
- Returns:
expected spec after the transform
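As a rough illustration of what "matches the transform mapping" means for a string entry, the sketch below rewrites a dict of specs so that each string entry listed in in_keys is replaced by a fixed-length integer token entry under the matching out key. StringSpec, TokenSpec and tokenize_spec are hypothetical stand-ins (not TorchRL's Composite or its spec classes), and the fixed max_length shape and the replace-rather-than-keep choice are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Union


@dataclass(frozen=True)
class StringSpec:
    """Hypothetical spec for a free-form string entry."""


@dataclass(frozen=True)
class TokenSpec:
    """Hypothetical spec for a fixed-length vector of integer token ids."""
    shape: tuple
    vocab_size: int


Spec = Union[StringSpec, TokenSpec]


def tokenize_spec(
    spec: Dict[str, Spec],
    in_keys: Sequence[str],
    out_keys: Sequence[str],
    max_length: int,
    vocab_size: int,
) -> Dict[str, Spec]:
    new_spec = dict(spec)  # copy so the spec "before the transform" stays untouched
    for in_key, out_key in zip(in_keys, out_keys):
        if not isinstance(new_spec[in_key], StringSpec):
            raise TypeError(f"{in_key!r} is not a string spec")
        # Assumption: the tokenized entry replaces the string entry.
        del new_spec[in_key]
        new_spec[out_key] = TokenSpec(shape=(max_length,), vocab_size=vocab_size)
    return new_spec


before = {"action": StringSpec()}
after = tokenize_spec(before, ["action"], ["action_tokens"], max_length=16, vocab_size=100)
print(after)  # {'action_tokens': TokenSpec(shape=(16,), vocab_size=100)}
```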
- transform_output_spec(output_spec: Composite) → Composite
Transforms the output spec such that the resulting spec matches the transform mapping.
This method should generally be left untouched. Changes should be implemented using transform_observation_spec(), transform_reward_spec() and transform_full_done_spec().
- Parameters:
output_spec (TensorSpec) – spec before the transform
- Returns:
expected spec after the transform
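The advice to leave transform_output_spec alone reflects a dispatch pattern: the method only routes each part of the output spec to its own hook, so a subclass overrides the hook it needs. The sketch below is a hypothetical, dependency-free model of that pattern using plain dicts; the sub-spec names full_observation_spec, full_reward_spec and full_done_spec mirror TorchRL's output-spec layout as we understand it, but treat them, and the class names, as assumptions.

```python
from typing import Dict


class SpecTransformSketch:
    """Hypothetical base class: transform_output_spec only dispatches to per-part hooks."""

    def transform_output_spec(self, output_spec: Dict[str, dict]) -> Dict[str, dict]:
        out = dict(output_spec)
        # Each hook receives a shallow copy, so the caller's dicts are not mutated.
        out["full_observation_spec"] = self.transform_observation_spec(dict(out["full_observation_spec"]))
        out["full_reward_spec"] = self.transform_reward_spec(dict(out["full_reward_spec"]))
        out["full_done_spec"] = self.transform_full_done_spec(dict(out["full_done_spec"]))
        return out

    # Default hooks return their input unchanged; subclasses override only what they modify.
    def transform_observation_spec(self, spec: dict) -> dict:
        return spec

    def transform_reward_spec(self, spec: dict) -> dict:
        return spec

    def transform_full_done_spec(self, spec: dict) -> dict:
        return spec


class StringToTokens(SpecTransformSketch):
    """Hypothetical subclass that only touches the observation part."""

    def transform_observation_spec(self, spec: dict) -> dict:
        spec["tokens"] = "int64[16]"  # placeholder for a token spec
        return spec


output_spec = {
    "full_observation_spec": {"text": "str"},
    "full_reward_spec": {"reward": "float32[1]"},
    "full_done_spec": {"done": "bool[1]"},
}
out = StringToTokens().transform_output_spec(output_spec)
print(out["full_observation_spec"])  # {'text': 'str', 'tokens': 'int64[16]'}
```

Keeping the routing in one place means a new transform cannot accidentally drop or reorder parts of the output spec; it only describes how its own part changes.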