torchtext.transforms
Transforms are common text transforms. They can be chained together using torch.nn.Sequential or using torchtext.transforms.Sequential to support torch-scriptability.
SentencePieceTokenizer
class torchtext.transforms.SentencePieceTokenizer(sp_model_path: str)
Transform for a SentencePiece tokenizer built from a pre-trained sentencepiece model.
Additional details: https://github.com/google/sentencepiece
Parameters
sp_model_path (str) – Path to the pre-trained sentencepiece model.
Example
>>> from torchtext.transforms import SentencePieceTokenizer
>>> transform = SentencePieceTokenizer("spm_model")
>>> transform(["hello world", "attention is all you need!"])
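The pieces returned depend on the trained model, so the example shows no output. As a rough sketch of the input/output shape only, the plain-Python stand-in below marks each word with "▁" (U+2581), the meta symbol SentencePiece uses in place of whitespace, and returns one list of pieces per sentence. `toy_pieces`, `tokenize`, and `detokenize` are hypothetical names; the whitespace split replaces the model's learned subword segmentation, and the handling of a single string versus a list of strings is an assumption rather than a quote of torchtext's code.

```python
from typing import List, Union

# "▁" (U+2581): SentencePiece escapes whitespace with this meta symbol.
WORD_BOUNDARY = "\u2581"


def toy_pieces(text: str) -> List[str]:
    """Hypothetical stand-in for a trained model: one piece per
    whitespace-separated word, with no subword splitting."""
    return [WORD_BOUNDARY + word for word in text.split()]


def tokenize(batch: Union[str, List[str]]) -> Union[List[str], List[List[str]]]:
    """Assumed contract: a string yields a list of pieces,
    a list of strings yields one list of pieces per sentence."""
    if isinstance(batch, str):
        return toy_pieces(batch)
    if isinstance(batch, list):
        return [toy_pieces(text) for text in batch]
    raise TypeError("Input type not supported")


def detokenize(pieces: List[str]) -> str:
    # Concatenate the pieces and turn the boundary marker back into spaces.
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# Output of this toy stand-in, not of a real sentencepiece model:
print(tokenize(["hello world", "attention is all you need!"]))
# [['▁hello', '▁world'], ['▁attention', '▁is', '▁all', '▁you', '▁need!']]
```

Because whitespace survives as "▁", joining the pieces recovers the original text, which is the property the `detokenize` helper relies on.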
GPT2BPETokenizer
CLIPTokenizer
VocabTransform
class torchtext.transforms.VocabTransform(vocab: torchtext.vocab.vocab.Vocab)
Vocab transform to convert an input batch of tokens into the corresponding token ids.
Parameters
vocab – an instance of the torchtext.vocab.Vocab class.
Example
>>> import torch
>>> from torchtext.vocab import vocab
>>> from torchtext.transforms import VocabTransform
>>> from collections import OrderedDict
>>> vocab_obj = vocab(OrderedDict([('a', 1), ('b', 1), ('c', 1)]))
>>> vocab_transform = VocabTransform(vocab_obj)
>>> output = vocab_transform([['a','b'],['a','b','c']])
>>> jit_vocab_transform = torch.jit.script(vocab_transform)
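Conceptually, the transform replaces every token in every sequence with its index in the vocabulary while keeping the nested batch structure. The plain-Python sketch below assumes a dict-based vocabulary whose ids follow insertion order, the same way the `OrderedDict` in the example gives 'a', 'b', 'c' the ids 0, 1, 2. `build_index` and `lookup_batch` are hypothetical helpers, not torchtext API, and the `default_index` fallback for unknown tokens is an illustrative assumption (torchtext's Vocab offers a comparable setting through `set_default_index`).

```python
from typing import Dict, Iterable, List, Optional


def build_index(tokens: Iterable[str]) -> Dict[str, int]:
    # Assign ids in first-seen order: first token -> 0, second -> 1, ...
    index: Dict[str, int] = {}
    for token in tokens:
        index.setdefault(token, len(index))
    return index


def lookup_batch(
    index: Dict[str, int],
    batch: List[List[str]],
    default_index: Optional[int] = None,
) -> List[List[int]]:
    """Map each token to its id, preserving the batch's nesting."""
    ids: List[List[int]] = []
    for tokens in batch:
        row: List[int] = []
        for token in tokens:
            if token in index:
                row.append(index[token])
            elif default_index is not None:
                row.append(default_index)  # hypothetical out-of-vocabulary fallback
            else:
                raise KeyError(f"token {token!r} is not in the vocabulary")
        ids.append(row)
    return ids


index = build_index(["a", "b", "c"])
print(lookup_batch(index, [["a", "b"], ["a", "b", "c"]]))  # [[0, 1], [0, 1, 2]]
```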
ToTensor
class torchtext.transforms.ToTensor(padding_value: Optional[int] = None, dtype: torch.dtype = torch.int64)
Convert input to a torch tensor.
Parameters
padding_value (Optional[int]) – Pad value used to make every input in the batch as long as the longest sequence in the batch.
dtype (torch.dtype) – torch.dtype of the output tensor.
forward(input: Any) → torch.Tensor
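The padding rule described for `padding_value` can be sketched in plain Python. The `pad_batch` helper below is hypothetical: it returns nested lists rather than a tensor, ignores `dtype`, and assumes right padding and that a ragged batch is an error when no `padding_value` is given.

```python
from typing import List, Optional, Sequence


def pad_batch(
    batch: Sequence[Sequence[int]], padding_value: Optional[int] = None
) -> List[List[int]]:
    """Plain-Python sketch of the padding rule (lists, not a tensor)."""
    lengths = {len(seq) for seq in batch}
    if padding_value is None:
        # Without a pad value, every sequence must already share one length.
        if len(lengths) > 1:
            raise ValueError("ragged batch requires a padding_value")
        return [list(seq) for seq in batch]
    max_len = max(lengths, default=0)
    # Right-pad each sequence up to the length of the longest one.
    return [list(seq) + [padding_value] * (max_len - len(seq)) for seq in batch]


print(pad_batch([[1, 2], [3, 4, 5]], padding_value=0))  # [[1, 2, 0], [3, 4, 5]]
```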
LabelToIndex
Truncate
AddToken
Sequential
class torchtext.transforms.Sequential(*args: torch.nn.modules.module.Module)
class torchtext.transforms.Sequential(arg: OrderedDict[str, Module])
A container to host a sequence of text transforms.
forward(input: Any) → Any
Parameters
input (Any) – Input sequence or batch. The input type must be supported by the first transform in the sequence.
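The chaining behavior amounts to a left-to-right fold: each transform receives the previous one's output, which is why the input only has to suit the first transform. `run_pipeline`, `split_words`, `to_ids`, and the tiny `VOCAB` are hypothetical stand-ins; the real container holds torch.nn.Module transforms (per the signatures above) and is TorchScript-compatible, which this plain-Python sketch does not attempt.

```python
from functools import reduce
from typing import Any, Callable, Dict, List, Sequence

Transform = Callable[[Any], Any]


def run_pipeline(transforms: Sequence[Transform], data: Any) -> Any:
    # Apply the transforms left to right, feeding each output into the next.
    return reduce(lambda acc, transform: transform(acc), transforms, data)


def split_words(batch: List[str]) -> List[List[str]]:
    # Hypothetical tokenizer stand-in: as the first step, it fixes the input type.
    return [text.split() for text in batch]


VOCAB: Dict[str, int] = {"hello": 0, "world": 1, "attention": 2, "is": 3}


def to_ids(batch: List[List[str]]) -> List[List[int]]:
    # Hypothetical vocab lookup; expects the tokenizer's nested-list output.
    return [[VOCAB[token] for token in tokens] for tokens in batch]


print(run_pipeline([split_words, to_ids], ["hello world", "attention is"]))
# [[0, 1], [2, 3]]
```

Reversing the order would fail, because `to_ids` cannot consume raw strings.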