
torchtext.experimental.vectors

Vectors

class torchtext.experimental.vectors.Vectors(tokens, vectors, unk_tensor=None)[source]

Creates a vectors object which maps tokens to vectors.

Parameters
  • tokens (List[str]) – a list of tokens.

  • vectors (torch.Tensor) – a 2d tensor representing the vector associated with each token.

  • unk_tensor (torch.Tensor) – a 1d tensor representing the vector associated with an unknown token.

Raises
  • ValueError – if vectors is empty and a default unk_tensor isn’t provided.

  • RuntimeError – if tokens and vectors have different sizes or tokens has duplicates.

  • TypeError – if any tensor within vectors is not of data type torch.float.
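The mapping and error behaviour described above can be sketched without torch. This is a minimal pure-Python stand-in, not the library's implementation: `TinyVectors` is a hypothetical name, plain lists of floats replace tensors, and the all-zeros default for the unknown vector is an assumption.

```python
from typing import Dict, List, Optional


class TinyVectors:
    """Hypothetical stand-in that mimics the documented token-to-vector mapping."""

    def __init__(
        self,
        tokens: List[str],
        vectors: List[List[float]],
        unk_vector: Optional[List[float]] = None,
    ) -> None:
        # Mirrors the documented ValueError: nothing to infer a default from.
        if unk_vector is None and len(vectors) == 0:
            raise ValueError("vectors is empty and no unknown vector was given")
        # Mirrors the documented RuntimeError conditions.
        if len(tokens) != len(vectors):
            raise RuntimeError("tokens and vectors have different sizes")
        if len(set(tokens)) != len(tokens):
            raise RuntimeError("tokens contains duplicates")
        # Assumption: the default unknown vector is all zeros, one per column.
        if unk_vector is None:
            unk_vector = [0.0] * len(vectors[0])
        self.unk_vector = unk_vector
        self._table: Dict[str, List[float]] = dict(zip(tokens, vectors))

    def __getitem__(self, token: str) -> List[float]:
        # Unknown tokens fall back to the unknown vector.
        return self._table.get(token, self.unk_vector)

    def __setitem__(self, token: str, vector: List[float]) -> None:
        self._table[token] = vector

    def __len__(self) -> int:
        return len(self._table)


vecs = TinyVectors(["a", "b"], [[1.0, 2.0], [3.0, 4.0]])
known = vecs["a"]      # the row stored for "a"
unknown = vecs["zzz"]  # falls back to the unknown vector
```

With the real class, the second argument would be a 2d torch.float tensor and the lookups would return 1d tensors.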

__getitem__(token: str) → torch.Tensor[source]
Parameters

token (str) – the token used to look up the corresponding vector.

Returns

a tensor (the vector) corresponding to the associated token.

Return type

vector (Tensor)

__init__(tokens, vectors, unk_tensor=None)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

__len__() → int[source]

Get length of vectors object.

Returns

the length of the vectors.

Return type

length (int)

__setitem__(token: str, vector: torch.Tensor) → None[source]
Parameters
  • token (str) – the token used to look up the corresponding vector.

  • vector (Tensor) – a 1d tensor representing a vector associated with the token.

Raises

TypeError – if vector is not of data type torch.float.

lookup_vectors(tokens: List[str]) → torch.Tensor[source]

Look up embedding vectors for a list of tokens.

Parameters

tokens (List[str]) – a list of tokens.

Returns

a 2-D tensor of shape (len(tokens), vector_dim), or an empty tensor if tokens is empty.

Return type

vectors (Tensor)

Examples

>>> from torchtext.experimental.vectors import GloVe
>>> examples = ['chip', 'baby', 'Beautiful']
>>> vec = GloVe(name='6B', dim=50)
>>> ret = vec.lookup_vectors(examples)
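The shape of the result can be illustrated without downloading GloVe. The sketch below is a pure-Python approximation: `lookup_rows` is a hypothetical helper, nested lists replace the 2-D tensor, and falling back to the unknown vector for missing tokens is assumed from the description of unk_tensor.

```python
from typing import Dict, List, Tuple


def lookup_rows(
    table: Dict[str, List[float]], unk: List[float], tokens: List[str]
) -> List[List[float]]:
    """Return one row per token, in order; missing tokens get the unknown row."""
    return [table.get(token, unk) for token in tokens]


def shape_of(rows: List[List[float]]) -> Tuple[int, ...]:
    """Report (len(tokens), vector_dim), or (0,) for an empty result."""
    return (len(rows), len(rows[0])) if rows else (0,)


table = {"chip": [0.1, 0.2], "baby": [0.3, 0.4]}
unk = [0.0, 0.0]

rows = lookup_rows(table, unk, ["chip", "baby", "Beautiful"])
empty = lookup_rows(table, unk, [])
```

Here 'Beautiful' is absent from the toy table, so its row is the unknown vector; lookups are case-sensitive in this sketch.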

vectors_from_file_object

torchtext.experimental.vectors.vectors_from_file_object(file_like_object, delimiter=', ', unk_tensor=None)[source]

Create a Vectors object from a csv file like object.

Note that the tensor corresponding to each vector is of type torch.float.

Format for csv file:

token1<delimiter>num1 num2 num3
token2<delimiter>num4 num5 num6
…
token_n<delimiter>num_m num_j num_k

Parameters
  • file_like_object (FileObject) – a file like object to read data from.

  • delimiter (char) – a character to delimit between the token and the vector. Default value is “,”

  • unk_tensor (Tensor) – a 1d tensor representing the vector associated with an unknown token.

Returns

a Vectors object.

Return type

Vectors

Raises

ValueError – if duplicate tokens are found in the file.
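To make the file format concrete, here is a minimal pure-Python sketch of a parser for it. It is not the library's implementation: `parse_vector_lines` is a hypothetical helper, it returns a dict of float lists rather than a Vectors object, and it assumes the token ends at the first occurrence of the delimiter.

```python
import io
from typing import Dict, List, TextIO


def parse_vector_lines(file_like: TextIO, delimiter: str = ",") -> Dict[str, List[float]]:
    """Parse lines of the form token<delimiter>num1 num2 num3."""
    table: Dict[str, List[float]] = {}
    for line in file_like:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumption: everything before the first delimiter is the token.
        token, _, numbers = line.partition(delimiter)
        if token in table:
            raise ValueError(f"duplicate token: {token!r}")
        # The numbers are separated by whitespace.
        table[token] = [float(value) for value in numbers.split()]
    return table


sample = io.StringIO("hello,0.1 0.2 0.3\nworld,0.4 0.5 0.6\n")
parsed = parse_vector_lines(sample)
```

Any object that yields text lines when iterated, such as an open file or io.StringIO, works as input to this sketch.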

FastText

torchtext.experimental.vectors.FastText(language='en', unk_tensor=None, root='.data', validate_file=True)[source]

Create a FastText Vectors object.

Parameters
  • language (str) – the language to use for FastText. The list of supported language options can be found at https://fasttext.cc/docs/en/language-identification.html

  • unk_tensor (Tensor) – a 1d tensor representing the vector associated with an unknown token

  • root (str) – folder in which to store downloaded files. Default value is .data

  • validate_file (bool) – flag to determine whether to validate the downloaded file's checksum. Should be False when running tests with a local asset.

Returns

a Vectors object.

Return type

Vectors

Raises

ValueError – if duplicate tokens are found in FastText file.

GloVe

torchtext.experimental.vectors.GloVe(name='840B', dim=300, unk_tensor=None, root='.data', validate_file=True)[source]

Create a GloVe Vectors object.

Parameters
  • name (str) – the name of the GloVe dataset to use. Options are:

    • 42B

    • 840B

    • twitter.27B

    • 6B

  • dim (int) – the dimension for the GloVe dataset to load. Options are:

    42B:
    • 300

    840B:
    • 300

    twitter.27B:
    • 25

    • 50

    • 100

    • 200

    6B:
    • 50

    • 100

    • 200

    • 300

  • unk_tensor (Tensor) – a 1d tensor representing the vector associated with an unknown token.

  • root (str) – folder in which to store downloaded files. Default value is .data

  • validate_file (bool) – flag to determine whether to validate the downloaded file's checksum. Should be False when running tests with a local asset.

Returns

a Vectors object.

Return type

Vectors

Raises

ValueError – if unexpected duplicate tokens are found in GloVe file.
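The valid name and dim combinations listed above can be written as a small lookup table. This is an illustrative sketch only: `GLOVE_DIMS` and `check_glove_args` are hypothetical names, and how the real GloVe function reports an unsupported combination is not documented here, so the ValueError below is an assumption.

```python
from typing import Dict, Tuple

# Supported dimensions per dataset name, copied from the parameter list above.
GLOVE_DIMS: Dict[str, Tuple[int, ...]] = {
    "42B": (300,),
    "840B": (300,),
    "twitter.27B": (25, 50, 100, 200),
    "6B": (50, 100, 200, 300),
}


def check_glove_args(name: str, dim: int) -> None:
    """Raise ValueError if (name, dim) is not one of the documented options."""
    if name not in GLOVE_DIMS:
        raise ValueError(f"unknown GloVe dataset name: {name!r}")
    if dim not in GLOVE_DIMS[name]:
        raise ValueError(f"dim={dim} is not available for {name!r}; options: {GLOVE_DIMS[name]}")


check_glove_args("6B", 50)      # a documented combination, passes silently
check_glove_args("840B", 300)   # the defaults from the signature
```

Checking arguments before calling GloVe avoids starting a large download for a combination that does not exist.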
