torch.nn.functional.embedding_bag
- torch.nn.functional.embedding_bag(input, weight, offsets=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, mode='mean', sparse=False, per_sample_weights=None, include_last_offset=False, padding_idx=None)[source]
Compute sums, means or maxes of bags of embeddings.
Calculation is done without instantiating the intermediate embeddings. See torch.nn.EmbeddingBag for more details.

Note
This operation may produce nondeterministic gradients when given tensors on a CUDA device. See Reproducibility for more information.
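The reduction can be reproduced explicitly (a minimal sketch, not from the official docs) by embedding each bag with torch.nn.functional.embedding and reducing it by hand; embedding_bag gives the same result without ever materializing the per-index embeddings:

>>> import torch
>>> import torch.nn.functional as F
>>> weight = torch.rand(10, 3)                      # embedding matrix
>>> input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])  # two bags of 4 indices
>>> offsets = torch.tensor([0, 4])                  # where each bag starts
>>> out = F.embedding_bag(input, weight, offsets, mode='mean')
>>> # reference computation that does materialize the embeddings
>>> manual = torch.stack([F.embedding(input[0:4], weight).mean(dim=0),
...                       F.embedding(input[4:8], weight).mean(dim=0)])
>>> torch.testing.assert_close(out, manual)  # passes: results match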
- Parameters
  - input (LongTensor) – Tensor containing bags of indices into the embedding matrix
  - weight (Tensor) – The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size
  - offsets (LongTensor, optional) – Only used when input is 1D. offsets determines the starting index position of each bag (sequence) in input.
  - max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. Note: this will modify weight in-place.
  - norm_type (float, optional) – The p in the p-norm to compute for the max_norm option. Default 2.
  - scale_grad_by_freq (bool, optional) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False. Note: this option is not supported when mode="max".
  - mode (str, optional) – "sum", "mean" or "max". Specifies the way to reduce the bag. Default: "mean"
  - sparse (bool, optional) – If True, gradient w.r.t. weight will be a sparse tensor. See Notes under torch.nn.Embedding for more details regarding sparse gradients. Note: this option is not supported when mode="max".
  - per_sample_weights (Tensor, optional) – A tensor of float / double weights, or None to indicate all weights should be taken to be 1. If specified, per_sample_weights must have exactly the same shape as input and is treated as having the same offsets, if those are not None. See the sketch after this list.
  - include_last_offset (bool, optional) – If True, the size of offsets is equal to the number of bags + 1. The last element is the size of the input, or the ending index position of the last bag (sequence).
  - padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed "pad". Note that the embedding vector at padding_idx is excluded from the reduction.
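A short sketch of per_sample_weights with illustrative values (note that per_sample_weights is only supported for mode="sum"): each index's embedding is scaled by its weight before the bag is summed.

>>> weight = torch.rand(10, 3)
>>> input = torch.tensor([1, 2, 4, 5])
>>> offsets = torch.tensor([0, 2])
>>> psw = torch.tensor([0.5, 0.5, 1.0, 2.0])  # one weight per entry of input
>>> out = F.embedding_bag(input, weight, offsets, mode='sum',
...                       per_sample_weights=psw)
>>> # the first bag is 0.5 * weight[1] + 0.5 * weight[2]
>>> torch.testing.assert_close(out[0], 0.5 * weight[1] + 0.5 * weight[2])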
- Return type
  Tensor
- Shape:
  - input (LongTensor) and offsets (LongTensor, optional)
    - If input is 2D of shape (B, N), it will be treated as B bags (sequences) each of fixed length N, and this will return B values aggregated in a way depending on the mode. offsets is ignored and required to be None in this case.
    - If input is 1D of shape (N), it will be treated as a concatenation of multiple bags (sequences). offsets is required to be a 1D tensor containing the starting index positions of each bag in input. Therefore, for offsets of shape (B), input will be viewed as having B bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
  - weight (Tensor): the learnable weights of the module of shape (num_embeddings, embedding_dim)
  - per_sample_weights (Tensor, optional). Has the same shape as input.
  - output: aggregated embedding values of shape (B, embedding_dim)
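The two layouts above are interchangeable, as this small sketch (illustrative values) shows: a 2D input with offsets=None gives the same result as the flattened 1D input with explicit offsets.

>>> weight = torch.rand(10, 3)
>>> inp_2d = torch.tensor([[1, 2, 4, 5],
...                        [4, 3, 2, 9]])
>>> out_2d = F.embedding_bag(inp_2d, weight, mode='mean')  # offsets must be None
>>> offsets = torch.tensor([0, 4])  # bag boundaries in the flattened input
>>> out_1d = F.embedding_bag(inp_2d.reshape(-1), weight, offsets, mode='mean')
>>> torch.testing.assert_close(out_2d, out_1d)  # identical outputs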
Examples:
>>> # an Embedding module containing 10 tensors of size 3
>>> embedding_matrix = torch.rand(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
>>> offsets = torch.tensor([0, 4])
>>> F.embedding_bag(input, embedding_matrix, offsets)
tensor([[ 0.3397,  0.3552,  0.5545],
        [ 0.5893,  0.4386,  0.5882]])

>>> # example with padding_idx
>>> embedding_matrix = torch.rand(10, 3)
>>> input = torch.tensor([2, 2, 2, 2, 4, 3, 2, 9])
>>> offsets = torch.tensor([0, 4])
>>> F.embedding_bag(input, embedding_matrix, offsets, padding_idx=2, mode='sum')
tensor([[ 0.0000,  0.0000,  0.0000],
        [-0.7082,  3.2145, -2.6251]])
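A further sketch (illustrative values) of include_last_offset: when it is True, offsets carries one extra trailing element equal to the length of input, so an offsets of size B + 1 still describes B bags.

>>> embedding_matrix = torch.rand(10, 3)
>>> input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
>>> offsets = torch.tensor([0, 4, 8])  # 2 bags, plus the final boundary 8 == len(input)
>>> out = F.embedding_bag(input, embedding_matrix, offsets,
...                       mode='sum', include_last_offset=True)
>>> out.shape
torch.Size([2, 3])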