torch.nn.functional

Convolution functions

`conv1d`	Applies a 1D convolution over an input signal composed of several input planes.
`conv2d`	Applies a 2D convolution over an input image composed of several input planes.
`conv3d`	Applies a 3D convolution over an input image composed of several input planes.
`conv_transpose1d`	Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".
`conv_transpose2d`	Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
`conv_transpose3d`	Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
`unfold`	Extracts sliding local blocks from a batched input tensor.
`fold`	Combines an array of sliding local blocks into a large containing tensor.
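The relationship between `conv2d`, `unfold`, and `fold` can be sketched as follows. This is a minimal illustration, not part of the reference itself: the tensor shapes and variable names are arbitrary choices, and the comparison uses a loose tolerance because the two code paths accumulate floating-point sums in different orders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)   # (batch, in_channels, H, W), illustrative sizes
w = torch.randn(4, 3, 3, 3)   # (out_channels, in_channels, kH, kW)

# Direct 2D convolution; padding=1 keeps the 8x8 spatial size for a 3x3 kernel.
y = F.conv2d(x, w, padding=1)

# The same result via unfold: every 3x3x3 patch becomes one column,
# and the convolution reduces to a matrix product with the flattened kernels.
cols = F.unfold(x, kernel_size=3, padding=1)        # (1, 27, 64)
y_unf = (w.view(4, -1) @ cols).view(1, 4, 8, 8)

# fold sums blocks back into place. With non-overlapping blocks
# (stride == kernel_size) it exactly inverts unfold.
patches = F.unfold(x, kernel_size=2, stride=2)      # (1, 12, 16)
x_back = F.fold(patches, output_size=(8, 8), kernel_size=2, stride=2)
```

When blocks overlap, `fold` adds the overlapping values together, so `fold(unfold(x))` is generally not equal to `x` in that case.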

Pooling functions

`avg_pool1d`	Applies a 1D average pooling over an input signal composed of several input planes.
`avg_pool2d`	Applies a 2D average-pooling operation over $kH \times kW$ regions with step size $sH \times sW$.
`avg_pool3d`	Applies a 3D average-pooling operation over $kT \times kH \times kW$ regions with step size $sT \times sH \times sW$.
`max_pool1d`	Applies a 1D max pooling over an input signal composed of several input planes.
`max_pool2d`	Applies a 2D max pooling over an input signal composed of several input planes.
`max_pool3d`	Applies a 3D max pooling over an input signal composed of several input planes.
`max_unpool1d`	Computes a partial inverse of `MaxPool1d`.
`max_unpool2d`	Computes a partial inverse of `MaxPool2d`.
`max_unpool3d`	Computes a partial inverse of `MaxPool3d`.
`lp_pool1d`	Applies a 1D power-average pooling over an input signal composed of several input planes.
`lp_pool2d`	Applies a 2D power-average pooling over an input signal composed of several input planes.
`adaptive_max_pool1d`	Applies a 1D adaptive max pooling over an input signal composed of several input planes.
`adaptive_max_pool2d`	Applies a 2D adaptive max pooling over an input signal composed of several input planes.
`adaptive_max_pool3d`	Applies a 3D adaptive max pooling over an input signal composed of several input planes.
`adaptive_avg_pool1d`	Applies a 1D adaptive average pooling over an input signal composed of several input planes.
`adaptive_avg_pool2d`	Applies a 2D adaptive average pooling over an input signal composed of several input planes.
`adaptive_avg_pool3d`	Applies a 3D adaptive average pooling over an input signal composed of several input planes.
`fractional_max_pool2d`	Applies 2D fractional max pooling over an input signal composed of several input planes.
`fractional_max_pool3d`	Applies 3D fractional max pooling over an input signal composed of several input planes.
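A small worked example may help show how the pooling functions relate to each other. It is a sketch on a hand-picked 4x4 input (the input values are an assumption chosen so the results are easy to check by hand).

```python
import torch
import torch.nn.functional as F

# A 4x4 single-channel input holding the values 0..15 row by row.
x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)

# Max pooling over 2x2 windows; return_indices=True also yields the
# flat positions of the maxima, which max_unpool2d needs.
pooled, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)

# Partial inverse: the maxima go back to their original positions,
# every other element is set to zero.
restored = F.max_unpool2d(pooled, idx, kernel_size=2)

# Average pooling over the same 2x2 windows.
avg = F.avg_pool2d(x, kernel_size=2)

# Adaptive pooling takes the desired output size instead of a kernel size;
# output_size=1 amounts to global average pooling.
gap = F.adaptive_avg_pool2d(x, output_size=1)
```

Here `pooled` holds `[[5, 7], [13, 15]]`, `avg` holds `[[2.5, 4.5], [10.5, 12.5]]`, and `gap` holds the single value `7.5`.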

Attention Mechanisms

`scaled_dot_product_attention`	Computes scaled dot product attention on query, key and value tensors, using an optional attention mask if passed, and applying dropout if a probability greater than 0.0 is specified.
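The computation can be sketched by comparing the fused call against an explicit softmax formulation. This assumes a PyTorch version that ships `scaled_dot_product_attention` (2.0 or later), the default scale of $1/\sqrt{E}$, and no dropout; the shapes and the lower-triangular mask are illustrative choices.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(2, 4, 5, 8)    # (batch, heads, target_len, embed_dim)
k = torch.randn(2, 4, 6, 8)    # (batch, heads, source_len, embed_dim)
v = torch.randn(2, 4, 6, 16)   # (batch, heads, source_len, value_dim)

# Fused call; dropout_p defaults to 0.0, so the result is deterministic.
out = F.scaled_dot_product_attention(q, k, v)

# Reference: softmax(Q K^T / sqrt(E)) V
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
ref = torch.softmax(scores, dim=-1) @ v

# A boolean mask marks positions that MAY be attended to (True = keep).
mask = torch.ones(5, 6, dtype=torch.bool).tril()
out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
ref_masked = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1) @ v
```

The output has shape `(2, 4, 5, 16)`: one value vector per query position.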

Non-linear activation functions

`threshold`	Thresholds each element of the input Tensor.
`threshold_`	In-place version of `threshold()`.
`relu`	Applies the rectified linear unit function element-wise.
`relu_`	In-place version of `relu()`.
`hardtanh`	Applies the HardTanh function element-wise.
`hardtanh_`	In-place version of `hardtanh()`.
`hardswish`	Applies the hardswish function element-wise.
`relu6`	Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0,x), 6)$ .
`elu`	Applies the Exponential Linear Unit (ELU) function element-wise.
`elu_`	In-place version of `elu()`.
`selu`	Applies element-wise, $\text{SELU}(x) = scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))$ , with $\alpha=1.6732632423543772848170429916717$ and $scale=1.0507009873554804934193349852946$ .
`celu`	Applies element-wise, $\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$ .
`leaky_relu`	Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$
`leaky_relu_`	In-place version of `leaky_relu()`.
`prelu`	Applies element-wise the function $\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x)$ where weight is a learnable parameter.
`rrelu`	Randomized leaky ReLU.
`rrelu_`	In-place version of `rrelu()`.
`glu`	The gated linear unit.
`gelu`	When the approximate argument is 'none', it applies element-wise the function $\text{GELU}(x) = x * \Phi(x)$ , where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution.
`logsigmoid`	Applies element-wise $\text{LogSigmoid}(x_i) = \log \left(\frac{1}{1 + \exp(-x_i)}\right)$
`hardshrink`	Applies the hard shrinkage function element-wise.
`tanhshrink`	Applies element-wise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$
`softsign`	Applies element-wise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$
`softplus`	Applies element-wise, the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$ .
`softmin`	Applies a softmin function.
`softmax`	Applies a softmax function.
`softshrink`	Applies the soft shrinkage function element-wise.
`gumbel_softmax`	Samples from the Gumbel-Softmax distribution and optionally discretizes.
`log_softmax`	Applies a softmax followed by a logarithm.
`tanh`	Applies element-wise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$
`sigmoid`	Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$
`hardsigmoid`	Applies the element-wise function $\text{Hardsigmoid}(x) = \frac{\text{ReLU6}(x + 3)}{6}$
`silu`	Applies the Sigmoid Linear Unit (SiLU) function, element-wise.
`mish`	Applies the Mish function, element-wise.
`batch_norm`	Applies Batch Normalization for each channel across a batch of data.
`group_norm`	Applies Group Normalization over groups of channels in each data sample.
`instance_norm`	Applies Instance Normalization for each channel in each data sample in a batch.
`layer_norm`	Applies Layer Normalization over the last certain number of dimensions.
`local_response_norm`	Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.
`normalize`	Performs $L_p$ normalization of inputs over specified dimension.
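Several of the formulas above can be checked directly on a small tensor. The snippet below is a minimal sketch; the input values are arbitrary and chosen so the results are easy to verify by hand.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 3.0, 8.0])

relu_out = F.relu(x)                               # max(0, x)
relu6_out = F.relu6(x)                             # min(max(0, x), 6)
leaky_out = F.leaky_relu(x, negative_slope=0.1)    # max(0, x) + 0.1 * min(0, x)
hardtanh_out = F.hardtanh(x)                       # clamps to [-1, 1] by default

# softmax and log_softmax need an explicit dimension to normalize over.
logits = torch.tensor([[1.0, 2.0, 3.0]])
probs = F.softmax(logits, dim=-1)
log_probs = F.log_softmax(logits, dim=-1)          # same as log(softmax), but more stable
softmin_out = F.softmin(logits, dim=-1)            # softmax applied to -logits

# L_p normalization along a dimension; with p=2 each row becomes a unit vector.
unit = F.normalize(torch.tensor([[3.0, 4.0]]), p=2, dim=1)
```

Functions with a trailing underscore, such as `relu_`, compute the same values but overwrite the input tensor instead of allocating a new one.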

Linear functions

`linear`	Applies a linear transformation to the incoming data: $y = xA^T + b$ .
`bilinear`	Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$ .
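Both formulas can be verified against explicit tensor algebra. In this sketch the names `A`, `W`, and `b` and all sizes are illustrative; note that the weight of `linear` has shape `(out_features, in_features)` and the weight of `bilinear` has shape `(out_features, in1_features, in2_features)`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(5, 3)      # 5 samples, 3 input features
A = torch.randn(2, 3)      # (out_features, in_features)
b = torch.randn(2)

# y = x A^T + b
y = F.linear(x, A, b)
y_ref = x @ A.T + b

# Bilinear form: one matrix A_o per output feature o, y_o = x1^T A_o x2 + b_o
x1 = torch.randn(5, 3)
x2 = torch.randn(5, 4)
W = torch.randn(2, 3, 4)   # (out_features, in1_features, in2_features)
z = F.bilinear(x1, x2, W, b)
z_ref = torch.einsum("ni,oij,nj->no", x1, W, x2) + b
```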

Dropout functions

`dropout`	During training, randomly zeroes some of the elements of the input tensor with probability `p` using samples from a Bernoulli distribution.
`alpha_dropout`	Applies alpha dropout to the input.
`feature_alpha_dropout`	Randomly masks out entire channels (a channel is a feature map).
`dropout1d`	Randomly zeroes out entire channels of the input tensor (a channel is a 1D feature map, e.g., the $j$ -th channel of the $i$ -th sample in the batched input is a 1D tensor $\text{input}[i, j]$ ).
`dropout2d`	Randomly zeroes out entire channels of the input tensor (a channel is a 2D feature map, e.g., the $j$ -th channel of the $i$ -th sample in the batched input is a 2D tensor $\text{input}[i, j]$ ).
`dropout3d`	Randomly zeroes out entire channels of the input tensor (a channel is a 3D feature map, e.g., the $j$ -th channel of the $i$ -th sample in the batched input is a 3D tensor $\text{input}[i, j]$ ).
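The difference between element-wise and channel-wise dropout can be sketched as follows. Unlike the `nn.Dropout` module, the functional form does not know whether a model is in training mode, so the `training` flag is passed explicitly here. The sizes and probabilities are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)
p = 0.25

# Training mode: each element is zeroed with probability p, and the
# survivors are scaled by 1 / (1 - p) so the expected value is unchanged.
y = F.dropout(x, p=p, training=True)
kept = y[y != 0]

# Evaluation mode: dropout is the identity.
y_eval = F.dropout(x, p=p, training=False)

# Channel-wise dropout: whole (i, j) feature maps are zeroed together.
img = torch.ones(2, 8, 4, 4)                  # (batch, channels, H, W)
z = F.dropout2d(img, p=0.5, training=True)
per_channel = z.flatten(2)                    # (2, 8, 16): one row per feature map
```

Every row of `per_channel` is constant: a feature map is either entirely zero or entirely scaled by `1 / (1 - 0.5)`.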

Sparse functions

`embedding`	A simple lookup table that looks up embeddings in a fixed dictionary and size.
`embedding_bag`	Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings.
`one_hot`	Takes a LongTensor with index values of shape `(*)` and returns a tensor of shape `(*, num_classes)` that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it is 1.
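The three functions are closely related, as the sketch below shows. The embedding table `table` is a made-up 3x2 matrix used only for illustration.

```python
import torch
import torch.nn.functional as F

idx = torch.tensor([0, 2, 1])                 # LongTensor of class / row indices

# one_hot appends a trailing dimension of size num_classes.
oh = F.one_hot(idx, num_classes=3)            # shape (3, 3), dtype int64

# A hypothetical embedding table: row i is the vector for index i.
table = torch.tensor([[0.0, 0.0],
                      [1.0, 1.0],
                      [2.0, 2.0]])

# embedding selects rows of the table; it equals one_hot(idx) @ table.
emb = F.embedding(idx, table)

# embedding_bag reduces groups ("bags") of looked-up rows. With a 1D input,
# offsets gives the start of each bag: bag 0 = idx[0:2], bag 1 = idx[2:].
bags = F.embedding_bag(idx, table, offsets=torch.tensor([0, 2]), mode="sum")
```

Here `bags` is `[[2, 2], [1, 1]]`: the first bag sums rows 0 and 2, the second bag contains only row 1.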

Distance functions

`pairwise_distance`	See `torch.nn.PairwiseDistance` for details.
`cosine_similarity`	Returns cosine similarity between `x1` and `x2`, computed along `dim`.
`pdist`	Computes the p-norm distance between every pair of row vectors in the input.
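A short example may clarify which pairs each function compares: `pairwise_distance` and `cosine_similarity` compare corresponding rows of two inputs, while `pdist` compares all pairs of rows within a single input. The points below are arbitrary values picked so the distances are round numbers.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
b = torch.tensor([[0.0, 1.0], [0.0, 1.0]])

# Row-wise cosine similarity: orthogonal rows give 0, parallel rows give 1.
cos = F.cosine_similarity(a, b, dim=1)

# Row-wise p-norm distance (p=2 by default). A small eps is added to the
# difference for numerical stability, so results are approximate.
dist = F.pairwise_distance(a, b)

# All pairwise distances between rows of one matrix, returned in
# condensed order: (0,1), (0,2), (1,2).
pts = torch.tensor([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
d = F.pdist(pts)
```

For `N` rows, `pdist` returns `N * (N - 1) / 2` values, here the three distances 5, 10, and 5.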

Loss functions

`binary_cross_entropy`	Function that measures the Binary Cross Entropy between the target and input probabilities.
`binary_cross_entropy_with_logits`	Function that measures Binary Cross Entropy between target and input logits.
`poisson_nll_loss`	Poisson negative log likelihood loss.
`cosine_embedding_loss`	See `CosineEmbeddingLoss` for details.
`cross_entropy`	This criterion computes the cross entropy loss between input logits and target.
`ctc_loss`	The Connectionist Temporal Classification loss.
`gaussian_nll_loss`	Gaussian negative log likelihood loss.
`hinge_embedding_loss`	See `HingeEmbeddingLoss` for details.
`kl_div`	The Kullback-Leibler divergence loss.
`l1_loss`	Function that takes the mean element-wise absolute value difference.
`mse_loss`	Measures the element-wise mean squared error.
`margin_ranking_loss`	See `MarginRankingLoss` for details.
`multilabel_margin_loss`	See `MultiLabelMarginLoss` for details.
`multilabel_soft_margin_loss`	See `MultiLabelSoftMarginLoss` for details.
`multi_margin_loss`	See `MultiMarginLoss` for details.
`nll_loss`	The negative log likelihood loss.
`huber_loss`	Function that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise.
`smooth_l1_loss`	Function that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.
`soft_margin_loss`	See `SoftMarginLoss` for details.
`triplet_margin_loss`	See `TripletMarginLoss` for details.
`triplet_margin_with_distance_loss`	See `TripletMarginWithDistanceLoss` for details.
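Some of the losses above are compositions or rescalings of one another. The sketch below checks three such relationships on random or hand-picked data; all tensors are illustrative, and the default `reduction="mean"` is assumed throughout.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# cross_entropy takes raw logits; it equals log_softmax followed by nll_loss.
logits = torch.randn(4, 3)                    # (batch, classes)
target = torch.tensor([0, 2, 1, 2])           # class indices
ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

# The *_with_logits variant applies the sigmoid internally, which is
# numerically more stable than calling sigmoid yourself.
z = torch.randn(6)
t = torch.randint(0, 2, (6,)).float()
bce_logits = F.binary_cross_entropy_with_logits(z, t)
bce = F.binary_cross_entropy(torch.sigmoid(z), t)

# Huber and smooth L1: quadratic for small errors, linear for large ones.
# They coincide for delta = beta = 1 and otherwise differ by a factor of delta.
pred = torch.tensor([0.0, 0.5, 3.0])
tgt = torch.zeros(3)
huber_1 = F.huber_loss(pred, tgt, delta=1.0)
smooth_1 = F.smooth_l1_loss(pred, tgt, beta=1.0)
huber_2 = F.huber_loss(pred, tgt, delta=2.0)
smooth_2 = F.smooth_l1_loss(pred, tgt, beta=2.0)
```

With these inputs, `huber_1` and `smooth_1` are both 0.875, while `huber_2` is 1.375 and `smooth_2` is 0.6875.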

Vision functions

`pixel_shuffle`	Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$ , where r is the `upscale_factor`.
`pixel_unshuffle`	Reverses the `PixelShuffle` operation by rearranging elements in a tensor of shape $(*, C, H \times r, W \times r)$ to a tensor of shape $(*, C \times r^2, H, W)$ , where r is the `downscale_factor`.
`pad`	Pads tensor.
`interpolate`	Down/up samples the input to either the given `size` or the given `scale_factor`.
`upsample`	Upsamples the input to either the given `size` or the given `scale_factor`.
`upsample_nearest`	Upsamples the input, using nearest neighbours' pixel values.
`upsample_bilinear`	Upsamples the input, using bilinear upsampling.
`grid_sample`	Given an `input` and a flow-field `grid`, computes the `output` using `input` values and pixel locations from `grid`.
`affine_grid`	Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices `theta`.
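The shape changes described above are easiest to see in a short example. This is a sketch with arbitrary tensor sizes; `align_corners=False` is passed explicitly to `interpolate`, `affine_grid`, and `grid_sample` so that the grid and the sampler use the same convention.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# pixel_shuffle trades channels for resolution: (*, C*r^2, H, W) -> (*, C, H*r, W*r).
x = torch.arange(2 * 8 * 3 * 3, dtype=torch.float32).view(2, 8, 3, 3)
up = F.pixel_shuffle(x, upscale_factor=2)            # (2, 2, 6, 6)
back = F.pixel_unshuffle(up, downscale_factor=2)     # (2, 8, 3, 3), equal to x

img = torch.randn(1, 3, 4, 4)

# interpolate resizes by scale_factor or to an explicit size.
big = F.interpolate(img, scale_factor=2, mode="nearest")
small = F.interpolate(img, size=(2, 2), mode="bilinear", align_corners=False)

# pad takes (left, right, top, bottom), starting from the last dimension.
padded = F.pad(img, (1, 1, 2, 2))                    # W: 4 -> 6, H: 4 -> 8

# An identity affine matrix yields a grid that samples each pixel at its
# own location, so grid_sample reproduces the input.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])
grid = F.affine_grid(theta, size=[1, 3, 4, 4], align_corners=False)
same = F.grid_sample(img, grid, align_corners=False)
```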

DataParallel functions (multi-GPU, distributed)

`torch.nn.parallel.data_parallel`	Evaluates `module(input)` in parallel across the GPUs given in `device_ids`.