torch.nn.functional

Convolution functions

`conv1d`	Applies a 1D convolution over an input signal composed of several input planes.
`conv2d`	Applies a 2D convolution over an input image composed of several input planes.
`conv3d`	Applies a 3D convolution over an input image composed of several input planes.
`conv_transpose1d`	Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".
`conv_transpose2d`	Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
`conv_transpose3d`	Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".
`unfold`	Extract sliding local blocks from a batched input tensor.
`fold`	Combine an array of sliding local blocks into a large containing tensor.
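
As a minimal sketch of how `conv2d` and `unfold` relate, the snippet below computes the same convolution twice: once directly, and once by extracting sliding blocks with `unfold` and multiplying by the flattened kernel. The tensor sizes and variable names are illustrative assumptions, not part of the reference.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)   # (N, C_in, H, W), sizes chosen for illustration
w = torch.randn(4, 3, 3, 3)   # (C_out, C_in, kH, kW)

# Direct 2D convolution; padding=1 keeps the 8x8 spatial size.
y = F.conv2d(x, w, padding=1)

# Same result via unfold: each column holds one flattened 3x3x3 patch.
cols = F.unfold(x, kernel_size=3, padding=1)           # (1, 27, 64)
y_unfold = (w.view(4, -1) @ cols).view(1, 4, 8, 8)     # matmul, then reshape

conv_matches_unfold = torch.allclose(y, y_unfold, atol=1e-4)
print(y.shape, conv_matches_unfold)
```

The comparison uses a tolerance because the two paths accumulate floating-point sums in different orders.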

Pooling functions

`avg_pool1d`	Applies a 1D average pooling over an input signal composed of several input planes.
`avg_pool2d`	Applies a 2D average-pooling operation over $kH \times kW$ regions with step size $sH \times sW$.
`avg_pool3d`	Applies a 3D average-pooling operation over $kT \times kH \times kW$ regions with step size $sT \times sH \times sW$.
`max_pool1d`	Applies a 1D max pooling over an input signal composed of several input planes.
`max_pool2d`	Applies a 2D max pooling over an input signal composed of several input planes.
`max_pool3d`	Applies a 3D max pooling over an input signal composed of several input planes.
`max_unpool1d`	Compute a partial inverse of `MaxPool1d`.
`max_unpool2d`	Compute a partial inverse of `MaxPool2d`.
`max_unpool3d`	Compute a partial inverse of `MaxPool3d`.
`lp_pool1d`	Apply a 1D power-average pooling over an input signal composed of several input planes.
`lp_pool2d`	Apply a 2D power-average pooling over an input signal composed of several input planes.
`lp_pool3d`	Apply a 3D power-average pooling over an input signal composed of several input planes.
`adaptive_max_pool1d`	Applies a 1D adaptive max pooling over an input signal composed of several input planes.
`adaptive_max_pool2d`	Applies a 2D adaptive max pooling over an input signal composed of several input planes.
`adaptive_max_pool3d`	Applies a 3D adaptive max pooling over an input signal composed of several input planes.
`adaptive_avg_pool1d`	Applies a 1D adaptive average pooling over an input signal composed of several input planes.
`adaptive_avg_pool2d`	Apply a 2D adaptive average pooling over an input signal composed of several input planes.
`adaptive_avg_pool3d`	Apply a 3D adaptive average pooling over an input signal composed of several input planes.
`fractional_max_pool2d`	Applies 2D fractional max pooling over an input signal composed of several input planes.
`fractional_max_pool3d`	Applies 3D fractional max pooling over an input signal composed of several input planes.
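
The following sketch applies a few of the pooling functions to a small 4x4 input so the results can be checked by hand. The input values and variable names are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

# A single 4x4 plane holding the values 0..15 in row-major order.
x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)

mx = F.max_pool2d(x, kernel_size=2)               # max of each 2x2 block
av = F.avg_pool2d(x, kernel_size=2)               # mean of each 2x2 block
gap = F.adaptive_avg_pool2d(x, output_size=1)     # global average, shape (1, 1, 1, 1)

# max_unpool2d is only a partial inverse: non-maximal positions become zero.
pooled, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)
restored = F.max_unpool2d(pooled, idx, kernel_size=2)

print(mx.squeeze(), av.squeeze(), gap.item(), restored.squeeze(), sep="\n")
```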

Attention Mechanisms

The `torch.nn.attention.bias` module contains attention biases designed for use with `scaled_dot_product_attention`.

scaled_dot_product_attention

scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0,
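
As a hedged illustration, the sketch below calls `scaled_dot_product_attention` with a boolean `attn_mask` and compares it against a plain softmax-based computation. It assumes PyTorch 2.0 or later, a `(batch, heads, seq_len, head_dim)` layout, and the default scaling of $1/\sqrt{d}$; the shapes and names are illustrative only.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(2, 4, 5, 8)   # (batch, heads, seq_len, head_dim), illustrative sizes
k = torch.randn(2, 4, 5, 8)
v = torch.randn(2, 4, 5, 8)

# Boolean mask: True marks positions that may be attended to (lower triangle).
mask = torch.ones(5, 5, dtype=torch.bool).tril()

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0)

# Reference computation: softmax(Q K^T / sqrt(d)) V with masked scores set to -inf.
scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
scores = scores.masked_fill(~mask, float("-inf"))
ref = F.softmax(scores, dim=-1) @ v

sdpa_matches_reference = torch.allclose(out, ref, atol=1e-4)
print(out.shape, sdpa_matches_reference)
```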

Non-linear activation functions

`threshold`	Apply a threshold to each element of the input Tensor.
`threshold_`	In-place version of `threshold()`.
`relu`	Applies the rectified linear unit function element-wise.
`relu_`	In-place version of `relu()`.
`hardtanh`	Applies the HardTanh function element-wise.
`hardtanh_`	In-place version of `hardtanh()`.
`hardswish`	Apply hardswish function, element-wise.
`relu6`	Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0,x), 6)$ .
`elu`	Apply the Exponential Linear Unit (ELU) function element-wise.
`elu_`	In-place version of `elu()`.
`selu`	Applies element-wise, $\text{SELU}(x) = scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))$ , with $\alpha=1.6732632423543772848170429916717$ and $scale=1.0507009873554804934193349852946$ .
`celu`	Applies element-wise, $\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$ .
`leaky_relu`	Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$ .
`leaky_relu_`	In-place version of `leaky_relu()`.
`prelu`	Applies element-wise the function $\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x)$ where weight is a learnable parameter.
`rrelu`	Randomized leaky ReLU.
`rrelu_`	In-place version of `rrelu()`.
`glu`	The gated linear unit.
`gelu`	When the `approximate` argument is `'none'`, applies element-wise the function $\text{GELU}(x) = x * \Phi(x)$ , where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution.
`logsigmoid`	Applies element-wise $\text{LogSigmoid}(x_i) = \log \left(\frac{1}{1 + \exp(-x_i)}\right)$ .
`hardshrink`	Applies the hard shrinkage function element-wise.
`tanhshrink`	Applies element-wise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$ .
`softsign`	Applies element-wise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$ .
`softplus`	Applies element-wise, the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$ .
`softmin`	Apply a softmin function.
`softmax`	Apply a softmax function.
`softshrink`	Applies the soft shrinkage function element-wise.
`gumbel_softmax`	Sample from the Gumbel-Softmax distribution and optionally discretize.
`log_softmax`	Apply a softmax followed by a logarithm.
`tanh`	Applies element-wise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$ .
`sigmoid`	Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$ .
`hardsigmoid`	Apply the Hardsigmoid function element-wise.
`silu`	Apply the Sigmoid Linear Unit (SiLU) function, element-wise.
`mish`	Apply the Mish function, element-wise.
`batch_norm`	Apply Batch Normalization for each channel across a batch of data.
`group_norm`	Apply Group Normalization over groups of channels, as set by `num_groups`.
`instance_norm`	Apply Instance Normalization independently for each channel in every data sample within a batch.
`layer_norm`	Apply Layer Normalization over the last dimensions, as given by `normalized_shape`.
`local_response_norm`	Apply local response normalization over an input signal.
`rms_norm`	Apply Root Mean Square Layer Normalization.
`normalize`	Perform $L_p$ normalization of inputs over specified dimension.
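
A short sketch of several entries from this table on hand-checkable inputs. It relies on the standard definitions $\text{ReLU6}(x) = \min(\max(0,x), 6)$ and $\text{LeakyReLU}(x) = \max(0,x) + \text{negative\_slope} * \min(0,x)$; the input values and the slope of 0.1 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.5, 8.0])

r6 = F.relu6(x)                              # clamps to the range [0, 6]
lr = F.leaky_relu(x, negative_slope=0.1)     # scales negative inputs by 0.1

# softmax yields probabilities; log_softmax is the numerically stable log of them.
probs = F.softmax(x, dim=0)
log_probs = F.log_softmax(x, dim=0)

# L2-normalize a vector along dim=1: (3, 4) has norm 5.
unit = F.normalize(torch.tensor([[3.0, 4.0]]), p=2, dim=1)

print(r6, lr, probs.sum(), unit, sep="\n")
```

Note that `softmax`, `softmin`, and `log_softmax` need an explicit `dim` to be unambiguous about which axis sums to one.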

Linear functions

linear

Applies a linear transformation to the incoming data: $y = xA^T + b$ .

bilinear

Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$
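
The two formulas above can be checked against explicit tensor algebra. In this sketch the shapes, and the names `A`, `W`, and `b`, are illustrative assumptions: `linear` takes a weight of shape `(out_features, in_features)`, and `bilinear` takes a weight of shape `(out_features, in1_features, in2_features)`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# linear: y = x A^T + b
x = torch.randn(5, 3)
A = torch.randn(4, 3)          # (out_features, in_features)
b = torch.randn(4)
y_lin = F.linear(x, A, b)
y_lin_ref = x @ A.T + b

# bilinear: y_o = x1^T W_o x2 + b_o, evaluated per sample and per output feature
x1 = torch.randn(5, 3)
x2 = torch.randn(5, 2)
W = torch.randn(4, 3, 2)       # (out_features, in1_features, in2_features)
y_bil = F.bilinear(x1, x2, W, b)
y_bil_ref = torch.einsum("ni,oij,nj->no", x1, W, x2) + b

print(torch.allclose(y_lin, y_lin_ref, atol=1e-5),
      torch.allclose(y_bil, y_bil_ref, atol=1e-4))
```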

Dropout functions

`dropout`	During training, randomly zeroes some elements of the input tensor with probability `p`.
`alpha_dropout`	Apply alpha dropout to the input.
`feature_alpha_dropout`	Randomly masks out entire channels (a channel is a feature map).
`dropout1d`	Randomly zero out entire channels (a channel is a 1D feature map).
`dropout2d`	Randomly zero out entire channels (a channel is a 2D feature map).
`dropout3d`	Randomly zero out entire channels (a channel is a 3D feature map).
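
A minimal sketch of the `training` flag on `dropout`. With `training=True`, surviving elements are rescaled by $1/(1-p)$ so the expected value is unchanged; with `training=False`, the input passes through untouched. The tensor size and `p=0.5` are illustrative choices.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p, the rest scaled by 1/(1-p).
y_train = F.dropout(x, p=0.5, training=True)

# Evaluation mode: dropout is the identity.
y_eval = F.dropout(x, p=0.5, training=False)

distinct_values = set(y_train.unique().tolist())   # a subset of {0.0, 2.0}
print(distinct_values, torch.equal(y_eval, x))
```

Unlike the `torch.nn.Dropout` module, the functional form does not track train/eval state, so `training` has to be passed explicitly.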

Sparse functions

embedding

Generate a simple lookup table that looks up embeddings in a fixed dictionary and size.

embedding_bag

Compute sums, means or maxes of bags of embeddings.

one_hot

Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it is 1.
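
The three functions in this section are closely related, which the sketch below tries to show on a tiny table. The weight matrix and indices are illustrative assumptions; `mode="sum"` is chosen so the bag results are easy to verify by hand.

```python
import torch
import torch.nn.functional as F

# A 4-row embedding table with 3 features per row: rows are [0,1,2], [3,4,5], ...
weight = torch.arange(12, dtype=torch.float32).view(4, 3)
idx = torch.tensor([[0, 2], [3, 1]])          # LongTensor of indices, shape (2, 2)

emb = F.embedding(idx, weight)                # shape (2, 2, 3), same as weight[idx]
oh = F.one_hot(idx, num_classes=4)            # shape (2, 2, 4), integer dtype

# An embedding lookup equals a one-hot matrix product with the table.
emb_via_one_hot = oh.float() @ weight

# With a 2D input, each row of idx is treated as one bag.
bag_sum = F.embedding_bag(idx, weight, mode="sum")   # shape (2, 3)

print(emb.shape, oh.shape, bag_sum, sep="\n")
```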

Distance functions

pairwise_distance

See `torch.nn.PairwiseDistance` for details.

cosine_similarity

Returns cosine similarity between x1 and x2, computed along dim.

pdist

Computes the p-norm distance between every pair of row vectors in the input.
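
The following sketch runs the three distance functions on small inputs whose results can be worked out by hand. The vectors are illustrative assumptions; note that `pairwise_distance` adds a small `eps` to the difference, so its results are compared with a tolerance.

```python
import math
import torch
import torch.nn.functional as F

a = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
b = torch.tensor([[0.0, 1.0], [0.0, -2.0]])

# Row-wise Euclidean distance between a[i] and b[i]: about sqrt(2) and 4.
dist = F.pairwise_distance(a, b)

# Row-wise cosine similarity along dim=1: orthogonal rows, then opposite rows.
cos = F.cosine_similarity(a, b, dim=1)

# pdist returns the condensed upper triangle of pairwise distances:
# (0,1), (0,2), (1,2) for three collinear points.
pts = torch.tensor([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
pd = F.pdist(pts, p=2)

print(dist, cos, pd, sep="\n")
```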

Loss functions

`binary_cross_entropy`	Measure Binary Cross Entropy between the target and input probabilities.
`binary_cross_entropy_with_logits`	Calculate Binary Cross Entropy between target and input logits.
`poisson_nll_loss`	Poisson negative log likelihood loss.
`cosine_embedding_loss`	See `CosineEmbeddingLoss` for details.
`cross_entropy`	Compute the cross entropy loss between input logits and target.
`ctc_loss`	Apply the Connectionist Temporal Classification loss.
`gaussian_nll_loss`	Gaussian negative log likelihood loss.
`hinge_embedding_loss`	See `HingeEmbeddingLoss` for details.
`kl_div`	Compute the KL Divergence loss.
`l1_loss`	Function that takes the mean element-wise absolute value difference.
`mse_loss`	Measures the element-wise mean squared error, with optional weighting.
`margin_ranking_loss`	See `MarginRankingLoss` for details.
`multilabel_margin_loss`	See `MultiLabelMarginLoss` for details.
`multilabel_soft_margin_loss`	See `MultiLabelSoftMarginLoss` for details.
`multi_margin_loss`	See `MultiMarginLoss` for details.
`nll_loss`	Compute the negative log likelihood loss.
`huber_loss`	Computes the Huber loss, with optional weighting.
`smooth_l1_loss`	Compute the Smooth L1 loss.
`soft_margin_loss`	See `SoftMarginLoss` for details.
`triplet_margin_loss`	Compute the triplet loss between given input tensors and a margin greater than 0.
`triplet_margin_with_distance_loss`	Compute the triplet margin loss for input tensors using a custom distance function.
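
Several of these losses are compositions of others, which this sketch tries to make concrete: `cross_entropy` on logits matches `nll_loss` applied to `log_softmax`, and `binary_cross_entropy_with_logits` matches `binary_cross_entropy` applied to a sigmoid. The inputs are illustrative assumptions, and the default `reduction="mean"` is used throughout.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Multi-class: cross_entropy expects raw logits and integer class targets.
logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 2])
ce = F.cross_entropy(logits, target)
ce_ref = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Binary: the with_logits variant fuses the sigmoid for numerical stability.
z = torch.randn(4)
t = torch.tensor([1.0, 0.0, 1.0, 0.0])
bce_logits = F.binary_cross_entropy_with_logits(z, t)
bce_ref = F.binary_cross_entropy(torch.sigmoid(z), t)

# Regression: errors are (0, -2, 0), so MSE = 4/3 and L1 = 2/3.
pred = torch.tensor([1.0, 2.0, 3.0])
true = torch.tensor([1.0, 4.0, 3.0])
mse = F.mse_loss(pred, true)
l1 = F.l1_loss(pred, true)

print(ce.item(), bce_logits.item(), mse.item(), l1.item())
```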

Vision functions

`pixel_shuffle`	Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$ , where r is the `upscale_factor`.
`pixel_unshuffle`	Reverses the `PixelShuffle` operation by rearranging elements in a tensor of shape $(*, C, H \times r, W \times r)$ to a tensor of shape $(*, C \times r^2, H, W)$ , where r is the `downscale_factor`.
`pad`	Pads tensor.
`interpolate`	Down/up samples the input.
`upsample`	Upsample input.
`upsample_nearest`	Upsamples the input, using nearest neighbours' pixel values.
`upsample_bilinear`	Upsamples the input, using bilinear upsampling.
`grid_sample`	Compute grid sample.
`affine_grid`	Generate 2D or 3D flow field (sampling grid), given a batch of affine matrices `theta`.
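
A brief sketch of the shape behaviour of several vision functions. The input size, `r = 2`, and the identity `theta` are illustrative assumptions; the last step pairs `affine_grid` with `grid_sample` using the same `align_corners` setting, which for an identity transform should reproduce the input up to floating-point error.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 8, 3, 3)                            # (N, C * r^2, H, W) with r = 2

up = F.pixel_shuffle(x, upscale_factor=2)              # -> (1, 2, 6, 6)
back = F.pixel_unshuffle(up, downscale_factor=2)       # -> (1, 8, 3, 3), equals x

# pad takes (left, right) pairs starting from the last dimension.
padded = F.pad(x, (1, 2))                              # width 3 -> 3 + 1 + 2 = 6

resized = F.interpolate(x, scale_factor=2, mode="nearest")   # -> (1, 8, 6, 6)

# Identity affine transform: sampling on the generated grid returns the input.
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])   # (N, 2, 3)
grid = F.affine_grid(theta, size=list(x.shape), align_corners=False)
resampled = F.grid_sample(x, grid, align_corners=False)

print(up.shape, padded.shape, resized.shape, grid.shape)
```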

DataParallel functions (multi-GPU, distributed)

data_parallel

torch.nn.parallel.data_parallel

Evaluate module(input) in parallel across the GPUs given in device_ids.