torch.nn.functional¶
Convolution functions¶
Applies a 1D convolution over an input signal composed of several input planes. 

Applies a 2D convolution over an input image composed of several input planes. 

Applies a 3D convolution over an input image composed of several input planes. 

Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution". 

Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution". 

Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution". 

Extract sliding local blocks from a batched input tensor. 

Combine an array of sliding local blocks into a large containing tensor. 
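As a minimal sketch of the convolution functions above (assuming PyTorch is installed; the tensor shapes here are illustrative, not prescribed by the API):

```python
import torch
import torch.nn.functional as F

# Batch of 2 RGB images, 8x8 pixels.
x = torch.randn(2, 3, 8, 8)
# 16 output channels, 3 input channels, 3x3 kernels.
weight = torch.randn(16, 3, 3, 3)

# padding=1 with a 3x3 kernel preserves the spatial size.
y = F.conv2d(x, weight, padding=1)
print(y.shape)  # torch.Size([2, 16, 8, 8])

# Transposed convolution: weight shape is (in_channels, out_channels, kH, kW);
# stride=2 roughly doubles the spatial size: (H-1)*stride + kH = 7*2 + 3 = 17.
w_t = torch.randn(16, 3, 3, 3)
z = F.conv_transpose2d(y, w_t, stride=2)
print(z.shape)  # torch.Size([2, 3, 17, 17])
```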
Pooling functions¶
Applies a 1D average pooling over an input signal composed of several input planes. 

Applies 2D average-pooling operation in $kH \times kW$ regions by step size $sH \times sW$ steps. 

Applies 3D average-pooling operation in $kT \times kH \times kW$ regions by step size $sT \times sH \times sW$ steps. 

Applies a 1D max pooling over an input signal composed of several input planes. 

Applies a 2D max pooling over an input signal composed of several input planes. 

Applies a 3D max pooling over an input signal composed of several input planes. 

Compute a partial inverse of max_pool1d(). 

Compute a partial inverse of max_pool2d(). 

Compute a partial inverse of max_pool3d(). 

Apply a 1D power-average pooling over an input signal composed of several input planes. 

Apply a 2D power-average pooling over an input signal composed of several input planes. 

Apply a 3D power-average pooling over an input signal composed of several input planes. 

Applies a 1D adaptive max pooling over an input signal composed of several input planes. 

Applies a 2D adaptive max pooling over an input signal composed of several input planes. 

Applies a 3D adaptive max pooling over an input signal composed of several input planes. 

Applies a 1D adaptive average pooling over an input signal composed of several input planes. 

Apply a 2D adaptive average pooling over an input signal composed of several input planes. 

Apply a 3D adaptive average pooling over an input signal composed of several input planes. 

Applies 2D fractional max pooling over an input signal composed of several input planes. 

Applies 3D fractional max pooling over an input signal composed of several input planes. 
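A short sketch of how these pooling functions relate (assuming PyTorch is installed; shapes are illustrative). Keeping the indices from max pooling is what makes the partial inverse possible:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 8, 8)

# Max pooling, keeping indices so the operation can be partially inverted.
pooled, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)
print(pooled.shape)  # torch.Size([1, 4, 4, 4])

# max_unpool2d places each max back at its original location, zeros elsewhere;
# non-maximal values are lost, hence "partial" inverse.
unpooled = F.max_unpool2d(pooled, idx, kernel_size=2)
print(unpooled.shape)  # torch.Size([1, 4, 8, 8])

# Adaptive pooling targets an output size instead of a kernel size.
small = F.adaptive_avg_pool2d(x, output_size=(2, 2))
print(small.shape)  # torch.Size([1, 4, 2, 2])
```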
Attention Mechanisms¶
The torch.nn.attention.bias
module contains attention_biases that are designed to be used with
scaled_dot_product_attention.
Computes scaled dot product attention on query, key and value tensors, using an optional attention mask if passed, and applying dropout if a probability greater than 0.0 is specified. 
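A minimal sketch of calling scaled_dot_product_attention (assuming PyTorch 2.0 or later; the batch/head/sequence sizes are illustrative):

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim)
q = torch.randn(2, 4, 10, 16)
k = torch.randn(2, 4, 10, 16)
v = torch.randn(2, 4, 10, 16)

# is_causal=True applies a causal mask: each position attends only to
# itself and earlier positions.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 4, 10, 16])
```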
Nonlinear activation functions¶
Apply a threshold to each element of the input Tensor. 

In-place version of threshold(). 

Applies the rectified linear unit function elementwise. 

In-place version of relu(). 

Applies the HardTanh function elementwise. 

In-place version of hardtanh(). 

Apply hardswish function, elementwise. 

Applies the elementwise function $\text{ReLU6}(x) = \min(\max(0,x), 6)$. 

Apply the Exponential Linear Unit (ELU) function elementwise. 

In-place version of elu(). 

Applies elementwise, $\text{SELU}(x) = scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))$, with $\alpha=1.6732632423543772848170429916717$ and $scale=1.0507009873554804934193349852946$. 

Applies elementwise, $\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$. 

Applies elementwise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$ 

In-place version of leaky_relu(). 

Applies elementwise the function $\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x)$ where weight is a learnable parameter. 

Randomized leaky ReLU. 

In-place version of rrelu(). 

The gated linear unit. 

When the approximate argument is 'none', it applies elementwise the function $\text{GELU}(x) = x * \Phi(x)$ 

Applies elementwise $\text{LogSigmoid}(x_i) = \log \left(\frac{1}{1 + \exp(x_i)}\right)$ 

Applies the hard shrinkage function elementwise 

Applies elementwise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$ 

Applies elementwise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$ 

Applies elementwise, the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$. 

Apply a softmin function. 

Apply a softmax function. 

Applies the soft shrinkage function elementwise 

Sample from the Gumbel-Softmax distribution and optionally discretize. 

Apply a softmax followed by a logarithm. 

Applies elementwise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$ 

Applies the elementwise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$ 

Apply the Hardsigmoid function elementwise. 

Apply the Sigmoid Linear Unit (SiLU) function, elementwise. 

Apply the Mish function, elementwise. 
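A small sketch of a few of the activation functions above (assuming PyTorch is installed; the input values are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

print(F.relu(x))                             # negatives clamped to 0
print(F.leaky_relu(x, negative_slope=0.1))   # negatives scaled by 0.1

# softmax over a dimension produces values that sum to 1 there.
p = F.softmax(x, dim=0)
print(float(p.sum()))                        # ~1.0

# log_softmax(x) equals log(softmax(x)), computed more stably.
print(torch.allclose(F.log_softmax(x, dim=0), p.log()))  # True
```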

Normalization functions¶
Apply Batch Normalization for each channel across a batch of data. 

Apply Group Normalization for last certain number of dimensions. 

Apply Instance Normalization independently for each channel in every data sample within a batch. 

Apply Layer Normalization for last certain number of dimensions. 

Apply local response normalization over an input signal. 

Apply Root Mean Square Layer Normalization. 

Perform $L_p$ normalization of inputs over specified dimension. 
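A brief sketch of two of the normalization functions above (assuming PyTorch is installed; shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)

# Layer normalization over the last dimension: each row ends up with
# (approximately) zero mean and unit variance.
y = F.layer_norm(x, normalized_shape=(8,))
print(y.mean(dim=-1))  # values near 0

# L2-normalize rows so each has (approximately) unit Euclidean norm.
u = F.normalize(x, p=2.0, dim=1)
print(u.norm(dim=1))   # values near 1
```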
Linear functions¶
Applies a linear transformation to the incoming data: $y = xA^T + b$. 

Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$ 
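The linear transformation $y = xA^T + b$ can be checked directly (a minimal sketch, assuming PyTorch is installed; shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(5, 3)   # 5 samples, 3 features
A = torch.randn(4, 3)   # weight of shape (out_features, in_features)
b = torch.randn(4)

y = F.linear(x, A, b)   # computes x @ A.T + b
print(y.shape)          # torch.Size([5, 4])
print(torch.allclose(y, x @ A.T + b))  # True
```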
Dropout functions¶
During training, randomly zeroes some elements of the input tensor with probability p using samples from a Bernoulli distribution. 

Apply alpha dropout to the input. 

Randomly masks out entire channels (a channel is a feature map). 

Randomly zero out entire channels (a channel is a 1D feature map). 

Randomly zero out entire channels (a channel is a 2D feature map). 

Randomly zero out entire channels (a channel is a 3D feature map). 
Sparse functions¶
Generate a simple lookup table that maps each index to a fixed-size embedding vector. 

Compute sums, means or maxes of bags of embeddings. 

Takes a LongTensor with index values of shape $(*)$ and returns a one-hot tensor of shape $(*, \text{num\_classes})$. 
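A short sketch of the sparse functions (assuming PyTorch is installed; the vocabulary size and dimensions are illustrative):

```python
import torch
import torch.nn.functional as F

# Embedding table: 10 rows (vocabulary size), 4-dimensional vectors.
weight = torch.randn(10, 4)
idx = torch.tensor([[1, 2], [7, 9]])

# Each index is replaced by its row of the table.
emb = F.embedding(idx, weight)
print(emb.shape)  # torch.Size([2, 2, 4])

# one_hot expands index values into 0/1 vectors along a new last dimension.
oh = F.one_hot(torch.tensor([0, 2, 1]), num_classes=3)
print(oh)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```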
Distance functions¶
See PairwiseDistance for details. 

Returns cosine similarity between x1 and x2, computed along dim. 

Computes the p-norm distance between every pair of row vectors in the input. 
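A minimal sketch of the distance functions (assuming PyTorch is installed; the vectors are chosen so the results are easy to verify by hand):

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[1.0, 0.0], [1.0, 1.0]])
b = torch.tensor([[1.0, 0.0], [-1.0, -1.0]])

# Cosine similarity row by row: 1 for parallel, -1 for opposite vectors.
sim = F.cosine_similarity(a, b, dim=1)
print(sim)  # tensor([ 1., -1.])

# pdist: condensed pairwise distances between the rows of one matrix.
x = torch.tensor([[0.0, 0.0], [3.0, 4.0]])
print(F.pdist(x))  # tensor([5.])  -- the 3-4-5 triangle
```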
Loss functions¶
Measure Binary Cross Entropy between the target and input probabilities. 

Calculate Binary Cross Entropy between target and input logits. 

Poisson negative log likelihood loss. 

See CosineEmbeddingLoss for details. 

Compute the cross entropy loss between input logits and target. 

Apply the Connectionist Temporal Classification loss. 

Gaussian negative log likelihood loss. 

See HingeEmbeddingLoss for details. 

Compute the KL Divergence loss. 

Compute the mean elementwise absolute-value difference between input and target. 

Measures the elementwise mean squared error. 

See MarginRankingLoss for details. 

See MultiLabelMarginLoss for details. 

See MultiLabelSoftMarginLoss for details. 

See MultiMarginLoss for details. 

Compute the negative log likelihood loss. 

Compute the Huber loss. 

Compute the Smooth L1 loss. 

See SoftMarginLoss for details. 

Compute the triplet loss between given input tensors and a margin greater than 0. 

Compute the triplet margin loss for input tensors using a custom distance function. 
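A minimal sketch of the most common case, cross_entropy on raw logits (assuming PyTorch is installed; the sizes are illustrative). It is equivalent to nll_loss applied to log_softmax of the logits:

```python
import torch
import torch.nn.functional as F

# Raw (unnormalized) logits for 3 samples over 4 classes;
# targets are class indices.
logits = torch.randn(3, 4)
target = torch.tensor([0, 3, 1])

loss = F.cross_entropy(logits, target)
print(loss.dim())  # 0 -- a scalar, mean reduction by default

# cross_entropy == nll_loss(log_softmax(logits)).
same = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(loss, same))  # True
```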
Vision functions¶
Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where r is the upscale_factor. 

Reverses the PixelShuffle operation by rearranging elements in a tensor of shape $(*, C, H \times r, W \times r)$ to a tensor of shape $(*, C \times r^2, H, W)$, where r is the downscale_factor. 

Pads tensor. 

Down/up samples the input. 

Upsample input. 

Upsamples the input, using nearest neighbours' pixel values. 

Upsamples the input, using bilinear upsampling. 

Compute grid sample. 

Generate 2D or 3D flow field (sampling grid), given a batch of affine matrices theta. 
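A short sketch of the vision functions (assuming PyTorch is installed; shapes are illustrative). An identity affine grid sampled through grid_sample should reproduce the input:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# interpolate up-samples (or down-samples) to a scale factor or target size.
up = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(up.shape)  # torch.Size([1, 3, 16, 16])

# pixel_shuffle trades channels for resolution:
# (*, C*r^2, H, W) -> (*, C, H*r, W*r) with r=2.
y = torch.randn(1, 12, 4, 4)
print(F.pixel_shuffle(y, upscale_factor=2).shape)  # torch.Size([1, 3, 8, 8])

# Identity affine transform, sampled back through grid_sample,
# reproduces the input up to floating-point error.
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
grid = F.affine_grid(theta, size=(1, 3, 8, 8), align_corners=False)
out = F.grid_sample(x, grid, align_corners=False)
print(torch.allclose(out, x, atol=1e-5))  # True
```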