torch.nn.functional
Convolution functions
conv1d: Applies a 1D convolution over an input signal composed of several input planes.

conv2d: Applies a 2D convolution over an input image composed of several input planes.

conv3d: Applies a 3D convolution over an input image composed of several input planes.

conv_transpose1d: Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".

conv_transpose2d: Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".

conv_transpose3d: Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".

unfold: Extracts sliding local blocks from a batched input tensor.

fold: Combines an array of sliding local blocks into a large containing tensor.
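
As a quick illustration of the functional convolution API (the tensor shapes below are arbitrary examples), conv2d takes an explicit weight tensor instead of managing one internally the way the nn.Conv2d module does:

```python
import torch
import torch.nn.functional as F

# Batch of 8 RGB images (N, C_in, H, W) and 16 filters of size 3x3
# (C_out, C_in, kH, kW); both randomly initialized for illustration.
x = torch.randn(8, 3, 32, 32)
w = torch.randn(16, 3, 3, 3)

# padding=1 preserves the 32x32 spatial size with a 3x3 kernel.
y = F.conv2d(x, w, padding=1)
print(y.shape)  # torch.Size([8, 16, 32, 32])
```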
Pooling functions

avg_pool1d: Applies a 1D average pooling over an input signal composed of several input planes.

avg_pool2d: Applies a 2D average pooling operation in $kH \times kW$ regions by step size $sH \times sW$ steps.

avg_pool3d: Applies a 3D average pooling operation in $kT \times kH \times kW$ regions by step size $sT \times sH \times sW$ steps.

max_pool1d: Applies a 1D max pooling over an input signal composed of several input planes.

max_pool2d: Applies a 2D max pooling over an input signal composed of several input planes.

max_pool3d: Applies a 3D max pooling over an input signal composed of several input planes.

max_unpool1d: Computes a partial inverse of MaxPool1d.

max_unpool2d: Computes a partial inverse of MaxPool2d.

max_unpool3d: Computes a partial inverse of MaxPool3d.

lp_pool1d: Applies a 1D power-average pooling over an input signal composed of several input planes.

lp_pool2d: Applies a 2D power-average pooling over an input signal composed of several input planes.

adaptive_max_pool1d: Applies a 1D adaptive max pooling over an input signal composed of several input planes.

adaptive_max_pool2d: Applies a 2D adaptive max pooling over an input signal composed of several input planes.

adaptive_max_pool3d: Applies a 3D adaptive max pooling over an input signal composed of several input planes.

adaptive_avg_pool1d: Applies a 1D adaptive average pooling over an input signal composed of several input planes.

adaptive_avg_pool2d: Applies a 2D adaptive average pooling over an input signal composed of several input planes.

adaptive_avg_pool3d: Applies a 3D adaptive average pooling over an input signal composed of several input planes.

fractional_max_pool2d: Applies 2D fractional max pooling over an input signal composed of several input planes.

fractional_max_pool3d: Applies 3D fractional max pooling over an input signal composed of several input planes.
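
A minimal sketch contrasting fixed-window and adaptive pooling (the tensor size is an arbitrary example):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# Fixed 2x2 window: the output spatial size depends on the input size.
pooled = F.max_pool2d(x, kernel_size=2)        # -> (1, 3, 4, 4)

# Adaptive pooling: you specify the output size instead; with
# output_size=1 this is a global average pool.
gap = F.adaptive_avg_pool2d(x, output_size=1)  # -> (1, 3, 1, 1)
print(pooled.shape, gap.shape)
```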
Attention Mechanisms

scaled_dot_product_attention: Computes scaled dot product attention on query, key and value tensors, using an optional attention mask if passed, and applying dropout if a probability greater than 0.0 is specified.
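
A small sketch of scaled_dot_product_attention (available in PyTorch 2.0 and later); the batch, head, sequence, and embedding sizes below are arbitrary:

```python
import torch
import torch.nn.functional as F

# (batch, heads, sequence length, head dim) - arbitrary example sizes.
q = torch.randn(2, 4, 10, 8)
k = torch.randn(2, 4, 10, 8)
v = torch.randn(2, 4, 10, 8)

# is_causal=True applies a lower-triangular mask, as in decoder self-attention.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 4, 10, 8])
```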
Non-linear activation functions

threshold: Thresholds each element of the input Tensor.

threshold_: In-place version of threshold.

relu: Applies the rectified linear unit function element-wise.

relu_: In-place version of relu.

hardtanh: Applies the HardTanh function element-wise.

hardtanh_: In-place version of hardtanh.

hardswish: Applies the hardswish function, element-wise, as described in the paper Searching for MobileNetV3.

relu6: Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0,x), 6)$.

elu: Applies the Exponential Linear Unit (ELU) function element-wise.

elu_: In-place version of elu.

selu: Applies element-wise, $\text{SELU}(x) = scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))$, with $\alpha=1.6732632423543772848170429916717$ and $scale=1.0507009873554804934193349852946$.

celu: Applies element-wise, $\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$.

leaky_relu: Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$.

leaky_relu_: In-place version of leaky_relu.

prelu: Applies element-wise the function $\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x)$ where weight is a learnable parameter.

rrelu: Randomized leaky ReLU.

rrelu_: In-place version of rrelu.

glu: The gated linear unit.

gelu: When the approximate argument is 'none', it applies element-wise the function $\text{GELU}(x) = x * \Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution.

logsigmoid: Applies element-wise, $\text{LogSigmoid}(x_i) = \log \left(\frac{1}{1 + \exp(-x_i)}\right)$.

hardshrink: Applies the hard shrinkage function element-wise.

tanhshrink: Applies element-wise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$.

softsign: Applies element-wise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$.

softplus: Applies element-wise, the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$.

softmin: Applies a softmin function.

softmax: Applies a softmax function.

softshrink: Applies the soft shrinkage function element-wise.

gumbel_softmax: Samples from the Gumbel-Softmax distribution and optionally discretizes.

log_softmax: Applies a softmax followed by a logarithm.

tanh: Applies element-wise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$.

sigmoid: Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$.

hardsigmoid: Applies the Hardsigmoid function element-wise.

silu: Applies the Sigmoid Linear Unit (SiLU) function, element-wise.

mish: Applies the Mish function, element-wise.
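
To make the element-wise behaviour of these activations concrete, a tiny sketch (input values chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

print(F.relu(x))             # negatives clamped to 0: tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])
probs = F.softmax(x, dim=0)  # non-negative entries that sum to 1 along dim
print(probs.sum())           # 1.0 (up to float rounding)
```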

batch_norm: Applies Batch Normalization for each channel across a batch of data.

group_norm: Applies Group Normalization for last certain number of dimensions.

instance_norm: Applies Instance Normalization for each channel in each data sample in a batch.

layer_norm: Applies Layer Normalization for last certain number of dimensions.

local_response_norm: Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.

normalize: Performs $L_p$ normalization of inputs over the specified dimension.
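
A minimal sketch contrasting layer_norm (statistical standardization) with normalize ($L_p$ rescaling); sizes are arbitrary:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 10)

# layer_norm standardizes the trailing dimension(s): ~zero mean, ~unit variance.
y = F.layer_norm(x, normalized_shape=(10,))

# normalize rescales each row to unit Lp norm (here L2).
u = F.normalize(x, p=2, dim=1)

print(y.mean(dim=1))  # close to 0 for every row
print(u.norm(dim=1))  # close to 1 for every row
```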
Linear functions

linear: Applies a linear transformation to the incoming data: $y = xA^T + b$.

bilinear: Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$.
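
linear applies the affine formula directly; a quick equivalence check on random data (feature sizes are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(5, 20)   # 5 samples, 20 input features
A = torch.randn(30, 20)  # weight is (out_features, in_features)
b = torch.randn(30)

y = F.linear(x, A, b)
# The same computation written out explicitly:
assert torch.allclose(y, x @ A.T + b, atol=1e-5)
print(y.shape)  # torch.Size([5, 30])
```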
Dropout functions

dropout: During training, randomly zeroes some of the elements of the input tensor with probability p, using samples from a Bernoulli distribution.

alpha_dropout: Applies alpha dropout to the input.

feature_alpha_dropout: Randomly masks out entire channels (a channel is a feature map, e.g. the $j$-th channel of the $i$-th sample in the batched input is a tensor $\text{input}[i, j]$) of the input tensor.

dropout1d: Randomly zero out entire channels of the input tensor (a channel is a 1D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 1D tensor $\text{input}[i, j]$).

dropout2d: Randomly zero out entire channels of the input tensor (a channel is a 2D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 2D tensor $\text{input}[i, j]$).

dropout3d: Randomly zero out entire channels of the input tensor (a channel is a 3D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 3D tensor $\text{input}[i, j]$).
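
Two properties of the functional dropout call worth seeing in action, sketched on a toy tensor:

```python
import torch
import torch.nn.functional as F

x = torch.ones(1000)

# In training mode, surviving elements are scaled by 1/(1-p) so the
# expected value is unchanged; here survivors become exactly 2.0.
y = F.dropout(x, p=0.5, training=True)
assert set(y.tolist()) <= {0.0, 2.0}

# With training=False the call is an identity.
z = F.dropout(x, p=0.5, training=False)
assert torch.equal(z, x)
```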
Sparse functions

embedding: A simple lookup table that looks up embeddings in a fixed dictionary and size.

embedding_bag: Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings.

one_hot: Takes a LongTensor with index values of shape $(*)$ and returns a tensor of shape $(*, \text{num\_classes})$ that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1.
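
A brief sketch of the two lookup primitives (vocabulary and embedding sizes are arbitrary):

```python
import torch
import torch.nn.functional as F

# one_hot: integer indices -> one-hot rows.
idx = torch.tensor([0, 2, 1])
oh = F.one_hot(idx, num_classes=4)
print(oh)
# tensor([[1, 0, 0, 0],
#         [0, 0, 1, 0],
#         [0, 1, 0, 0]])

# embedding: row lookup into a weight matrix (here a random
# 10-word vocabulary with 3-dimensional vectors).
weight = torch.randn(10, 3)
emb = F.embedding(torch.tensor([[1, 4, 4]]), weight)
print(emb.shape)  # torch.Size([1, 3, 3])
```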
Distance functions

pairwise_distance: See torch.nn.PairwiseDistance for details.

cosine_similarity: Returns cosine similarity between x1 and x2, computed along dim.

pdist: Computes the p-norm distance between every pair of row vectors in the input.
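
A tiny worked example on a pair of orthogonal vectors:

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[1.0, 0.0]])
b = torch.tensor([[0.0, 1.0]])

# Orthogonal vectors: cosine similarity 0, Euclidean distance sqrt(2).
print(F.cosine_similarity(a, b))  # tensor([0.])
print(F.pairwise_distance(a, b))  # approximately tensor([1.4142])
```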
Loss functions

binary_cross_entropy: Function that measures the Binary Cross Entropy between the target and input probabilities.

binary_cross_entropy_with_logits: Function that measures Binary Cross Entropy between target and input logits.

poisson_nll_loss: Poisson negative log likelihood loss.

cosine_embedding_loss: See CosineEmbeddingLoss for details.

cross_entropy: This criterion computes the cross entropy loss between input logits and target.

ctc_loss: The Connectionist Temporal Classification loss.

gaussian_nll_loss: Gaussian negative log likelihood loss.

hinge_embedding_loss: See HingeEmbeddingLoss for details.

l1_loss: Function that takes the mean element-wise absolute value difference.

mse_loss: Measures the element-wise mean squared error.

margin_ranking_loss: See MarginRankingLoss for details.

multilabel_margin_loss: See MultiLabelMarginLoss for details.

multilabel_soft_margin_loss: See MultiLabelSoftMarginLoss for details.

multi_margin_loss: See MultiMarginLoss for details.

nll_loss: The negative log likelihood loss.

huber_loss: Function that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise.

smooth_l1_loss: Function that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.

soft_margin_loss: See SoftMarginLoss for details.

triplet_margin_loss: See TripletMarginLoss for details.

triplet_margin_with_distance_loss: See TripletMarginWithDistanceLoss for details.
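
One relationship worth knowing when choosing among these: cross_entropy on raw logits is equivalent to nll_loss applied to log_softmax output. A toy check (logits and targets chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 3.0, 0.3]])
target = torch.tensor([0, 1])

# cross_entropy fuses log_softmax and nll_loss into one call.
ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
assert torch.allclose(ce, nll)
print(ce)
```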
Vision functions

pixel_shuffle: Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where r is the upscale_factor.

pixel_unshuffle: Reverses the PixelShuffle operation.

pad: Pads tensor.

interpolate: Down/up samples the input to either the given size or the given scale_factor.

upsample: Upsamples the input to either the given size or the given scale_factor.

upsample_nearest: Upsamples the input, using nearest neighbours' pixel values.

upsample_bilinear: Upsamples the input, using bilinear upsampling.

grid_sample: Given an input and a flow-field grid, computes the output using input values and pixel locations from the grid.

affine_grid: Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta.
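
A short sketch of the resampling utilities (tensor sizes are arbitrary examples):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 16, 16)
up = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(up.shape)  # torch.Size([1, 3, 32, 32])

# pixel_shuffle and pixel_unshuffle are exact inverses of each other:
# both just permute elements between the channel and spatial dimensions.
y = torch.randn(1, 4, 8, 8)               # C = 1 * r^2 with r = 2
s = F.pixel_shuffle(y, upscale_factor=2)  # -> (1, 1, 16, 16)
assert torch.equal(F.pixel_unshuffle(s, downscale_factor=2), y)
```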
DataParallel functions (multi-GPU, distributed)
data_parallel

torch.nn.parallel.data_parallel: Evaluates module(input) in parallel across the GPUs given in device_ids.