torch.nn.functional

Convolution functions

conv1d

Applies a 1D convolution over an input signal composed of several input planes.

conv2d

Applies a 2D convolution over an input image composed of several input planes.
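
For instance, a minimal shape check (the tensor sizes below are illustrative, not part of the API): with padding 1, a 3×3 kernel preserves the spatial size.

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.randn(1, 3, 32, 32)  # (N, C_in, H, W)
    >>> w = torch.randn(8, 3, 3, 3)    # (C_out, C_in, kH, kW)
    >>> F.conv2d(x, w, padding=1).shape
    torch.Size([1, 8, 32, 32])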

conv3d

Applies a 3D convolution over an input image composed of several input planes.

conv_transpose1d

Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".

conv_transpose2d

Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".

conv_transpose3d

Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".

unfold

Extract sliding local blocks from a batched input tensor.

fold

Combine an array of sliding local blocks into a large containing tensor.
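
As a quick sketch of how unfold and fold pair up (sizes illustrative): with non-overlapping blocks, fold exactly inverts unfold.

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.randn(1, 2, 4, 4)
    >>> blocks = F.unfold(x, kernel_size=2, stride=2)  # (N, C*kH*kW, n_blocks)
    >>> blocks.shape
    torch.Size([1, 8, 4])
    >>> torch.equal(F.fold(blocks, output_size=(4, 4), kernel_size=2, stride=2), x)
    True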

Pooling functions

avg_pool1d

Applies a 1D average pooling over an input signal composed of several input planes.

avg_pool2d

Applies a 2D average-pooling operation in $kH \times kW$ regions by step size $sH \times sW$ steps.

avg_pool3d

Applies a 3D average-pooling operation in $kT \times kH \times kW$ regions by step size $sT \times sH \times sW$ steps.

max_pool1d

Applies a 1D max pooling over an input signal composed of several input planes.

max_pool2d

Applies a 2D max pooling over an input signal composed of several input planes.
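
A small shape sketch for the 2D pooling functions (sizes illustrative): the stride defaults to the kernel size, so a 2×2 window halves each spatial dimension.

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.randn(1, 3, 8, 8)
    >>> F.max_pool2d(x, kernel_size=2).shape  # stride defaults to kernel_size
    torch.Size([1, 3, 4, 4])
    >>> F.avg_pool2d(x, kernel_size=2).shape
    torch.Size([1, 3, 4, 4])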

max_pool3d

Applies a 3D max pooling over an input signal composed of several input planes.

max_unpool1d

Compute a partial inverse of MaxPool1d.

max_unpool2d

Compute a partial inverse of MaxPool2d.

max_unpool3d

Compute a partial inverse of MaxPool3d.

lp_pool1d

Apply a 1D power-average pooling over an input signal composed of several input planes.

lp_pool2d

Apply a 2D power-average pooling over an input signal composed of several input planes.

lp_pool3d

Apply a 3D power-average pooling over an input signal composed of several input planes.

adaptive_max_pool1d

Applies a 1D adaptive max pooling over an input signal composed of several input planes.

adaptive_max_pool2d

Applies a 2D adaptive max pooling over an input signal composed of several input planes.

adaptive_max_pool3d

Applies a 3D adaptive max pooling over an input signal composed of several input planes.

adaptive_avg_pool1d

Applies a 1D adaptive average pooling over an input signal composed of several input planes.

adaptive_avg_pool2d

Apply a 2D adaptive average pooling over an input signal composed of several input planes.

adaptive_avg_pool3d

Apply a 3D adaptive average pooling over an input signal composed of several input planes.

fractional_max_pool2d

Applies 2D fractional max pooling over an input signal composed of several input planes.

fractional_max_pool3d

Applies 3D fractional max pooling over an input signal composed of several input planes.

Attention Mechanisms

The torch.nn.attention.bias module contains attention biases designed to be used with scaled_dot_product_attention.

scaled_dot_product_attention

Computes scaled dot product attention on query, key and value tensors, using an optional attention mask if passed, and applying dropout if a probability greater than 0.0 is specified.
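
For example (shapes illustrative), with inputs laid out as (batch, heads, sequence, head_dim) and is_causal=True applying a causal mask:

    >>> import torch
    >>> import torch.nn.functional as F
    >>> q = torch.randn(2, 4, 16, 8)  # (batch, heads, seq_len, head_dim)
    >>> k = torch.randn(2, 4, 16, 8)
    >>> v = torch.randn(2, 4, 16, 8)
    >>> F.scaled_dot_product_attention(q, k, v, is_causal=True).shape
    torch.Size([2, 4, 16, 8])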

Non-linear activation functions

threshold

Apply a threshold to each element of the input Tensor.

threshold_

In-place version of threshold().

relu

Applies the rectified linear unit function element-wise.

relu_

In-place version of relu().
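
The trailing underscore marks the in-place variant, as elsewhere in torch. A small sketch of the difference (values illustrative):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.tensor([-1.0, 0.0, 2.0])
    >>> F.relu(x)   # returns a new tensor; x is unchanged
    tensor([0., 0., 2.])
    >>> F.relu_(x)  # modifies x itself
    tensor([0., 0., 2.])
    >>> x
    tensor([0., 0., 2.])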

hardtanh

Applies the HardTanh function element-wise.

hardtanh_

In-place version of hardtanh().

hardswish

Apply the hardswish function, element-wise.

relu6

Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0, x), 6)$.

elu

Apply the Exponential Linear Unit (ELU) function element-wise.

elu_

In-place version of elu().

selu

Applies element-wise, $\text{SELU}(x) = scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))$, with $\alpha = 1.6732632423543772848170429916717$ and $scale = 1.0507009873554804934193349852946$.

celu

Applies element-wise, $\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$.

leaky_relu

Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$.

leaky_relu_

In-place version of leaky_relu().

prelu

Applies element-wise the function $\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x)$ where weight is a learnable parameter.

rrelu

Randomized leaky ReLU.

rrelu_

In-place version of rrelu().

glu

The gated linear unit.

gelu

When the approximate argument is 'none', it applies element-wise the function $\text{GELU}(x) = x * \Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution.

logsigmoid

Applies element-wise $\text{LogSigmoid}(x_i) = \log\left(\frac{1}{1 + \exp(-x_i)}\right)$.

hardshrink

Applies the hard shrinkage function element-wise.

tanhshrink

Applies element-wise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$.

softsign

Applies element-wise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$.

softplus

Applies element-wise, the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$.

softmin

Apply a softmin function.

softmax

Apply a softmax function.

softshrink

Applies the soft shrinkage function element-wise.

gumbel_softmax

Sample from the Gumbel-Softmax distribution and optionally discretize.

log_softmax

Apply a softmax followed by a logarithm.
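
log_softmax is mathematically equivalent to, but more numerically stable than, composing softmax and log by hand. A quick sketch (values illustrative):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.tensor([[1.0, 2.0, 3.0]])
    >>> probs = F.softmax(x, dim=-1)  # rows sum to 1
    >>> torch.allclose(probs.sum(), torch.tensor(1.0))
    True
    >>> torch.allclose(F.log_softmax(x, dim=-1), probs.log())
    True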

tanh

Applies element-wise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$.

sigmoid

Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$.

hardsigmoid

Apply the Hardsigmoid function element-wise.

silu

Apply the Sigmoid Linear Unit (SiLU) function, element-wise.

mish

Apply the Mish function, element-wise.

batch_norm

Apply Batch Normalization for each channel across a batch of data.

group_norm

Apply Group Normalization for last certain number of dimensions.

instance_norm

Apply Instance Normalization independently for each channel in every data sample within a batch.

layer_norm

Apply Layer Normalization for last certain number of dimensions.
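
A small sketch of the functional form (shapes illustrative): normalized_shape gives the trailing dimensions to normalize over, here just the last one.

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.randn(2, 5, 10)
    >>> y = F.layer_norm(x, normalized_shape=(10,))  # normalize over the last dim
    >>> torch.allclose(y.mean(dim=-1), torch.zeros(2, 5), atol=1e-6)
    True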

local_response_norm

Apply local response normalization over an input signal.

rms_norm

Apply Root Mean Square Layer Normalization.

normalize

Perform $L_p$ normalization of inputs over the specified dimension.
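
For example (values illustrative), with p=2 each slice along dim ends up with unit Euclidean norm:

    >>> import torch
    >>> import torch.nn.functional as F
    >>> v = torch.randn(4, 3)
    >>> u = F.normalize(v, p=2, dim=1)  # each row rescaled to unit L2 norm
    >>> torch.allclose(u.norm(dim=1), torch.ones(4))
    True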

Linear functions

linear

Applies a linear transformation to the incoming data: $y = xA^T + b$.

bilinear

Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$.
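
A quick check of the linear formula (sizes illustrative); note that the weight is laid out as (out_features, in_features), hence the transpose:

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.randn(4, 10)  # (batch, in_features)
    >>> A = torch.randn(5, 10)  # (out_features, in_features)
    >>> b = torch.randn(5)
    >>> torch.allclose(F.linear(x, A, b), x @ A.T + b)
    True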

Dropout functions

dropout

During training, randomly zeroes some elements of the input tensor with probability p.
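
Surviving elements are rescaled by 1/(1-p) so the expected value is preserved; with training=False the call is the identity. A sketch (values illustrative):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.ones(5)
    >>> F.dropout(x, p=0.5, training=False)      # identity in eval mode
    tensor([1., 1., 1., 1., 1.])
    >>> y = F.dropout(x, p=0.5, training=True)   # survivors scaled by 1/(1-p) = 2
    >>> set(y.tolist()) <= {0.0, 2.0}
    True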

alpha_dropout

Apply alpha dropout to the input.

feature_alpha_dropout

Randomly masks out entire channels (a channel is a feature map).

dropout1d

Randomly zero out entire channels (a channel is a 1D feature map).

dropout2d

Randomly zero out entire channels (a channel is a 2D feature map).

dropout3d

Randomly zero out entire channels (a channel is a 3D feature map).

Sparse functions

embedding

Generate a simple lookup table that looks up embeddings in a fixed dictionary of fixed size.
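
For example (sizes illustrative), indexing a 10-entry table of 3-dimensional embeddings:

    >>> import torch
    >>> import torch.nn.functional as F
    >>> weight = torch.randn(10, 3)           # 10 embeddings, each of dimension 3
    >>> idx = torch.tensor([[1, 2], [4, 5]])
    >>> F.embedding(idx, weight).shape        # index shape + embedding dim
    torch.Size([2, 2, 3])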

embedding_bag

Compute sums, means or maxes of bags of embeddings.

one_hot

Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1.
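
A small example of the encoding:

    >>> import torch
    >>> import torch.nn.functional as F
    >>> F.one_hot(torch.tensor([0, 2, 1]), num_classes=4)
    tensor([[1, 0, 0, 0],
            [0, 0, 1, 0],
            [0, 1, 0, 0]])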

Distance functions

pairwise_distance

See torch.nn.PairwiseDistance for details.

cosine_similarity

Returns cosine similarity between x1 and x2, computed along dim.

pdist

Computes the p-norm distance between every pair of row vectors in the input.
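
A small sketch of both distance helpers (values illustrative):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> a = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
    >>> F.cosine_similarity(a, a, dim=1)  # identical vectors -> similarity 1
    tensor([1., 1.])
    >>> F.pdist(a)                        # L2 distance between the two rows
    tensor([1.4142])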

Loss functions

binary_cross_entropy

Measure Binary Cross Entropy between the target and input probabilities.

binary_cross_entropy_with_logits

Calculate Binary Cross Entropy between target and input logits.

poisson_nll_loss

Poisson negative log likelihood loss.

cosine_embedding_loss

See CosineEmbeddingLoss for details.

cross_entropy

Compute the cross entropy loss between input logits and target.
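
cross_entropy expects raw logits and class indices, and is equivalent to log_softmax followed by nll_loss. A sketch (values illustrative):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> logits = torch.randn(3, 5)        # (batch, num_classes), unnormalized
    >>> target = torch.tensor([1, 0, 4])  # class indices
    >>> loss = F.cross_entropy(logits, target)
    >>> torch.allclose(loss, F.nll_loss(F.log_softmax(logits, dim=1), target))
    True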

ctc_loss

Apply the Connectionist Temporal Classification loss.

gaussian_nll_loss

Gaussian negative log likelihood loss.

hinge_embedding_loss

See HingeEmbeddingLoss for details.

kl_div

Compute the KL Divergence loss.

l1_loss

Function that takes the mean element-wise absolute value difference.

mse_loss

Measures the element-wise mean squared error, with optional weighting.

margin_ranking_loss

See MarginRankingLoss for details.

multilabel_margin_loss

See MultiLabelMarginLoss for details.

multilabel_soft_margin_loss

See MultiLabelSoftMarginLoss for details.

multi_margin_loss

See MultiMarginLoss for details.

nll_loss

Compute the negative log likelihood loss.

huber_loss

Computes the Huber loss, with optional weighting.

smooth_l1_loss

Compute the Smooth L1 loss.

soft_margin_loss

See SoftMarginLoss for details.

triplet_margin_loss

Compute the triplet loss between given input tensors and a margin greater than 0.

triplet_margin_with_distance_loss

Compute the triplet margin loss for input tensors using a custom distance function.

Vision functions

pixel_shuffle

Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where r is the upscale_factor.

pixel_unshuffle

Reverses the PixelShuffle operation by rearranging elements in a tensor of shape $(*, C, H \times r, W \times r)$ to a tensor of shape $(*, C \times r^2, H, W)$, where r is the downscale_factor.
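
Because both operations are pure rearrangements, pixel_unshuffle exactly inverts pixel_shuffle. A shape sketch (sizes illustrative):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.randn(1, 8, 4, 4)  # C = 2 * r^2 with r = 2
    >>> y = F.pixel_shuffle(x, upscale_factor=2)
    >>> y.shape
    torch.Size([1, 2, 8, 8])
    >>> torch.equal(F.pixel_unshuffle(y, downscale_factor=2), x)
    True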

pad

Pads tensor.

interpolate

Down/up samples the input.
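
Either a target size or a scale_factor is given, not both. A shape sketch (sizes illustrative):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.randn(1, 3, 8, 8)
    >>> F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False).shape
    torch.Size([1, 3, 16, 16])
    >>> F.interpolate(x, size=(4, 4), mode='nearest').shape
    torch.Size([1, 3, 4, 4])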

upsample

Upsample input.

upsample_nearest

Upsamples the input, using nearest neighbours' pixel values.

upsample_bilinear

Upsamples the input, using bilinear upsampling.

grid_sample

Compute grid sample.

affine_grid

Generate 2D or 3D flow field (sampling grid), given a batch of affine matrices theta.
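
affine_grid and grid_sample are designed to pair up: an identity theta reproduces the input (values illustrative):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> x = torch.randn(1, 1, 5, 5)
    >>> theta = torch.tensor([[[1.0, 0.0, 0.0],
    ...                        [0.0, 1.0, 0.0]]])  # identity affine transform
    >>> grid = F.affine_grid(theta, size=(1, 1, 5, 5), align_corners=False)
    >>> y = F.grid_sample(x, grid, align_corners=False)
    >>> torch.allclose(y, x, atol=1e-5)
    True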

DataParallel functions (multi-GPU, distributed)

data_parallel

Evaluate module(input) in parallel across the GPUs given in device_ids (exposed as torch.nn.parallel.data_parallel).
