# torch.nn

These are the basic building blocks for graphs:

- `Parameter`: A kind of Tensor that is to be considered a module parameter.
- `UninitializedParameter`: A parameter that is not initialized.
- `UninitializedBuffer`: A buffer that is not initialized.
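
As a quick illustration, here is a minimal sketch of how `nn.Parameter` registers a tensor with a module; the `Scale` module and its size are made up for the example:

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self, size=3):
        super().__init__()
        # nn.Parameter registers the tensor with the module, so it appears
        # in .parameters(), moves with .to(), and receives gradients.
        self.weight = nn.Parameter(torch.ones(size))

    def forward(self, x):
        return self.weight * x

print([p.shape for p in Scale().parameters()])   # [torch.Size([3])]
```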

## Containers

- `Module`: Base class for all neural network modules.
- `Sequential`: A sequential container.
- `ModuleList`: Holds submodules in a list.
- `ModuleDict`: Holds submodules in a dictionary.
- `ParameterList`: Holds parameters in a list.
- `ParameterDict`: Holds parameters in a dictionary.
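
A short sketch of the two most common containers; the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# A small MLP built with nn.Sequential; layers run in insertion order.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

# nn.ModuleList registers submodules so their parameters are visible to
# optimizers and to .to()/.cuda(); a plain Python list would not.
class Stack(nn.Module):
    def __init__(self, depth=3, width=8):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(width, width) for _ in range(depth))

    def forward(self, x):
        for block in self.blocks:
            x = torch.relu(block(x))
        return x

out = model(torch.randn(1, 4))       # shape (1, 2)
out2 = Stack()(torch.randn(1, 8))    # shape (1, 8)
```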

Global Hooks For Module

- `register_module_forward_pre_hook`: Registers a forward pre-hook common to all modules.
- `register_module_forward_hook`: Registers a global forward hook for all modules.
- `register_module_backward_hook`: Registers a backward hook common to all modules (deprecated in favor of `register_module_full_backward_hook`).
- `register_module_full_backward_hook`: Registers a full backward hook common to all modules.
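
A minimal sketch of a global forward hook, assuming `register_module_forward_hook` is imported from `torch.nn.modules.module`; the `log_shapes` helper is made up for the example:

```python
import torch
import torch.nn as nn
from torch.nn.modules.module import register_module_forward_hook

# The hook fires after the forward pass of every module instance,
# including containers such as nn.Sequential.
def log_shapes(module, inputs, output):
    print(f"{module.__class__.__name__}: {tuple(output.shape)}")

handle = register_module_forward_hook(log_shapes)
nn.Sequential(nn.Linear(4, 8), nn.ReLU())(torch.randn(2, 4))
handle.remove()   # always detach global hooks when done
```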

## Convolution Layers

- `nn.Conv1d`: Applies a 1D convolution over an input signal composed of several input planes.
- `nn.Conv2d`: Applies a 2D convolution over an input signal composed of several input planes.
- `nn.Conv3d`: Applies a 3D convolution over an input signal composed of several input planes.
- `nn.ConvTranspose1d`: Applies a 1D transposed convolution operator over an input image composed of several input planes.
- `nn.ConvTranspose2d`: Applies a 2D transposed convolution operator over an input image composed of several input planes.
- `nn.ConvTranspose3d`: Applies a 3D transposed convolution operator over an input image composed of several input planes.
- `nn.LazyConv1d`: A `torch.nn.Conv1d` module with lazy initialization of its `in_channels` argument, inferred from `input.size(1)`.
- `nn.LazyConv2d`: A `torch.nn.Conv2d` module with lazy initialization of its `in_channels` argument, inferred from `input.size(1)`.
- `nn.LazyConv3d`: A `torch.nn.Conv3d` module with lazy initialization of its `in_channels` argument, inferred from `input.size(1)`.
- `nn.LazyConvTranspose1d`: A `torch.nn.ConvTranspose1d` module with lazy initialization of its `in_channels` argument, inferred from `input.size(1)`.
- `nn.LazyConvTranspose2d`: A `torch.nn.ConvTranspose2d` module with lazy initialization of its `in_channels` argument, inferred from `input.size(1)`.
- `nn.LazyConvTranspose3d`: A `torch.nn.ConvTranspose3d` module with lazy initialization of its `in_channels` argument, inferred from `input.size(1)`.
- `nn.Unfold`: Extracts sliding local blocks from a batched input tensor.
- `nn.Fold`: Combines an array of sliding local blocks into a large containing tensor.
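
For orientation, a minimal sketch of the `(N, C, H, W)` shape convention these layers share; the channel counts and kernel size are illustrative only:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
x = torch.randn(8, 3, 32, 32)        # (N, C, H, W)
print(conv(x).shape)                 # torch.Size([8, 16, 32, 32])

# LazyConv2d defers in_channels until the first forward pass.
lazy = nn.LazyConv2d(out_channels=16, kernel_size=3, padding=1)
print(lazy(x).shape)                 # in_channels inferred as 3
```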

## Pooling layers

- `nn.MaxPool1d`: Applies a 1D max pooling over an input signal composed of several input planes.
- `nn.MaxPool2d`: Applies a 2D max pooling over an input signal composed of several input planes.
- `nn.MaxPool3d`: Applies a 3D max pooling over an input signal composed of several input planes.
- `nn.MaxUnpool1d`: Computes a partial inverse of MaxPool1d.
- `nn.MaxUnpool2d`: Computes a partial inverse of MaxPool2d.
- `nn.MaxUnpool3d`: Computes a partial inverse of MaxPool3d.
- `nn.AvgPool1d`: Applies a 1D average pooling over an input signal composed of several input planes.
- `nn.AvgPool2d`: Applies a 2D average pooling over an input signal composed of several input planes.
- `nn.AvgPool3d`: Applies a 3D average pooling over an input signal composed of several input planes.
- `nn.FractionalMaxPool2d`: Applies a 2D fractional max pooling over an input signal composed of several input planes.
- `nn.FractionalMaxPool3d`: Applies a 3D fractional max pooling over an input signal composed of several input planes.
- `nn.LPPool1d`: Applies a 1D power-average pooling over an input signal composed of several input planes.
- `nn.LPPool2d`: Applies a 2D power-average pooling over an input signal composed of several input planes.
- `nn.AdaptiveMaxPool1d`: Applies a 1D adaptive max pooling over an input signal composed of several input planes.
- `nn.AdaptiveMaxPool2d`: Applies a 2D adaptive max pooling over an input signal composed of several input planes.
- `nn.AdaptiveMaxPool3d`: Applies a 3D adaptive max pooling over an input signal composed of several input planes.
- `nn.AdaptiveAvgPool1d`: Applies a 1D adaptive average pooling over an input signal composed of several input planes.
- `nn.AdaptiveAvgPool2d`: Applies a 2D adaptive average pooling over an input signal composed of several input planes.
- `nn.AdaptiveAvgPool3d`: Applies a 3D adaptive average pooling over an input signal composed of several input planes.
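
A short sketch contrasting fixed-kernel and adaptive pooling; sizes are illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)
pool = nn.MaxPool2d(kernel_size=2)        # halves H and W
print(pool(x).shape)                      # torch.Size([1, 16, 16, 16])

# Adaptive pooling targets an output size rather than a kernel size,
# which is handy right before a classifier head.
gap = nn.AdaptiveAvgPool2d(output_size=1)
print(gap(x).shape)                       # torch.Size([1, 16, 1, 1])
```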

## Padding Layers

- `nn.ReflectionPad1d`: Pads the input tensor using the reflection of the input boundary.
- `nn.ReflectionPad2d`: Pads the input tensor using the reflection of the input boundary.
- `nn.ReflectionPad3d`: Pads the input tensor using the reflection of the input boundary.
- `nn.ReplicationPad1d`: Pads the input tensor using replication of the input boundary.
- `nn.ReplicationPad2d`: Pads the input tensor using replication of the input boundary.
- `nn.ReplicationPad3d`: Pads the input tensor using replication of the input boundary.
- `nn.ZeroPad2d`: Pads the input tensor boundaries with zero.
- `nn.ConstantPad1d`: Pads the input tensor boundaries with a constant value.
- `nn.ConstantPad2d`: Pads the input tensor boundaries with a constant value.
- `nn.ConstantPad3d`: Pads the input tensor boundaries with a constant value.

## Non-linear Activations (weighted sum, nonlinearity)

- `nn.ELU`: Applies the Exponential Linear Unit (ELU) function element-wise, as described in the paper Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs).
- `nn.Hardshrink`: Applies the Hard Shrinkage (Hardshrink) function element-wise.
- `nn.Hardsigmoid`: Applies the Hardsigmoid function element-wise.
- `nn.Hardtanh`: Applies the HardTanh function element-wise.
- `nn.Hardswish`: Applies the Hardswish function element-wise, as described in the paper Searching for MobileNetV3.
- `nn.LeakyReLU`: Applies the LeakyReLU function element-wise.
- `nn.LogSigmoid`: Applies the LogSigmoid function element-wise.
- `nn.MultiheadAttention`: Allows the model to jointly attend to information from different representation subspaces, as described in the paper Attention Is All You Need.
- `nn.PReLU`: Applies the PReLU function element-wise.
- `nn.ReLU`: Applies the rectified linear unit function element-wise.
- `nn.ReLU6`: Applies the ReLU6 function element-wise.
- `nn.RReLU`: Applies the randomized leaky rectified linear unit function element-wise.
- `nn.SELU`: Applies the SELU function element-wise.
- `nn.CELU`: Applies the CELU function element-wise.
- `nn.GELU`: Applies the Gaussian Error Linear Units function.
- `nn.Sigmoid`: Applies the Sigmoid function element-wise.
- `nn.SiLU`: Applies the Sigmoid Linear Unit (SiLU) function element-wise.
- `nn.Mish`: Applies the Mish function element-wise.
- `nn.Softplus`: Applies the Softplus function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$ element-wise.
- `nn.Softshrink`: Applies the soft shrinkage function element-wise.
- `nn.Softsign`: Applies the Softsign function element-wise.
- `nn.Tanh`: Applies the Hyperbolic Tangent (Tanh) function element-wise.
- `nn.Tanhshrink`: Applies the Tanhshrink function element-wise.
- `nn.Threshold`: Thresholds each element of the input Tensor.
- `nn.GLU`: Applies the gated linear unit function $\text{GLU}(a, b) = a \otimes \sigma(b)$ where $a$ is the first half of the input matrices and $b$ is the second half.
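
A quick sketch of how activation modules are used; apart from any learnable parameters (e.g. in PReLU), they are stateless and can be constructed inline:

```python
import torch
import torch.nn as nn

x = torch.linspace(-2, 2, steps=5)   # tensor([-2., -1., 0., 1., 2.])
print(nn.ReLU()(x))                  # tensor([0., 0., 0., 1., 2.])
print(nn.GELU()(x))                  # a smooth relative of ReLU
print(nn.Sigmoid()(x))               # values squashed into (0, 1)
```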

## Non-linear Activations (other)

- `nn.Softmin`: Applies the Softmin function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1.
- `nn.Softmax`: Applies the Softmax function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1.
- `nn.Softmax2d`: Applies SoftMax over features to each spatial location.
- `nn.LogSoftmax`: Applies the $\log(\text{Softmax}(x))$ function to an n-dimensional input Tensor.
- `nn.AdaptiveLogSoftmaxWithLoss`: Efficient softmax approximation, as described in Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou.
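
A small sketch contrasting Softmax and LogSoftmax on made-up scores:

```python
import torch
import torch.nn as nn

scores = torch.tensor([[1.0, 2.0, 3.0]])
probs = nn.Softmax(dim=-1)(scores)
print(probs, probs.sum())            # probabilities summing to 1

# LogSoftmax is numerically safer than computing log(softmax(x)) in two steps.
log_probs = nn.LogSoftmax(dim=-1)(scores)
print(log_probs.exp().sum())         # approximately 1
```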

## Normalization Layers

- `nn.BatchNorm1d`: Applies Batch Normalization over a 2D or 3D input, as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
- `nn.BatchNorm2d`: Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension), as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
- `nn.BatchNorm3d`: Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension), as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
- `nn.LazyBatchNorm1d`: A `torch.nn.BatchNorm1d` module with lazy initialization of its `num_features` argument, inferred from `input.size(1)`.
- `nn.LazyBatchNorm2d`: A `torch.nn.BatchNorm2d` module with lazy initialization of its `num_features` argument, inferred from `input.size(1)`.
- `nn.LazyBatchNorm3d`: A `torch.nn.BatchNorm3d` module with lazy initialization of its `num_features` argument, inferred from `input.size(1)`.
- `nn.GroupNorm`: Applies Group Normalization over a mini-batch of inputs, as described in the paper Group Normalization.
- `nn.SyncBatchNorm`: Applies Batch Normalization over an N-dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension), as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
- `nn.InstanceNorm1d`: Applies Instance Normalization over a 2D (unbatched) or 3D (batched) input, as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
- `nn.InstanceNorm2d`: Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension), as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
- `nn.InstanceNorm3d`: Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension), as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
- `nn.LazyInstanceNorm1d`: A `torch.nn.InstanceNorm1d` module with lazy initialization of its `num_features` argument, inferred from `input.size(1)`.
- `nn.LazyInstanceNorm2d`: A `torch.nn.InstanceNorm2d` module with lazy initialization of its `num_features` argument, inferred from `input.size(1)`.
- `nn.LazyInstanceNorm3d`: A `torch.nn.InstanceNorm3d` module with lazy initialization of its `num_features` argument, inferred from `input.size(1)`.
- `nn.LayerNorm`: Applies Layer Normalization over a mini-batch of inputs, as described in the paper Layer Normalization.
- `nn.LocalResponseNorm`: Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.
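
A sketch of the three most common normalization layers on the same made-up `(N, C, H, W)` batch; note that LayerNorm and GroupNorm do not depend on batch statistics:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 16, 32, 32)                     # (N, C, H, W)

bn = nn.BatchNorm2d(num_features=16)               # normalizes over N, H, W per channel
ln = nn.LayerNorm(normalized_shape=[16, 32, 32])   # normalizes over the last dims
gn = nn.GroupNorm(num_groups=4, num_channels=16)   # batch-size independent

for norm in (bn, ln, gn):
    print(norm(x).shape)                           # shape is unchanged
```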

## Recurrent Layers

- `nn.RNNBase`: Base class for RNN modules (RNN, LSTM, GRU).
- `nn.RNN`: Applies a multi-layer Elman RNN with $\tanh$ or $\text{ReLU}$ non-linearity to an input sequence.
- `nn.LSTM`: Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
- `nn.GRU`: Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
- `nn.RNNCell`: An Elman RNN cell with tanh or ReLU non-linearity.
- `nn.LSTMCell`: A long short-term memory (LSTM) cell.
- `nn.GRUCell`: A gated recurrent unit (GRU) cell.
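
A minimal LSTM sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
x = torch.randn(4, 7, 10)       # (batch, seq_len, features)
output, (h_n, c_n) = lstm(x)
print(output.shape)             # torch.Size([4, 7, 20])  per-step outputs
print(h_n.shape)                # torch.Size([2, 4, 20])  final hidden state per layer
```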

## Transformer Layers

- `nn.Transformer`: A transformer model.
- `nn.TransformerEncoder`: TransformerEncoder is a stack of N encoder layers.
- `nn.TransformerDecoder`: TransformerDecoder is a stack of N decoder layers.
- `nn.TransformerEncoderLayer`: TransformerEncoderLayer is made up of self-attention and a feedforward network.
- `nn.TransformerDecoderLayer`: TransformerDecoderLayer is made up of self-attention, multi-head attention, and a feedforward network.
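
A minimal encoder-only sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=3)
src = torch.randn(2, 10, 64)    # (batch, seq_len, d_model)
print(encoder(src).shape)       # torch.Size([2, 10, 64])
```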

## Linear Layers

- `nn.Identity`: A placeholder identity operator that is argument-insensitive.
- `nn.Linear`: Applies a linear transformation to the incoming data: $y = xA^T + b$.
- `nn.Bilinear`: Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$.
- `nn.LazyLinear`: A `torch.nn.Linear` module where `in_features` is inferred.
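
A sketch of Linear and Bilinear with arbitrary feature sizes:

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=20, out_features=5)   # y = x A^T + b
print(fc(torch.randn(3, 20)).shape)              # torch.Size([3, 5])

bil = nn.Bilinear(in1_features=20, in2_features=30, out_features=5)
print(bil(torch.randn(3, 20), torch.randn(3, 30)).shape)   # torch.Size([3, 5])
```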

## Dropout Layers

- `nn.Dropout`: During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.
- `nn.Dropout1d`: Randomly zeroes out entire channels (a channel is a 1D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 1D tensor $\text{input}[i, j]$).
- `nn.Dropout2d`: Randomly zeroes out entire channels (a channel is a 2D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 2D tensor $\text{input}[i, j]$).
- `nn.Dropout3d`: Randomly zeroes out entire channels (a channel is a 3D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 3D tensor $\text{input}[i, j]$).
- `nn.AlphaDropout`: Applies Alpha Dropout over the input.
- `nn.FeatureAlphaDropout`: Randomly masks out entire channels (a channel is a feature map) of the input tensor.
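
A sketch showing that dropout is active only in training mode:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(2, 4)
print(drop(x))   # training mode (the default): ~half the entries are zeroed,
                 # the rest scaled by 1 / (1 - p)

drop.eval()      # switch to evaluation mode
print(drop(x))   # identity: the input passes through unchanged
```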

## Sparse Layers

- `nn.Embedding`: A simple lookup table that stores embeddings of a fixed dictionary and size.
- `nn.EmbeddingBag`: Computes sums or means of 'bags' of embeddings, without instantiating the intermediate embeddings.
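
A sketch with a made-up vocabulary size:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=1000, embedding_dim=64)
tokens = torch.tensor([[1, 2, 4], [4, 3, 9]])    # integer indices
print(emb(tokens).shape)                         # torch.Size([2, 3, 64])

# EmbeddingBag fuses lookup and reduction (sum/mean/max) in one step.
bag = nn.EmbeddingBag(1000, 64, mode="mean")
print(bag(tokens).shape)                         # torch.Size([2, 64])
```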

## Distance Functions

- `nn.CosineSimilarity`: Returns cosine similarity between $x_1$ and $x_2$, computed along dim.
- `nn.PairwiseDistance`: Computes the pairwise distance between vectors $v_1$, $v_2$ using the p-norm $\Vert x \Vert_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$.

## Loss Functions

- `nn.L1Loss`: Creates a criterion that measures the mean absolute error (MAE) between each element in the input $x$ and target $y$.
- `nn.MSELoss`: Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $x$ and target $y$.
- `nn.CrossEntropyLoss`: This criterion computes the cross entropy loss between input and target.
- `nn.CTCLoss`: The Connectionist Temporal Classification loss.
- `nn.NLLLoss`: The negative log likelihood loss.
- `nn.PoissonNLLLoss`: Negative log likelihood loss with Poisson distribution of target.
- `nn.GaussianNLLLoss`: Gaussian negative log likelihood loss.
- `nn.KLDivLoss`: The Kullback-Leibler divergence loss.
- `nn.BCELoss`: Creates a criterion that measures the Binary Cross Entropy between the target and the input probabilities.
- `nn.BCEWithLogitsLoss`: This loss combines a Sigmoid layer and the BCELoss in one single class.
- `nn.MarginRankingLoss`: Creates a criterion that measures the loss given inputs $x_1$, $x_2$, two 1D mini-batch or 0D Tensors, and a label 1D mini-batch or 0D Tensor $y$ (containing 1 or -1).
- `nn.HingeEmbeddingLoss`: Measures the loss given an input tensor $x$ and a labels tensor $y$ (containing 1 or -1).
- `nn.MultiLabelMarginLoss`: Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (a 2D Tensor of target class indices).
- `nn.HuberLoss`: Creates a criterion that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise.
- `nn.SmoothL1Loss`: Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.
- `nn.SoftMarginLoss`: Creates a criterion that optimizes a two-class classification logistic loss between input tensor $x$ and target tensor $y$ (containing 1 or -1).
- `nn.MultiLabelSoftMarginLoss`: Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input $x$ and target $y$ of size $(N, C)$.
- `nn.CosineEmbeddingLoss`: Creates a criterion that measures the loss given input tensors $x_1$, $x_2$ and a Tensor label $y$ with values 1 or -1.
- `nn.MultiMarginLoss`: Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (a 1D tensor of target class indices, $0 \leq y \leq \text{x.size}(1)-1$).
- `nn.TripletMarginLoss`: Creates a criterion that measures the triplet loss given input tensors $x_1$, $x_2$, $x_3$ and a margin with a value greater than $0$.
- `nn.TripletMarginWithDistanceLoss`: Creates a criterion that measures the triplet loss given input tensors $a$, $p$, and $n$ (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function ("distance function") used to compute the relationship between the anchor and positive example ("positive distance") and the anchor and negative example ("negative distance").
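
A minimal classification-loss sketch with random logits:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10, requires_grad=True)   # raw scores for 10 classes
targets = torch.tensor([1, 0, 3, 9])              # class indices

# CrossEntropyLoss applies log-softmax internally, so pass raw logits.
loss = nn.CrossEntropyLoss()(logits, targets)
loss.backward()                                   # gradients flow back to logits
print(loss.item())
```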

## Vision Layers

- `nn.PixelShuffle`: Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where r is an upscale factor.
- `nn.PixelUnshuffle`: Reverses the PixelShuffle operation by rearranging elements in a tensor of shape $(*, C, H \times r, W \times r)$ to a tensor of shape $(*, C \times r^2, H, W)$, where r is a downscale factor.
- `nn.Upsample`: Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.
- `nn.UpsamplingNearest2d`: Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels.
- `nn.UpsamplingBilinear2d`: Applies a 2D bilinear upsampling to an input signal composed of several input channels.
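
A sketch of PixelShuffle and PixelUnshuffle round-tripping a made-up tensor:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 8, 8)                # (N, C * r^2, H, W) with r = 2
up = nn.PixelShuffle(upscale_factor=2)
print(up(x).shape)                          # torch.Size([1, 4, 16, 16])
print(nn.PixelUnshuffle(2)(up(x)).shape)    # back to torch.Size([1, 16, 8, 8])
```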

## Shuffle Layers

- `nn.ChannelShuffle`: Divides the channels in a tensor of shape $(*, C, H, W)$ into g groups and rearranges them as $(*, \frac{C}{g}, g, H, W)$, while keeping the original tensor shape.

## DataParallel Layers (multi-GPU, distributed)

- `nn.DataParallel`: Implements data parallelism at the module level.
- `nn.parallel.DistributedDataParallel`: Implements distributed data parallelism based on the torch.distributed package at the module level.

## Utilities

From the torch.nn.utils module

- `clip_grad_norm_`: Clips gradient norm of an iterable of parameters.
- `clip_grad_value_`: Clips gradients of an iterable of parameters at a specified value.
- `parameters_to_vector`: Converts parameters to one vector.
- `vector_to_parameters`: Converts one vector to the parameters.
- `prune.BasePruningMethod`: Abstract base class for creation of new pruning techniques.
- `prune.PruningContainer`: Container holding a sequence of pruning methods for iterative pruning.
- `prune.Identity`: Utility pruning method that does not prune any units but generates the pruning parametrization with a mask of ones.
- `prune.RandomUnstructured`: Prune (currently unpruned) units in a tensor at random.
- `prune.L1Unstructured`: Prune (currently unpruned) units in a tensor by zeroing out the ones with the lowest L1-norm.
- `prune.RandomStructured`: Prune entire (currently unpruned) channels in a tensor at random.
- `prune.LnStructured`: Prune entire (currently unpruned) channels in a tensor based on their Ln-norm.
- `prune.CustomFromMask`: Prune a tensor by applying a user-provided pre-computed mask.
- `prune.identity`: Applies pruning reparametrization to the tensor corresponding to the parameter called name in module without actually pruning any units.
- `prune.random_unstructured`: Prunes tensor corresponding to parameter called name in module by removing the specified amount of (currently unpruned) units selected at random.
- `prune.l1_unstructured`: Prunes tensor corresponding to parameter called name in module by removing the specified amount of (currently unpruned) units with the lowest L1-norm.
- `prune.random_structured`: Prunes tensor corresponding to parameter called name in module by removing the specified amount of (currently unpruned) channels along the specified dim selected at random.
- `prune.ln_structured`: Prunes tensor corresponding to parameter called name in module by removing the specified amount of (currently unpruned) channels along the specified dim with the lowest Ln-norm.
- `prune.global_unstructured`: Globally prunes tensors corresponding to all parameters in parameters by applying the specified pruning_method.
- `prune.custom_from_mask`: Prunes tensor corresponding to parameter called name in module by applying the pre-computed mask in mask.
- `prune.remove`: Removes the pruning reparameterization from a module and the pruning method from the forward hook.
- `prune.is_pruned`: Checks whether module is pruned by looking for forward_pre_hooks in its modules that inherit from the BasePruningMethod.
- `weight_norm`: Applies weight normalization to a parameter in the given module.
- `remove_weight_norm`: Removes the weight normalization reparameterization from a module.
- `spectral_norm`: Applies spectral normalization to a parameter in the given module.
- `remove_spectral_norm`: Removes the spectral normalization reparameterization from a module.
- `skip_init`: Given a module class object and args / kwargs, instantiates the module without initializing parameters / buffers.
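
A minimal sketch combining gradient clipping and L1 unstructured pruning on a throwaway Linear module:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Linear(10, 10)
model(torch.randn(2, 10)).sum().backward()   # populate .grad for the demo

# Gradient clipping: typically called between backward() and optimizer.step().
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# L1 unstructured pruning zeroes the 30% smallest-magnitude weights; the
# module gains a weight_orig parameter and a weight_mask buffer.
prune.l1_unstructured(model, name="weight", amount=0.3)
print(prune.is_pruned(model))    # True
prune.remove(model, "weight")    # bake the mask in and drop the reparametrization
```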

Parametrizations implemented using the new parametrization functionality in `torch.nn.utils.parametrize.register_parametrization()`.

- `parametrizations.orthogonal`: Applies an orthogonal or unitary parametrization to a matrix or a batch of matrices.
- `parametrizations.spectral_norm`: Applies spectral normalization to a parameter in the given module.

Utility functions to parametrize Tensors on existing Modules. Note that these functions can be used to parametrize a given Parameter or Buffer given a specific function that maps from an input space to the parametrized space. They are not parametrizations that would transform an object into a parameter. See the Parametrizations tutorial for more information on how to implement your own parametrizations.

- `parametrize.register_parametrization`: Adds a parametrization to a tensor in a module.
- `parametrize.remove_parametrizations`: Removes the parametrizations on a tensor in a module.
- `parametrize.cached`: Context manager that enables the caching system within parametrizations registered with register_parametrization().
- `parametrize.is_parametrized`: Returns True if module has an active parametrization.
- `parametrize.ParametrizationList`: A sequential container that holds and manages the original or original0, original1, ... parameters or buffers of a parametrized module.
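
A minimal sketch, closely following the Parametrizations tutorial, that constrains a Linear layer's weight to be symmetric; the `Symmetric` module is made up for the example:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

# A parametrization is itself a Module whose forward maps the stored
# tensor to the value the layer actually uses; here, a symmetric weight.
class Symmetric(nn.Module):
    def forward(self, X):
        return X.triu() + X.triu(1).transpose(-1, -2)

layer = nn.Linear(4, 4)
parametrize.register_parametrization(layer, "weight", Symmetric())
assert torch.allclose(layer.weight, layer.weight.T)   # always symmetric now

with parametrize.cached():   # reuse the computed weight within this block
    out = layer(torch.randn(2, 4))
```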

Utility functions to call a given Module in a stateless manner.

- `stateless.functional_call`: Performs a functional call on the module by replacing the module parameters and buffers with the provided ones.
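
A minimal sketch that runs a Linear module with substitute parameters; the zero/one values are chosen so the expected output is obvious:

```python
import torch
import torch.nn as nn
from torch.nn.utils import stateless

model = nn.Linear(3, 2)
x = torch.randn(1, 3)

# Run the module with substitute parameters without mutating the module.
new_params = {
    "weight": torch.zeros(2, 3),
    "bias": torch.ones(2),
}
out = stateless.functional_call(model, new_params, (x,))
print(out)   # tensor([[1., 1.]]): zero weight plus unit bias
```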

Utility functions in other modules

- `nn.utils.rnn.PackedSequence`: Holds the data and list of batch_sizes of a packed sequence.
- `nn.utils.rnn.pack_padded_sequence`: Packs a Tensor containing padded sequences of variable length.
- `nn.utils.rnn.pad_packed_sequence`: Pads a packed batch of variable length sequences.
- `nn.utils.rnn.pad_sequence`: Pads a list of variable length Tensors with padding_value.
- `nn.utils.rnn.pack_sequence`: Packs a list of variable length Tensors.
- `nn.Flatten`: Flattens a contiguous range of dims into a tensor.
- `nn.Unflatten`: Unflattens a tensor dim, expanding it to a desired shape.
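
A sketch of the pack/pad round trip with made-up lengths:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Two sequences of lengths 4 and 2, zero-padded to the same length.
padded = torch.randn(2, 4, 8)        # (batch, max_len, features)
lengths = torch.tensor([4, 2])

packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=True)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
packed_out, _ = lstm(packed)         # padding steps are skipped entirely
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape, out_lengths)        # torch.Size([2, 4, 16]) tensor([4, 2])
```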

## Quantized Functions

Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than floating point precision. PyTorch supports both per-tensor and per-channel asymmetric linear quantization. To learn more about how to use quantized functions in PyTorch, please refer to the Quantization documentation.

## Lazy Modules Initialization

- `nn.modules.lazy.LazyModuleMixin`: A mixin for modules that lazily initialize parameters, also known as "lazy modules."
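
A sketch of deferring shape inference with lazy modules; note that the optimizer should be constructed after this first dry run, once all parameters are materialized:

```python
import torch
import torch.nn as nn

# Lazy modules defer shape bookkeeping: in_channels / in_features
# are materialized on the first forward pass.
net = nn.Sequential(
    nn.LazyConv2d(out_channels=8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(out_features=10),
)
net(torch.randn(1, 3, 16, 16))   # dry run materializes all parameters
print(net[0].weight.shape)       # torch.Size([8, 3, 3, 3])
```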