torch.nn

These are the basic building blocks for graphs:

torch.nn

Parameter

A kind of Tensor that is to be considered a module parameter.

UninitializedParameter

A parameter that is not initialized.

UninitializedBuffer

A buffer that is not initialized.

Containers

`Module`	Base class for all neural network modules.
`Sequential`	A sequential container.
`ModuleList`	Holds submodules in a list.
`ModuleDict`	Holds submodules in a dictionary.
`ParameterList`	Holds parameters in a list.
`ParameterDict`	Holds parameters in a dictionary.

Global Hooks For Module

register_module_forward_pre_hook

Registers a forward pre-hook common to all modules.

register_module_forward_hook

Registers a global forward hook for all the modules

register_module_backward_hook

Registers a backward hook common to all the modules.

Convolution Layers

`nn.Conv1d`	Applies a 1D convolution over an input signal composed of several input planes.
`nn.Conv2d`	Applies a 2D convolution over an input signal composed of several input planes.
`nn.Conv3d`	Applies a 3D convolution over an input signal composed of several input planes.
`nn.ConvTranspose1d`	Applies a 1D transposed convolution operator over an input image composed of several input planes.
`nn.ConvTranspose2d`	Applies a 2D transposed convolution operator over an input image composed of several input planes.
`nn.ConvTranspose3d`	Applies a 3D transposed convolution operator over an input image composed of several input planes.
`nn.LazyConv1d`	A `torch.nn.Conv1d` module with lazy initialization of the `in_channels` argument of the `Conv1d` that is inferred from the `input.size(1)`.
`nn.LazyConv2d`	A `torch.nn.Conv2d` module with lazy initialization of the `in_channels` argument of the `Conv2d` that is inferred from the `input.size(1)`.
`nn.LazyConv3d`	A `torch.nn.Conv3d` module with lazy initialization of the `in_channels` argument of the `Conv3d` that is inferred from the `input.size(1)`.
`nn.LazyConvTranspose1d`	A `torch.nn.ConvTranspose1d` module with lazy initialization of the `in_channels` argument of the `ConvTranspose1d` that is inferred from the `input.size(1)`.
`nn.LazyConvTranspose2d`	A `torch.nn.ConvTranspose2d` module with lazy initialization of the `in_channels` argument of the `ConvTranspose2d` that is inferred from the `input.size(1)`.
`nn.LazyConvTranspose3d`	A `torch.nn.ConvTranspose3d` module with lazy initialization of the `in_channels` argument of the `ConvTranspose3d` that is inferred from the `input.size(1)`.
`nn.Unfold`	Extracts sliding local blocks from a batched input tensor.
`nn.Fold`	Combines an array of sliding local blocks into a large containing tensor.

Pooling layers

`nn.MaxPool1d`	Applies a 1D max pooling over an input signal composed of several input planes.
`nn.MaxPool2d`	Applies a 2D max pooling over an input signal composed of several input planes.
`nn.MaxPool3d`	Applies a 3D max pooling over an input signal composed of several input planes.
`nn.MaxUnpool1d`	Computes a partial inverse of `MaxPool1d`.
`nn.MaxUnpool2d`	Computes a partial inverse of `MaxPool2d`.
`nn.MaxUnpool3d`	Computes a partial inverse of `MaxPool3d`.
`nn.AvgPool1d`	Applies a 1D average pooling over an input signal composed of several input planes.
`nn.AvgPool2d`	Applies a 2D average pooling over an input signal composed of several input planes.
`nn.AvgPool3d`	Applies a 3D average pooling over an input signal composed of several input planes.
`nn.FractionalMaxPool2d`	Applies a 2D fractional max pooling over an input signal composed of several input planes.
`nn.FractionalMaxPool3d`	Applies a 3D fractional max pooling over an input signal composed of several input planes.
`nn.LPPool1d`	Applies a 1D power-average pooling over an input signal composed of several input planes.
`nn.LPPool2d`	Applies a 2D power-average pooling over an input signal composed of several input planes.
`nn.AdaptiveMaxPool1d`	Applies a 1D adaptive max pooling over an input signal composed of several input planes.
`nn.AdaptiveMaxPool2d`	Applies a 2D adaptive max pooling over an input signal composed of several input planes.
`nn.AdaptiveMaxPool3d`	Applies a 3D adaptive max pooling over an input signal composed of several input planes.
`nn.AdaptiveAvgPool1d`	Applies a 1D adaptive average pooling over an input signal composed of several input planes.
`nn.AdaptiveAvgPool2d`	Applies a 2D adaptive average pooling over an input signal composed of several input planes.
`nn.AdaptiveAvgPool3d`	Applies a 3D adaptive average pooling over an input signal composed of several input planes.

Padding Layers

`nn.ReflectionPad1d`	Pads the input tensor using the reflection of the input boundary.
`nn.ReflectionPad2d`	Pads the input tensor using the reflection of the input boundary.
`nn.ReplicationPad1d`	Pads the input tensor using replication of the input boundary.
`nn.ReplicationPad2d`	Pads the input tensor using replication of the input boundary.
`nn.ReplicationPad3d`	Pads the input tensor using replication of the input boundary.
`nn.ZeroPad2d`	Pads the input tensor boundaries with zero.
`nn.ConstantPad1d`	Pads the input tensor boundaries with a constant value.
`nn.ConstantPad2d`	Pads the input tensor boundaries with a constant value.
`nn.ConstantPad3d`	Pads the input tensor boundaries with a constant value.

Non-linear Activations (weighted sum, nonlinearity)

`nn.ELU`	Applies the element-wise function:
`nn.Hardshrink`	Applies the hard shrinkage function element-wise:
`nn.Hardsigmoid`	Applies the element-wise function:
`nn.Hardtanh`	Applies the HardTanh function element-wise
`nn.Hardswish`	Applies the hardswish function, element-wise, as described in the paper:
`nn.LeakyReLU`	Applies the element-wise function:
`nn.LogSigmoid`	Applies the element-wise function:
`nn.MultiheadAttention`	Allows the model to jointly attend to information from different representation subspaces.
`nn.PReLU`	Applies the element-wise function:
`nn.ReLU`	Applies the rectified linear unit function element-wise:
`nn.ReLU6`	Applies the element-wise function:
`nn.RReLU`	Applies the randomized leaky rectified liner unit function, element-wise, as described in the paper:
`nn.SELU`	Applied element-wise, as:
`nn.CELU`	Applies the element-wise function:
`nn.GELU`	Applies the Gaussian Error Linear Units function:
`nn.Sigmoid`	Applies the element-wise function:
`nn.SiLU`	Applies the Sigmoid Linear Unit (SiLU) function, element-wise.
`nn.Mish`	Applies the Mish function, element-wise.
`nn.Softplus`	Applies the element-wise function:
`nn.Softshrink`	Applies the soft shrinkage function elementwise:
`nn.Softsign`	Applies the element-wise function:
`nn.Tanh`	Applies the element-wise function:
`nn.Tanhshrink`	Applies the element-wise function:
`nn.Threshold`	Thresholds each element of the input Tensor.

Non-linear Activations (other)

`nn.Softmin`	Applies the Softmin function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1.
`nn.Softmax`	Applies the Softmax function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1.
`nn.Softmax2d`	Applies SoftMax over features to each spatial location.
`nn.LogSoftmax`	Applies the $\log(\text{Softmax}(x))$ function to an n-dimensional input Tensor.
`nn.AdaptiveLogSoftmaxWithLoss`	Efficient softmax approximation as described in Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou.

Normalization Layers

`nn.BatchNorm1d`	Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift .
`nn.BatchNorm2d`	Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift .
`nn.BatchNorm3d`	Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift .
`nn.LazyBatchNorm1d`	A `torch.nn.BatchNorm1d` module with lazy initialization of the `num_features` argument of the `BatchNorm1d` that is inferred from the `input.size(1)`.
`nn.LazyBatchNorm2d`	A `torch.nn.BatchNorm2d` module with lazy initialization of the `num_features` argument of the `BatchNorm2d` that is inferred from the `input.size(1)`.
`nn.LazyBatchNorm3d`	A `torch.nn.BatchNorm3d` module with lazy initialization of the `num_features` argument of the `BatchNorm3d` that is inferred from the `input.size(1)`.
`nn.GroupNorm`	Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization
`nn.SyncBatchNorm`	Applies Batch Normalization over a N-Dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift .
`nn.InstanceNorm1d`	Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
`nn.InstanceNorm2d`	Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
`nn.InstanceNorm3d`	Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
`nn.LayerNorm`	Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization
`nn.LocalResponseNorm`	Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.

Recurrent Layers

`nn.RNNBase`
`nn.RNN`	Applies a multi-layer Elman RNN with $\tanh$ or $\text{ReLU}$ non-linearity to an input sequence.
`nn.LSTM`	Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
`nn.GRU`	Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
`nn.RNNCell`	An Elman RNN cell with tanh or ReLU non-linearity.
`nn.LSTMCell`	A long short-term memory (LSTM) cell.
`nn.GRUCell`	A gated recurrent unit (GRU) cell

Transformer Layers

`nn.Transformer`	A transformer model.
`nn.TransformerEncoder`	TransformerEncoder is a stack of N encoder layers
`nn.TransformerDecoder`	TransformerDecoder is a stack of N decoder layers
`nn.TransformerEncoderLayer`	TransformerEncoderLayer is made up of self-attn and feedforward network.
`nn.TransformerDecoderLayer`	TransformerDecoderLayer is made up of self-attn, multi-head-attn and feedforward network.

Linear Layers

`nn.Identity`	A placeholder identity operator that is argument-insensitive.
`nn.Linear`	Applies a linear transformation to the incoming data: $y = xA^T + b$
`nn.Bilinear`	Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$
`nn.LazyLinear`	A `torch.nn.Linear` module where in_features is inferred.

Dropout Layers

`nn.Dropout`	During training, randomly zeroes some of the elements of the input tensor with probability `p` using samples from a Bernoulli distribution.
`nn.Dropout2d`	Randomly zero out entire channels (a channel is a 2D feature map, e.g., the $j$ -th channel of the $i$ -th sample in the batched input is a 2D tensor $\text{input}[i, j]$ ).
`nn.Dropout3d`	Randomly zero out entire channels (a channel is a 3D feature map, e.g., the $j$ -th channel of the $i$ -th sample in the batched input is a 3D tensor $\text{input}[i, j]$ ).
`nn.AlphaDropout`	Applies Alpha Dropout over the input.

Sparse Layers

`nn.Embedding`	A simple lookup table that stores embeddings of a fixed dictionary and size.
`nn.EmbeddingBag`	Computes sums or means of ‘bags’ of embeddings, without instantiating the intermediate embeddings.

Distance Functions

`nn.CosineSimilarity`	Returns cosine similarity between $x_1$ and $x_2$ , computed along dim.
`nn.PairwiseDistance`	Computes the batchwise pairwise distance between vectors $v_1$ , $v_2$ using the p-norm:

Loss Functions

`nn.L1Loss`	Creates a criterion that measures the mean absolute error (MAE) between each element in the input $x$ and target $y$ .
`nn.MSELoss`	Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $x$ and target $y$ .
`nn.CrossEntropyLoss`	This criterion combines `LogSoftmax` and `NLLLoss` in one single class.
`nn.CTCLoss`	The Connectionist Temporal Classification loss.
`nn.NLLLoss`	The negative log likelihood loss.
`nn.PoissonNLLLoss`	Negative log likelihood loss with Poisson distribution of target.
`nn.GaussianNLLLoss`	Gaussian negative log likelihood loss.
`nn.KLDivLoss`	The Kullback-Leibler divergence loss measure
`nn.BCELoss`	Creates a criterion that measures the Binary Cross Entropy between the target and the output:
`nn.BCEWithLogitsLoss`	This loss combines a Sigmoid layer and the BCELoss in one single class.
`nn.MarginRankingLoss`	Creates a criterion that measures the loss given inputs $x1$ , $x2$ , two 1D mini-batch Tensors, and a label 1D mini-batch tensor $y$ (containing 1 or -1).
`nn.HingeEmbeddingLoss`	Measures the loss given an input tensor $x$ and a labels tensor $y$ (containing 1 or -1).
`nn.MultiLabelMarginLoss`	Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (which is a 2D Tensor of target class indices).
`nn.HuberLoss`	Creates a criterion that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise.
`nn.SmoothL1Loss`	Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.
`nn.SoftMarginLoss`	Creates a criterion that optimizes a two-class classification logistic loss between input tensor $x$ and target tensor $y$ (containing 1 or -1).
`nn.MultiLabelSoftMarginLoss`	Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input $x$ and target $y$ of size $(N, C)$ .
`nn.CosineEmbeddingLoss`	Creates a criterion that measures the loss given input tensors $x_1$ , $x_2$ and a Tensor label $y$ with values 1 or -1.
`nn.MultiMarginLoss`	Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (which is a 1D tensor of target class indices, $0 \leq y \leq \text{x.size}(1)-1$ ):
`nn.TripletMarginLoss`	Creates a criterion that measures the triplet loss given an input tensors $x1$ , $x2$ , $x3$ and a margin with a value greater than $0$ .
`nn.TripletMarginWithDistanceLoss`	Creates a criterion that measures the triplet loss given input tensors $a$ , $p$ , and $n$ (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function (“distance function”) used to compute the relationship between the anchor and positive example (“positive distance”) and the anchor and negative example (“negative distance”).

Vision Layers

`nn.PixelShuffle`	Rearranges elements in a tensor of shape $(, C \times r^2, H, W)$ to a tensor of shape $(, C, H \times r, W \times r)$ , where r is an upscale factor.
`nn.PixelUnshuffle`	Reverses the `PixelShuffle` operation by rearranging elements in a tensor of shape $(, C, H \times r, W \times r)$ to a tensor of shape $(, C \times r^2, H, W)$ , where r is a downscale factor.
`nn.Upsample`	Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.
`nn.UpsamplingNearest2d`	Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels.
`nn.UpsamplingBilinear2d`	Applies a 2D bilinear upsampling to an input signal composed of several input channels.

Shuffle Layers

nn.ChannelShuffle

Divide the channels in a tensor of shape $(*, C , H, W)$ into g groups and rearrange them as $(*, C \frac g, g, H, W)$ , while keeping the original tensor shape.

DataParallel Layers (multi-GPU, distributed)

`nn.DataParallel`	Implements data parallelism at the module level.
`nn.parallel.DistributedDataParallel`	Implements distributed data parallelism that is based on `torch.distributed` package at the module level.

Utilities

From the torch.nn.utils module

`clip_grad_norm_`	Clips gradient norm of an iterable of parameters.
`clip_grad_value_`	Clips gradient of an iterable of parameters at specified value.
`parameters_to_vector`	Convert parameters to one vector
`vector_to_parameters`	Convert one vector to the parameters
`prune.BasePruningMethod`	Abstract base class for creation of new pruning techniques.

`prune.PruningContainer`	Container holding a sequence of pruning methods for iterative pruning.
`prune.Identity`	Utility pruning method that does not prune any units but generates the pruning parametrization with a mask of ones.
`prune.RandomUnstructured`	Prune (currently unpruned) units in a tensor at random.
`prune.L1Unstructured`	Prune (currently unpruned) units in a tensor by zeroing out the ones with the lowest L1-norm.
`prune.RandomStructured`	Prune entire (currently unpruned) channels in a tensor at random.
`prune.LnStructured`	Prune entire (currently unpruned) channels in a tensor based on their Ln-norm.
`prune.CustomFromMask`
`prune.identity`	Applies pruning reparametrization to the tensor corresponding to the parameter called `name` in `module` without actually pruning any units.
`prune.random_unstructured`	Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) units selected at random.
`prune.l1_unstructured`	Prunes tensor corresponding to parameter called `name` in `module` by removing the specified amount of (currently unpruned) units with the lowest L1-norm.
`prune.random_structured`	Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` selected at random.
`prune.ln_structured`	Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` with the lowest L``n``-norm.
`prune.global_unstructured`	Globally prunes tensors corresponding to all parameters in `parameters` by applying the specified `pruning_method`.
`prune.custom_from_mask`	Prunes tensor corresponding to parameter called `name` in `module` by applying the pre-computed mask in `mask`.
`prune.remove`	Removes the pruning reparameterization from a module and the pruning method from the forward hook.
`prune.is_pruned`	Check whether `module` is pruned by looking for `forward_pre_hooks` in its modules that inherit from the `BasePruningMethod`.
`weight_norm`	Applies weight normalization to a parameter in the given module.
`remove_weight_norm`	Removes the weight normalization reparameterization from a module.
`spectral_norm`	Applies spectral normalization to a parameter in the given module.
`remove_spectral_norm`	Removes the spectral normalization reparameterization from a module.

Parametrizations implemented using the new parametrization functionality in torch.nn.utils.parameterize.register_parametrization().

parametrizations.spectral_norm

Applies spectral normalization to a parameter in the given module.

Utility functions to parametrize Tensors on existing Modules. Note that these functions can be used to parametrize a given Parameter or Buffer given a specific function that maps from an input space to the parametrized space. They are not parameterizations that would transform an object into a parameter. See the Parametrizations tutorial for more information on how to implement your own parametrizations.

`parametrize.register_parametrization`	Adds a parametrization to a tensor in a module.
`parametrize.remove_parametrizations`	Removes the parametrizations on a tensor in a module.
`parametrize.cached`	Context manager that enables the caching system within parametrizations registered with `register_parametrization()`.
`parametrize.is_parametrized`	Returns `True` if module has an active parametrization.

parametrize.ParametrizationList

A sequential container that holds and manages the original parameter or buffer of a parametrized torch.nn.Module.

Utility functions in other modules

`nn.utils.rnn.PackedSequence`	Holds the data and list of `batch_sizes` of a packed sequence.
`nn.utils.rnn.pack_padded_sequence`	Packs a Tensor containing padded sequences of variable length.
`nn.utils.rnn.pad_packed_sequence`	Pads a packed batch of variable length sequences.
`nn.utils.rnn.pad_sequence`	Pad a list of variable length Tensors with `padding_value`
`nn.utils.rnn.pack_sequence`	Packs a list of variable length Tensors

`nn.Flatten`	Flattens a contiguous range of dims into a tensor.
`nn.Unflatten`	Unflattens a tensor dim expanding it to a desired shape.

Quantized Functions

Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than floating point precision. PyTorch supports both per tensor and per channel asymmetric linear quantization. To learn more how to use quantized functions in PyTorch, please refer to the Quantization documentation.

Lazy Modules Initialization

nn.modules.lazy.LazyModuleMixin

A mixin for modules that lazily initialize parameters, also known as “lazy modules.”

torch.nn

Containers

Convolution Layers

Pooling layers

Padding Layers

Non-linear Activations (weighted sum, nonlinearity)

Non-linear Activations (other)

Normalization Layers

Recurrent Layers

Transformer Layers

Linear Layers

Dropout Layers

Sparse Layers

Distance Functions

Loss Functions

Vision Layers

Shuffle Layers

DataParallel Layers (multi-GPU, distributed)

Utilities

Quantized Functions

Lazy Modules Initialization

Docs

Tutorials

Resources