Class TransformerDecoderLayerImpl

Inheritance Relationships

Base Type

public torch::nn::Cloneable<TransformerDecoderLayerImpl>

Class Documentation

class torch::nn::TransformerDecoderLayerImpl : public torch::nn::Cloneable<TransformerDecoderLayerImpl>

A TransformerDecoderLayer consists of a self-attention block, a multi-head attention block over the encoder output, and a feedforward network.

This standard decoder layer is based on the paper “Attention Is All You Need”: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. In Advances in Neural Information Processing Systems, pages 6000-6010. Users may modify the layer or implement it differently for their application. See the documentation for the Python torch.nn.TransformerDecoderLayer module to learn about the exact behavior of this module.

See the documentation for the torch::nn::TransformerDecoderLayerOptions class to learn which constructor arguments this module supports.

Example:

TransformerDecoderLayer model(TransformerDecoderLayerOptions(512, 8).dropout(0.2));

Public Functions

TransformerDecoderLayerImpl(int64_t d_model, int64_t nhead)
TransformerDecoderLayerImpl(const TransformerDecoderLayerOptions &options_)
void reset() override

reset() must perform initialization of all members with reference semantics, most importantly parameters, buffers and submodules.

void reset_parameters()
Tensor forward(Tensor tgt, const Tensor &memory, const Tensor &tgt_mask = {}, const Tensor &memory_mask = {}, const Tensor &tgt_key_padding_mask = {}, const Tensor &memory_key_padding_mask = {})

Pass the inputs (and optional masks) through the decoder layer.

Args:
    tgt: the sequence to the decoder layer (required).
    memory: the sequence from the last layer of the encoder (required).
    tgt_mask: the mask for the tgt sequence (optional).
    memory_mask: the mask for the memory sequence (optional).
    tgt_key_padding_mask: the mask for the tgt keys per batch (optional).
    memory_key_padding_mask: the mask for the memory keys per batch (optional).

Public Members

TransformerDecoderLayerOptions options

The options used to configure this module.

MultiheadAttention self_attn = {nullptr}

Self-attention.

Dropout dropout1 = {nullptr}

Dropout, post self attention.

LayerNorm norm1 = {nullptr}

Normalization, post self attention.

MultiheadAttention multihead_attn = {nullptr}

Multi-headed attention over the encoder output (memory).

Dropout dropout2 = {nullptr}

Dropout, post multi-headed attention.

LayerNorm norm2 = {nullptr}

Normalization, post multi-headed attention.

Linear linear1 = {nullptr}

Feed forward first linear layer.

Dropout dropout = {nullptr}

Feed forward dropout layer.

Linear linear2 = {nullptr}

Feed forward second linear layer.

Dropout dropout3 = {nullptr}

Dropout, post feed forward.

LayerNorm norm3 = {nullptr}

Normalization, post feed forward.

Protected Functions

bool _forward_has_default_args() override

The following three functions allow a module with default arguments in its forward method to be used in a Sequential module.

You should NEVER override these functions manually. Instead, you should use the FORWARD_HAS_DEFAULT_ARGS macro.

unsigned int _forward_num_required_args() override
std::vector<torch::nn::AnyValue> _forward_populate_default_args(std::vector<torch::nn::AnyValue> &&arguments) override
Tensor activation(const Tensor &input)

Apply activation based on configuration.


Friends

friend struct torch::nn::AnyModuleHolder

