class torch.nn.TransformerEncoder(encoder_layer, num_layers, norm=None, enable_nested_tensor=True, mask_check=True)

TransformerEncoder is a stack of N encoder layers.

Users can build the BERT model with corresponding parameters.

  • encoder_layer (TransformerEncoderLayer) – an instance of the TransformerEncoderLayer() class (required).

  • num_layers (int) – the number of sub-encoder-layers in the encoder (required).

  • norm (Optional[Module]) – the layer normalization component (optional).

  • enable_nested_tensor (bool) – if True, the input is automatically converted to a nested tensor (and converted back on output). This improves the overall performance of TransformerEncoder when the padding rate is high. Default: True (enabled).
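The constructor arguments above can be sketched as follows. This is a minimal illustration, not a recommended configuration: the sizes (d_model=64, nhead=4, dim_feedforward=128, three layers) are arbitrary values chosen to keep the example small, and nn.LayerNorm is just one plausible choice for norm.

```python
import torch
from torch import nn

d_model = 64  # arbitrary illustrative size

# A single layer acts as the template for the whole stack.
layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=128, batch_first=True
)

# norm is applied once, to the output of the last layer.
encoder = nn.TransformerEncoder(layer, num_layers=3, norm=nn.LayerNorm(d_model))

# The template is deep-copied num_layers times, so each layer owns its
# own parameters (the copies start from the same initial values).
num_stacked = len(encoder.layers)
w0 = encoder.layers[0].linear1.weight
w1 = encoder.layers[1].linear1.weight
shares_storage = w0 is w1
same_initial_values = torch.equal(w0, w1)
```

Because the copies start from identical values, code that needs differently initialized layers has to re-initialize the parameters after construction.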

>>> import torch
>>> from torch import nn
>>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
>>> transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
>>> src = torch.rand(10, 32, 512)  # (seq_len, batch, d_model); batch_first defaults to False
>>> out = transformer_encoder(src)

forward(src, mask=None, src_key_padding_mask=None, is_causal=None)

Pass the input through the encoder layers in turn.

  • src (Tensor) – the sequence to the encoder (required).

  • mask (Optional[Tensor]) – the mask for the src sequence (optional).

  • src_key_padding_mask (Optional[Tensor]) – the mask for the src keys per batch (optional).

  • is_causal (Optional[bool]) – If specified, applies a causal mask as mask. Default: None, in which case the encoder tries to detect whether mask is causal. Warning: is_causal is a hint that mask is the causal mask. Providing an incorrect hint can result in incorrect execution and may break forward and backward compatibility.
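The forward arguments above can be combined as in the sketch below. It assumes boolean masks for both mask and src_key_padding_mask, where True marks a position that must not be attended to. The sizes are arbitrary, the padding pattern is made up for illustration, and is_causal is left at its default so the encoder does its own detection.

```python
import torch
from torch import nn

torch.manual_seed(0)
seq_len, batch, d_model = 5, 2, 32  # arbitrary illustrative sizes

layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=64, dropout=0.0
)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=False)
encoder.eval()

# batch_first is False here, so src is (seq_len, batch, d_model).
src = torch.rand(seq_len, batch, d_model)

# Causal mask of shape (seq_len, seq_len): True above the diagonal
# blocks attention to later positions.
causal_mask = torch.triu(
    torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
)

# Padding mask of shape (batch, seq_len): True marks padded keys.
# Here the second sequence has its last two positions padded.
padding_mask = torch.tensor(
    [[False, False, False, False, False],
     [False, False, False, True, True]]
)

with torch.no_grad():
    out = encoder(src, mask=causal_mask, src_key_padding_mask=padding_mask)
```

Using the same dtype for both masks avoids mixing boolean and float mask semantics in one call.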

Return type

Tensor

Shape

see the docs in Transformer.
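As a quick check of the shape convention, the sketch below passes a batch-first input through a small encoder and compares shapes. The sizes are arbitrary, and batch_first=True is a choice made for this example (the default is False, which gives a (seq_len, batch, d_model) layout).

```python
import torch
from torch import nn

layer = nn.TransformerEncoderLayer(
    d_model=16, nhead=2, dim_feedforward=32, batch_first=True
)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# With batch_first=True the layout is (batch, seq_len, d_model).
src = torch.rand(4, 7, 16)
out = encoder(src)

# The encoder keeps the input layout and feature size.
same_shape = out.shape == src.shape
```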

