FusionLayer¶
- class torchtune.modules.model_fusion.FusionLayer(layer: Module, fusion_layer: Module, fusion_first: bool = True)[source]¶
Fusion layer as introduced in Flamingo: a Visual Language Model for Few-Shot Learning.
Deep Fusion model architectures combine pretrained encoder models with pretrained language models by infusing the encoder outputs into the middle layers of the LLM. This allows the language model to interpret the enocder outputs as text and “understand” any modality for which you can train an encoder. To enable the language model to adapt to the encoder outputs, the FusionLayer fuses a new learnable layer to an existing decoder (language model) layer. This additional layer can take the encoder embeddings and learn to combine them with the token embeddings from the decoder. The module supports fusing the new layer before or after the original, in Flamingo the new layer is fused before the original.
The original layer is wrapped in FusionLayer such that it maintains its original state_dict key and the pre-trained checkpoint isn’t broken. The new layer parameters are available through
fusion_params
to separately control if they’re trainable or not.Example
>>> # Original decoder style transformer >>> layer = nn.TransformerSelfAttentionLayer(...) >>> model = TransformerDecoder(layers=layer, num_layers=32, ...) >>> >>> # Fuse a cross attention layer to each self attention layer to adapt for the encoder >>> fusion_layer = nn.TransformerCrossAttentionLayer(...) >>> fused_layer = FusionLayer(layer, fusion_layer) >>> model = TransformerDecoder(layers=fused_layer, num_layers=32, ...) >>> >>> # Original decoder state_dict still works >>> model.load_state_dict(..., strict=False)
- Parameters:
layer (nn.Module) – original decoder layer
fusion_layer (nn.Module) – new fusion layer
fusion_first (bool) – boolean to insert fusion layer before or after the decoder layer.
- caches_are_enabled() bool [source]¶
Checks if the key value caches on
self.layer
are enabled. See :func:~torchtune.modules.TransformerDecoder.caches_are_enabled`.
- caches_are_setup() bool [source]¶
Check if the key value caches are setup on
self.layer
. See :func:~torchtune.modules.TransformerDecoder.caches_are_setup`.
- forward(x: Tensor, **kwargs: Dict) Tensor [source]¶
- Parameters:
x (torch.Tensor) – input tensor with shape [batch_size x seq_length x embed_dim]
**kwargs (Dict) – all additional layer args
- Returns:
- output tensor with same shape as input
[batch_size x seq_length x embed_dim]`
- Return type:
Tensor