torchtune.modules

Modeling Components and Building Blocks

MultiHeadAttention

Multi-headed attention layer with support for grouped query attention (GQA) introduced in https://arxiv.org/abs/2305.13245v1.
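
As a rough illustration of the GQA idea (not torchtune's MultiHeadAttention API), the sketch below shares a small number of key/value heads across a larger number of query heads; all shapes and names are illustrative.

```python
import torch
import torch.nn.functional as F

# Toy grouped query attention: 8 query heads share 2 key/value heads.
# Shapes follow the usual (batch, num_heads, seq_len, head_dim) convention.
bsz, seq_len, num_heads, num_kv_heads, head_dim = 2, 16, 8, 2, 64

q = torch.randn(bsz, num_heads, seq_len, head_dim)
k = torch.randn(bsz, num_kv_heads, seq_len, head_dim)
v = torch.randn(bsz, num_kv_heads, seq_len, head_dim)

# Each group of num_heads // num_kv_heads query heads reuses the same k/v head.
k = k.repeat_interleave(num_heads // num_kv_heads, dim=1)
v = v.repeat_interleave(num_heads // num_kv_heads, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```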

FeedForward

This class implements the feed-forward network derived from Llama2.
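
A self-contained sketch of the Llama2-style gated feed-forward that this class implements; the toy version below hard-codes the gate, up, and down projections for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFeedForward(nn.Module):
    """Llama-style gated feed-forward: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

ffn = GatedFeedForward(dim=512, hidden_dim=2048)
print(ffn(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```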

KVCache

Standalone nn.Module containing a KV-cache to cache past keys and values during inference.
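
A toy version of the idea (not the torchtune class): preallocate key/value buffers up to a maximum sequence length and write newly decoded positions into them in place.

```python
import torch

class ToyKVCache:
    """Preallocated key/value buffers; new positions are written in place."""

    def __init__(self, batch_size, max_seq_len, num_kv_heads, head_dim, dtype=torch.float32):
        shape = (batch_size, num_kv_heads, max_seq_len, head_dim)
        self.k_cache = torch.zeros(shape, dtype=dtype)
        self.v_cache = torch.zeros(shape, dtype=dtype)
        self.size = 0  # number of positions filled so far

    def update(self, k_val, v_val):
        seq_len = k_val.shape[2]
        self.k_cache[:, :, self.size : self.size + seq_len] = k_val
        self.v_cache[:, :, self.size : self.size + seq_len] = v_val
        self.size += seq_len
        # Return views over everything cached so far for attention.
        return self.k_cache[:, :, : self.size], self.v_cache[:, :, : self.size]

cache = ToyKVCache(batch_size=1, max_seq_len=128, num_kv_heads=2, head_dim=64)
k, v = cache.update(torch.randn(1, 2, 1, 64), torch.randn(1, 2, 1, 64))
print(k.shape)  # torch.Size([1, 2, 1, 64])
```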

RotaryPositionalEmbeddings

This class implements Rotary Positional Embeddings (RoPE) proposed in https://arxiv.org/abs/2104.09864.
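
A compact, illustrative implementation of the rotation RoPE applies; the sketch recomputes the angles on every call for brevity, and the input layout (batch, seq_len, num_heads, head_dim) is an assumption of the sketch.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary embeddings to x of shape (batch, seq_len, num_heads, head_dim)."""
    b, s, h, d = x.shape
    # One rotation frequency per pair of dimensions.
    theta = base ** (-torch.arange(0, d, 2).float() / d)        # (d/2,)
    angles = torch.arange(s).float()[:, None] * theta[None, :]  # (s, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]                         # split even/odd dims
    cos = cos[None, :, None, :]                                 # broadcast over batch/heads
    sin = sin[None, :, None, :]
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return out.flatten(-2)                                      # interleave back to head_dim

q = torch.randn(2, 16, 8, 64)
print(rope(q).shape)  # torch.Size([2, 16, 8, 64])
```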

RMSNorm

Root Mean Square Normalization in fp32.
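
The computation is small enough to state directly; the sketch below normalizes in fp32 and casts back to the input dtype, matching the description above.

```python
import torch

def rms_norm(x: torch.Tensor, scale: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """y = x / sqrt(mean(x^2) + eps) * scale, computed in fp32 for stability."""
    x_fp32 = x.float()
    normed = x_fp32 * torch.rsqrt(x_fp32.pow(2).mean(-1, keepdim=True) + eps)
    return (normed * scale.float()).to(x.dtype)

x = torch.randn(2, 16, 512, dtype=torch.bfloat16)
scale = torch.ones(512)
print(rms_norm(x, scale).dtype)  # torch.bfloat16
```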

Fp32LayerNorm

Wrapper around LayerNorm to support mixed-precision training.

TanhGate

Implements a basic learnable gate to scale layer outputs.
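
A minimal sketch of the gating pattern: a single learnable scalar passed through tanh scales a layer's output, so initializing it to zero effectively switches the new branch off at the start of training. Names are illustrative.

```python
import torch
import torch.nn as nn

class ToyTanhGate(nn.Module):
    """Scale a layer's output by a learnable tanh gate, starting near zero."""

    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.zeros(1))  # tanh(0) = 0: branch starts "off"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(self.scale)

gate = ToyTanhGate()
residual = torch.randn(2, 16, 512)
out = residual + gate(torch.randn(2, 16, 512))  # gated residual branch
```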

TiedLinear

A tied linear layer, without bias, that shares the same weight as another linear layer.
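
A conceptual sketch of weight tying: the output projection reuses the embedding table's weight via F.linear instead of owning parameters of its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTiedLinear(nn.Module):
    """Output projection that reuses the token embedding weight (no bias)."""

    def __init__(self, tied_module: nn.Embedding):
        super().__init__()
        self.tied_module = tied_module  # weight is shared, not copied

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.tied_module.weight)

emb = nn.Embedding(num_embeddings=32000, embedding_dim=512)
lm_head = ToyTiedLinear(emb)
print(lm_head(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 32000])
```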

TransformerSelfAttentionLayer

Transformer layer derived from the Llama2 model.

TransformerCrossAttentionLayer

Cross-attention Transformer layer following the same conventions as the TransformerSelfAttentionLayer.

TransformerDecoder

Transformer Decoder derived from the Llama2 architecture.

VisionTransformer

Implementation of the ViT architecture (https://arxiv.org/abs/2010.11929), with support for tile-cropped images, output of hidden layers, and an optional CLS projection.

LayerDropout

A module that applies layer dropout to the input tensor of an underlying module.

prepare_layer_dropout

Prepare a model's layers for layer dropout by wrapping each layer with a ModuleLayerDropoutWrapper.

Losses

loss.CEWithChunkedOutputLoss

Cross-entropy with chunked outputs that saves memory by only upcasting one chunk at a time.
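
A rough sketch of the memory-saving idea (not the torchtune loss, which operates on the model's chunked outputs): the logits are processed chunk by chunk so only one chunk is upcast to fp32 at a time.

```python
import torch
import torch.nn.functional as F

def chunked_cross_entropy(logits: torch.Tensor, labels: torch.Tensor, num_chunks: int = 8) -> torch.Tensor:
    """Cross-entropy summed chunk by chunk; only one chunk is upcast to fp32 at a time."""
    logit_chunks = logits.chunk(num_chunks, dim=0)
    label_chunks = labels.chunk(num_chunks, dim=0)
    total, count = 0.0, (labels != -100).sum()
    for lc, yc in zip(logit_chunks, label_chunks):
        total = total + F.cross_entropy(lc.float(), yc, ignore_index=-100, reduction="sum")
    return total / count

logits = torch.randn(2 * 16, 32000, dtype=torch.bfloat16)  # flattened (batch*seq, vocab)
labels = torch.randint(0, 32000, (2 * 16,))
print(chunked_cross_entropy(logits, labels))
```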

loss.ForwardKLLoss

The Kullback-Leibler divergence loss for valid indices.
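
One common formulation of the forward KL used for distillation, sketched below: token-level KL(teacher || student), averaged over positions whose label is not the ignore index. The exact reduction and masking in torchtune may differ.

```python
import torch
import torch.nn.functional as F

def forward_kl(student_logits, teacher_logits, labels, ignore_index: int = -100):
    """KL(teacher || student), averaged over tokens whose label is not ignore_index."""
    teacher_prob = F.softmax(teacher_logits.float(), dim=-1)
    teacher_logprob = F.log_softmax(teacher_logits.float(), dim=-1)
    student_logprob = F.log_softmax(student_logits.float(), dim=-1)
    per_token = (teacher_prob * (teacher_logprob - student_logprob)).sum(-1)  # (batch, seq)
    mask = (labels != ignore_index).float()
    return (per_token * mask).sum() / mask.sum()

s = torch.randn(2, 16, 32000)
t = torch.randn(2, 16, 32000)
y = torch.randint(0, 32000, (2, 16))
print(forward_kl(s, t, y))
```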

loss.ForwardKLWithChunkedOutputLoss

Forward KL with chunked outputs that saves memory by only upcasting one chunk at a time.

Base Tokenizers

Base tokenizers are tokenizer models that perform the direct encoding of text into token IDs and decoding of token IDs into text. These are typically byte-pair encodings that underlie the model-specific tokenizers.

tokenizers.SentencePieceBaseTokenizer

A lightweight wrapper around SentencePieceProcessor that additionally handles trimming leading whitespace.

tokenizers.TikTokenBaseTokenizer

A lightweight wrapper around tiktoken Encoding.

tokenizers.ModelTokenizer

Abstract tokenizer that implements model-specific special token logic in the tokenize_messages method.

tokenizers.BaseTokenizer

Abstract token encoding model that implements encode and decode methods.

Tokenizer Utilities

These are helper methods that can be used by any tokenizer.

tokenizers.tokenize_messages_no_special_tokens

Tokenize a list of messages one at a time, then concatenate them, returning a list of tokens and a list of masks.

tokenizers.parse_hf_tokenizer_json

Parse the tokenizer.json file from a Hugging Face model to extract the special token string-to-ID mapping.
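
A hedged sketch of what such parsing can look like, assuming the standard Hugging Face tokenizer.json layout with an added_tokens list of {content, id} records; the helper name below is illustrative, not the torchtune function.

```python
import json

def parse_special_tokens(tokenizer_json_path: str) -> dict[str, int]:
    """Read a Hugging Face tokenizer.json and map special token strings to their IDs."""
    with open(tokenizer_json_path) as f:
        data = json.load(f)
    return {tok["content"]: tok["id"] for tok in data.get("added_tokens", [])}

# special = parse_special_tokens("tokenizer.json")
# special["<|begin_of_text|>"]  # -> the ID assigned by that model's tokenizer
```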

PEFT Components

peft.LoRALinear

LoRA linear layer as introduced in LoRA: Low-Rank Adaptation of Large Language Models.
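
A conceptual LoRA layer (names illustrative, not the torchtune class): the frozen base weight is augmented with a low-rank update scaled by alpha / rank, with the up-projection initialized to zero so training starts from the base model's behavior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLoRALinear(nn.Module):
    """y = W x + (alpha / rank) * B(A(x)), with W frozen and A, B trainable."""

    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim), requires_grad=False)
        self.lora_a = nn.Linear(in_dim, rank, bias=False)   # A: down-projection
        self.lora_b = nn.Linear(rank, out_dim, bias=False)  # B: up-projection
        nn.init.zeros_(self.lora_b.weight)                  # delta starts at zero
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = F.linear(x, self.weight)
        return base + self.scaling * self.lora_b(self.lora_a(x))

layer = ToyLoRALinear(512, 512, rank=8, alpha=16.0)
print(layer(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```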

peft.DoRALinear

DoRA linear layer as introduced in DoRA: Weight-Decomposed Low-Rank Adaptation of Large Language Models.

peft.AdapterModule

Interface for an nn.Module containing adapter weights.

peft.get_adapter_params

Return the subset of parameters from a model that correspond to an adapter.

peft.set_trainable_params

Set trainable parameters for an nn.Module based on a state dict of adapter parameters.
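
The pattern that get_adapter_params and set_trainable_params enable looks roughly like the sketch below: collect the adapter parameter names, then freeze everything else. Function and attribute names here are illustrative, not the exact torchtune signatures.

```python
import torch.nn as nn

def set_trainable_by_name(model: nn.Module, adapter_keys: set[str]) -> None:
    """Freeze every parameter whose name is not in adapter_keys."""
    for name, param in model.named_parameters():
        param.requires_grad_(name in adapter_keys)

# Typical flow (name matching is illustrative):
# adapter_keys = {n for n, _ in model.named_parameters() if "lora_a" in n or "lora_b" in n}
# set_trainable_by_name(model, adapter_keys)
```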

peft.get_adapter_state_dict

Return the subset of the full state_dict from a model that corresponds to an adapter.

peft.validate_missing_and_unexpected_for_lora

A more memory-efficient way to validate that LoRA state dict loading was done properly.

peft.disable_adapter

Temporarily disable the adapters in a model.

Fusion Components

Components for building models that are a fusion of two or more pre-trained models.

model_fusion.DeepFusionModel

DeepFusion is a type of fused model architecture where a pretrained encoder is combined with a pretrained decoder (LLM) in the internal decoder layers.

model_fusion.FusionLayer

Fusion layer as introduced in Flamingo: a Visual Language Model for Few-Shot Learning.

model_fusion.FusionEmbedding

Fusion embedding supports training additional special tokens while keeping the original embedding frozen.
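
A toy version of the routing described above: token IDs below the original vocabulary size hit the frozen base table, while new special-token IDs are looked up in a small trainable table. Shapes and names are illustrative.

```python
import torch
import torch.nn as nn

class ToyFusionEmbedding(nn.Module):
    """Frozen base vocab embedding plus a small trainable table for new special tokens."""

    def __init__(self, vocab_size: int, num_fusion_tokens: int, dim: int):
        super().__init__()
        self.base = nn.Embedding(vocab_size, dim)
        self.fusion = nn.Embedding(num_fusion_tokens, dim)
        self.base.weight.requires_grad_(False)   # original embeddings stay frozen
        self.vocab_size = vocab_size

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        is_fusion = tokens >= self.vocab_size
        base_out = self.base(tokens.clamp(max=self.vocab_size - 1))
        fusion_out = self.fusion((tokens - self.vocab_size).clamp(min=0))
        return torch.where(is_fusion.unsqueeze(-1), fusion_out, base_out)

emb = ToyFusionEmbedding(vocab_size=32000, num_fusion_tokens=8, dim=512)
tokens = torch.tensor([[1, 5, 32000, 32001]])   # last two are new special tokens
print(emb(tokens).shape)  # torch.Size([1, 4, 512])
```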

model_fusion.register_fusion_module

Add the method fusion_params to an nn.Module that marks all of the module's parameters as fusion params.

model_fusion.get_fusion_params

Return the subset of parameters from a model that correspond to fused modules.

Module Utilities

These are utilities that are common to and can be used by all modules.

common_utils.reparametrize_as_dtype_state_dict_post_hook

A state_dict hook that replaces NF4 tensors with their restored higher-precision weight and optionally offloads the restored weight to CPU.

common_utils.local_kv_cache

This context manager temporarily enables KV-caching on a given model that does not already have KV-caches set up.

common_utils.disable_kv_cache

This context manager temporarily disables KV-caching on a given model, which must already have KV-caches set up.

common_utils.delete_kv_caches

Deletes KV caches from all attention layers in a model, and also ensures cache_enabled is set to False.
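
In spirit, local_kv_cache and disable_kv_cache follow the set-up / tear-down pattern sketched below as a plain context manager; the setup_caches and reset_caches hooks are assumptions for illustration, not the exact torchtune calls or signatures.

```python
import contextlib
import torch.nn as nn

@contextlib.contextmanager
def temporary_kv_cache(model: nn.Module, batch_size: int, max_seq_len: int):
    """Enable caches on entry, remove them on exit (hook names are hypothetical)."""
    model.setup_caches(batch_size=batch_size, max_seq_len=max_seq_len)  # hypothetical hook
    try:
        yield model
    finally:
        model.reset_caches()  # hypothetical hook

# with temporary_kv_cache(model, batch_size=1, max_seq_len=2048):
#     ...  # run generation with caching enabled, then return to the cache-free model
```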

Vision Transforms

Functions used for preprocessing images.

transforms.Transform

Loose interface for all data and model transforms.

transforms.VisionCrossAttentionMask

Computes the cross-attention mask for text + image inputs.
