torchtune.modules¶
Modeling Components and Building Blocks¶
Multi-headed attention layer with support for grouped query attention (GQA), introduced in https://arxiv.org/abs/2305.13245v1.
This class implements the feed-forward network derived from Llama2.
Standalone nn.Module containing a kv-cache to cache past keys and values during inference.
This class implements Rotary Positional Embeddings (RoPE), proposed in https://arxiv.org/abs/2104.09864.
Implements Root Mean Square Normalization, introduced in https://arxiv.org/abs/1910.07467.
Wrapper around torch.nn.LayerNorm to support mixed-precision training.
Implements a basic learnable gate to scale layer outputs.
A tied linear layer, without bias, that shares the same weight as another linear layer.
Transformer layer derived from the Llama2 model.
Cross-attention Transformer layer following the same conventions as the TransformerSelfAttentionLayer.
Transformer Decoder derived from the Llama2 architecture.
Implementation of the ViT architecture (https://arxiv.org/abs/2010.11929), with support for tile-cropped images, output of hidden layers, and optional CLS projection.
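The components above are standard Llama2-style building blocks. As a rough illustration of what two of them compute, the sketch below implements RMS normalization and a gated (SwiGLU) feed-forward in plain PyTorch and composes them in a pre-norm residual block. It is a minimal sketch, not torchtune's implementation; the class names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class RMSNormSketch(nn.Module):
    """Root Mean Square normalization: x / rms(x) * scale, with no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute in fp32 for numerical stability, then cast back to the input dtype.
        x_fp32 = x.float()
        rms = torch.rsqrt(x_fp32.pow(2).mean(-1, keepdim=True) + self.eps)
        return (x_fp32 * rms).type_as(x) * self.scale

class SwiGLUFeedForward(nn.Module):
    """Llama2-style gated feed-forward: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

# Pre-norm residual MLP block, as used in Llama2-style transformer layers.
x = torch.randn(2, 16, 64)            # [batch, seq_len, dim]
norm, mlp = RMSNormSketch(64), SwiGLUFeedForward(64, 256)
out = x + mlp(norm(x))                # residual connection around the normalized MLP
print(out.shape)                      # torch.Size([2, 16, 64])
```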
Losses¶
Cross-entropy with chunked outputs that saves memory by only upcasting one chunk at a time.
The Kullback-Leibler divergence loss for valid indexes.
Forward KL with chunked outputs that saves memory by only upcasting one chunk at a time.
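The chunked losses above share one idea: never upcast the full logits tensor to float32 at once. The sketch below shows that idea for cross-entropy under assumed conventions (chunking over the flattened token dimension, an ignore index of -100); it is a simplified stand-in, not the actual chunked loss implementation.

```python
import torch
import torch.nn.functional as F

def chunked_cross_entropy(logits: torch.Tensor, labels: torch.Tensor, num_chunks: int = 8) -> torch.Tensor:
    """Cross-entropy computed one chunk at a time, so only one chunk of logits
    is ever upcast to float32. This reduces peak memory for large vocabularies."""
    logit_chunks = logits.chunk(num_chunks, dim=0)
    label_chunks = labels.chunk(num_chunks, dim=0)
    total_loss, total_valid = 0.0, 0
    for logit_chunk, label_chunk in zip(logit_chunks, label_chunks):
        # Upcast only the current chunk; ignore_index skips padded positions.
        total_loss = total_loss + F.cross_entropy(
            logit_chunk.float(), label_chunk, reduction="sum", ignore_index=-100
        )
        total_valid += (label_chunk != -100).sum().item()
    return total_loss / max(total_valid, 1)

logits = torch.randn(32, 32000, dtype=torch.bfloat16)  # [num_tokens, vocab_size]
labels = torch.randint(0, 32000, (32,))
print(chunked_cross_entropy(logits, labels))
```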
Base Tokenizers¶
Base tokenizers are tokenizer models that perform the direct encoding of text into token IDs and decoding of token IDs into text. These are typically byte-pair encodings that underlie the model-specific tokenizers.
A lightweight wrapper around SentencePieceProcessor that additionally handles trimming leading whitespace.
A lightweight wrapper around tiktoken Encoding.
Abstract tokenizer that implements model-specific special token logic in the tokenize_messages method.
Abstract token encoding model that implements encode and decode methods.
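As an illustration of the raw encode/decode layer that these base tokenizers wrap, here is plain tiktoken usage; the choice of encoding is only an example, since model tokenizers in practice load their own vocabulary files.

```python
import tiktoken

# A base tokenizer wraps a plain BPE model: text -> token IDs -> text,
# with no model-specific special tokens or prompt templating involved.
enc = tiktoken.get_encoding("cl100k_base")   # illustrative choice of encoding
ids = enc.encode("Hello, torchtune!")        # a list of integer token IDs
text = enc.decode(ids)                       # round-trips back to the original string
assert text == "Hello, torchtune!"
```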
Tokenizer Utilities¶
These are helper methods that can be used by any tokenizer.
Tokenize a list of messages one at a time, then concatenate them, returning a list of tokens and a list of masks.
Parse the tokenizer.json file from a Hugging Face model to extract the special token string-to-id mapping.
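The sketch below shows the general pattern behind the message-tokenization helper: tokenize each message on its own, concatenate the token lists, and keep a parallel mask. The message fields, mask convention (True meaning excluded from the loss), and EOS handling are assumptions for illustration, not the exact torchtune behavior.

```python
from typing import List, Tuple

class ToyTokenizer:
    """Toy stand-in for a base tokenizer exposing an encode() method."""
    def encode(self, text: str) -> List[int]:
        return [len(word) for word in text.split()]   # obviously not a real BPE

def tokenize_messages_sketch(
    tokenizer: ToyTokenizer,
    messages: List[dict],            # each: {"content": str, "masked": bool}
    eos_id: int,
) -> Tuple[List[int], List[bool]]:
    """Tokenize each message separately, concatenate the token lists, and build a
    parallel mask list (True = exclude this token from the training loss)."""
    tokens: List[int] = []
    mask: List[bool] = []
    for msg in messages:
        msg_tokens = tokenizer.encode(msg["content"])
        tokens.extend(msg_tokens)
        mask.extend([msg["masked"]] * len(msg_tokens))
    tokens.append(eos_id)            # close the sequence with EOS
    mask.append(True)
    return tokens, mask

msgs = [{"content": "hi there", "masked": True},      # e.g. a user turn
        {"content": "hello back", "masked": False}]   # e.g. an assistant turn
print(tokenize_messages_sketch(ToyTokenizer(), msgs, eos_id=2))
```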
PEFT Components¶
LoRA linear layer as introduced in LoRA: Low-Rank Adaptation of Large Language Models.
DoRA linear layer as introduced in DoRA: Weight-Decomposed Low-Rank Adaptation of Large Language Models.
Interface for an nn.Module containing adapter weights.
Return the subset of parameters from a model that correspond to an adapter.
Set trainable parameters for an nn.Module based on a state dict of adapter parameters.
Return the subset of the full state_dict from a model that corresponds to an adapter.
A more memory-efficient way to validate that LoRA state dict loading was done properly.
Temporarily disable the adapters in a model.
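Both adapter layers above are built on a low-rank update added to a frozen base projection. Below is a minimal sketch of the LoRA form, y = Wx + (alpha / r) * B(A(x)), with only the adapter weights left trainable; the last lines mimic what the parameter-filtering helpers above are described as doing. The class name and default values here are hypothetical, not torchtune's implementation.

```python
import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Frozen base linear plus a trainable low-rank update: y = W x + (alpha / r) * B(A(x))."""
    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)            # base weights stay frozen
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)
        nn.init.zeros_(self.lora_b.weight)                # low-rank update starts at zero
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinearSketch(64, 64)
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
print(trainable)   # only the lora_a / lora_b adapter weights remain trainable
```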
Fusion Components¶
Components for building models that are a fusion of two or more pre-trained models.
DeepFusion is a type of fused model architecture where a pretrained encoder is combined with a pretrained decoder (LLM) in the internal decoder layers.
Fusion layer as introduced in Flamingo: a Visual Language Model for Few-Shot Learning.
Fusion embedding supports training additional special tokens while keeping the original embedding frozen.
Add the method fusion_params to an nn.Module, marking all of the module's parameters as fusion params.
Return the subset of parameters from a model that correspond to fused modules.
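A conceptual sketch of the fusion idea described above: wrap a frozen, pretrained decoder layer with a new trainable cross-attention sub-layer that attends to encoder outputs, gated so the fused model initially behaves like the original decoder. This is a plain-PyTorch illustration under those assumptions, not torchtune's FusionLayer or DeepFusionModel API.

```python
import torch
import torch.nn as nn

class FusionLayerSketch(nn.Module):
    """Wraps a pretrained (frozen) decoder layer with a new trainable cross-attention
    sub-layer that attends to encoder outputs, Flamingo-style."""
    def __init__(self, decoder_layer: nn.Module, dim: int, num_heads: int = 8):
        super().__init__()
        self.decoder_layer = decoder_layer
        for p in self.decoder_layer.parameters():   # the original decoder stays frozen
            p.requires_grad_(False)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))    # learnable gate, initially closed

    def forward(self, x: torch.Tensor, encoder_out: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.cross_attn(x, encoder_out, encoder_out)
        x = x + torch.tanh(self.gate) * attn_out    # encoder information gated into the stream
        return self.decoder_layer(x)

layer = FusionLayerSketch(nn.Identity(), dim=64)    # nn.Identity is a stand-in decoder layer
x, enc = torch.randn(2, 10, 64), torch.randn(2, 5, 64)
print(layer(x, enc).shape)                          # torch.Size([2, 10, 64])
```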
Module Utilities¶
These are utilities that are common to and can be used by all modules.
A state_dict hook that replaces NF4 tensors with their restored higher-precision weight and optionally offloads the restored weight to CPU.
This context manager temporarily enables KV-caching on a given model that does not already have KV-caches set up.
This context manager temporarily disables KV-caching on a given model, which must already have KV-caches set up.
Deletes KV caches from all attention layers in a model, and also ensures that caching is disabled on those layers.
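The KV-cache utilities above manage the cache lifecycle on a real model; the toy sketch below reproduces that lifecycle on stand-in layers to show the pattern (set caches up on entry, always tear them down on exit). The class and function names here are hypothetical, not the torchtune API.

```python
import contextlib
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Stand-in attention layer with a simple cache setup/teardown lifecycle."""
    def __init__(self):
        super().__init__()
        self.kv_cache = None
    def setup_cache(self, batch_size: int, max_seq_len: int, dim: int) -> None:
        self.kv_cache = torch.zeros(batch_size, 2, max_seq_len, dim)
    def reset_cache(self) -> None:
        self.kv_cache = None

@contextlib.contextmanager
def toy_local_kv_cache(layers, batch_size: int, max_seq_len: int, dim: int):
    """Attach caches on entry and always remove them on exit, even on error."""
    for layer in layers:
        layer.setup_cache(batch_size, max_seq_len, dim)
    try:
        yield
    finally:
        for layer in layers:
            layer.reset_cache()

layers = [ToyAttention() for _ in range(2)]
with toy_local_kv_cache(layers, batch_size=1, max_seq_len=16, dim=8):
    assert all(l.kv_cache is not None for l in layers)   # caches live inside the block
assert all(l.kv_cache is None for l in layers)           # and are deleted afterwards
```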
Vision Transforms¶
Functions used for preprocessing images.
Loose interface for all data and model transforms.
Computes the cross-attention mask for text + image inputs.