torchtune.modules

Modeling Components and Building Blocks

CausalSelfAttention

Multi-headed grouped query self-attention (GQA) layer introduced in https://arxiv.org/abs/2305.13245v1.
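A minimal sketch of the grouped-query idea this layer implements (not torchtune's actual class): the query heads are split into groups that share a single key/value head, so the smaller kv projections are repeated to line up with the query heads before attention. The head counts below are illustrative.

```python
import torch

# Grouped-query attention sketch: 32 q heads share 8 kv heads (4 q heads per kv head).
num_heads, num_kv_heads, head_dim = 32, 8, 128
q = torch.randn(1, 16, num_heads, head_dim)     # [batch, seq, q_heads, head_dim]
k = torch.randn(1, 16, num_kv_heads, head_dim)  # fewer kv heads than q heads

# Repeat each kv head so the shapes match the q heads for attention.
k_expanded = k.repeat_interleave(num_heads // num_kv_heads, dim=2)
print(k_expanded.shape)  # torch.Size([1, 16, 32, 128])
```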

FeedForward

This class implements the feed-forward network derived from Llama2.
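The torchtune class is built from projection modules passed to its constructor; the sketch below shows only the underlying Llama2-style gated (SwiGLU) computation, with hypothetical dimensions, not the exact torchtune signature.

```python
import torch
from torch import nn
import torch.nn.functional as F

class GatedFeedForward(nn.Module):
    """Minimal sketch of the Llama2-style gated feed-forward network."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # down( silu(gate(x)) * up(x) )
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

x = torch.randn(2, 16, 512)
ffn = GatedFeedForward(dim=512, hidden_dim=1376)
print(ffn(x).shape)  # torch.Size([2, 16, 512])
```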

KVCache

Standalone nn.Module containing a kv-cache that caches past keys and values during inference.
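A minimal sketch of what a kv-cache does, assuming preallocated buffers indexed by the current cache size; the constructor arguments and update method here are illustrative and may not match torchtune's KVCache API.

```python
import torch

class SimpleKVCache:
    """Sketch of a kv-cache: preallocate buffers, append new keys/values each step."""
    def __init__(self, batch_size, num_heads, max_seq_len, head_dim, dtype=torch.float32):
        shape = (batch_size, num_heads, max_seq_len, head_dim)
        self.k_cache = torch.zeros(shape, dtype=dtype)
        self.v_cache = torch.zeros(shape, dtype=dtype)
        self.size = 0  # number of positions cached so far

    def update(self, k_val, v_val):
        # k_val, v_val: [batch, num_heads, new_seq_len, head_dim] for the new positions
        seq_len = k_val.shape[2]
        self.k_cache[:, :, self.size : self.size + seq_len] = k_val
        self.v_cache[:, :, self.size : self.size + seq_len] = v_val
        self.size += seq_len
        # Return all cached keys/values for use in attention
        return self.k_cache[:, :, : self.size], self.v_cache[:, :, : self.size]
```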

get_cosine_schedule_with_warmup

Create a learning rate schedule that linearly increases the learning rate from 0.0 to lr over num_warmup_steps, then decreases it to 0.0 on a cosine schedule over the remaining num_training_steps - num_warmup_steps steps (assuming num_cycles = 0.5).
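A sketch of the schedule described above, written as a plain LambdaLR factory rather than torchtune's own function; the step counts and learning rate are illustrative.

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def cosine_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Linear warmup from 0 to the optimizer's lr, then cosine decay to 0."""
    def lr_lambda(step):
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)  # linear warmup: 0 -> 1
        progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
        # num_cycles = 0.5 gives a single half-cosine from 1 down to 0
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))
    return LambdaLR(optimizer, lr_lambda)

model = torch.nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
sched = cosine_schedule_with_warmup(opt, num_warmup_steps=100, num_training_steps=1000)
```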

RotaryPositionalEmbeddings

This class implements Rotary Positional Embeddings (RoPE) proposed in https://arxiv.org/abs/2104.09864.
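A minimal functional sketch of the RoPE computation, assuming an even head dimension and interleaved feature pairs; torchtune's class precomputes and caches these rotations rather than recomputing them per call.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate consecutive feature pairs by a position-dependent angle."""
    # x: [batch, seq_len, num_heads, head_dim] with head_dim even
    b, s, h, d = x.shape
    theta = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)  # [d/2] frequencies
    pos = torch.arange(s, dtype=torch.float32)                          # [s] positions
    angles = torch.einsum("s,f->sf", pos, theta)                        # [s, d/2]
    cos = angles.cos().view(1, s, 1, d // 2)
    sin = angles.sin().view(1, s, 1, d // 2)
    x1, x2 = x[..., 0::2], x[..., 1::2]                                 # even/odd feature pairs
    out = torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return out.flatten(-2)                                              # back to [..., head_dim]

q = torch.randn(2, 10, 8, 64)
print(apply_rope(q).shape)  # torch.Size([2, 10, 8, 64])
```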

RMSNorm

Implements Root Mean Square Normalization introduced in https://arxiv.org/abs/1910.07467.
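A minimal sketch of the RMSNorm computation: normalize by the root mean square of the features and apply a learned per-feature gain. This illustrates the math, not torchtune's exact implementation details (e.g. its dtype handling).

```python
import torch
from torch import nn

class SimpleRMSNorm(nn.Module):
    """Sketch of RMSNorm: x / rms(x) * scale."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reciprocal root mean square over the feature dimension
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.scale
```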

TransformerDecoderLayer

Transformer layer derived from the Llama2 model.

TransformerDecoder

Transformer Decoder derived from the Llama2 architecture.

Base Tokenizers

Base tokenizers are tokenizer models that perform the direct encoding of text into token IDs and decoding of token IDs back into text. These are typically byte-pair encodings that underlie the model-specific tokenizers.

tokenizers.SentencePieceBaseTokenizer

A lightweight wrapper around SentencePieceProcessor that additionally handles trimming leading whitespace.
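A hypothetical usage sketch; the import path, the path to the SentencePiece model file, and the encode() keyword arguments are assumptions and may differ between torchtune versions.

```python
# Hypothetical usage sketch -- import path and kwargs are assumptions.
from torchtune.modules.tokenizers import SentencePieceBaseTokenizer

tokenizer = SentencePieceBaseTokenizer("/path/to/tokenizer.model")
token_ids = tokenizer.encode("Hello world", add_bos=True, add_eos=True)
text = tokenizer.decode(token_ids)
```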

tokenizers.TikTokenBaseTokenizer

A lightweight wrapper around tiktoken Encoding.

Tokenizer Utilities

These are helper methods that can be used by any tokenizer.

tokenizers.tokenize_messages_no_special_tokens

Tokenize a list of messages one at a time, then concatenate them, returning a list of tokens and a list of masks.

tokenizers.parse_hf_tokenizer_json

Parse the tokenizer.json file from a Hugging Face model to extract the special token str to id mapping.
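A minimal sketch of the kind of parsing this helper performs, assuming the special tokens live under a top-level "added_tokens" list in tokenizer.json; this is not torchtune's actual implementation.

```python
import json

def parse_special_tokens(tokenizer_json_path: str) -> dict:
    """Sketch: read tokenizer.json and build a {special token str: token id} mapping."""
    with open(tokenizer_json_path) as f:
        data = json.load(f)
    return {tok["content"]: tok["id"] for tok in data.get("added_tokens", [])}
```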

PEFT Components

peft.LoRALinear

LoRA linear layer as introduced in LoRA: Low-Rank Adaptation of Large Language Models.
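A minimal sketch of the LoRA computation this layer implements: a frozen base linear layer plus a scaled low-rank update. The class name and constructor below are illustrative, not torchtune's exact signature.

```python
import torch
from torch import nn

class SimpleLoRALinear(nn.Module):
    """Sketch of a LoRA linear layer: W x + (alpha / r) * B A x."""
    def __init__(self, in_dim: int, out_dim: int, rank: int, alpha: float):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad = False               # base weight stays frozen
        self.lora_a = nn.Linear(in_dim, rank, bias=False)    # low-rank down-projection A
        self.lora_b = nn.Linear(rank, out_dim, bias=False)   # low-rank up-projection B
        nn.init.zeros_(self.lora_b.weight)                   # update starts at zero
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```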

peft.AdapterModule

Interface for an nn.Module containing adapter weights.

peft.get_adapter_params

Return the subset of parameters from a model that correspond to an adapter.

peft.set_trainable_params

Set trainable parameters for an nn.Module based on a state dict of adapter parameters.
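A sketch of how get_adapter_params and set_trainable_params are typically used together to train only the LoRA weights; the lora_llama2_7b builder, its lora_attn_modules argument, and the import paths are assumptions about the torchtune API.

```python
# Hypothetical wiring -- builder name, argument, and import paths are assumptions.
from torchtune.modules.peft import get_adapter_params, set_trainable_params
from torchtune.models.llama2 import lora_llama2_7b

model = lora_llama2_7b(lora_attn_modules=["q_proj", "v_proj"])
adapter_params = get_adapter_params(model)   # only the LoRA A/B parameters
set_trainable_params(model, adapter_params)  # freeze everything else
```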

peft.validate_missing_and_unexpected_for_lora

A more memory-efficient way to validate that LoRA state dict loading was done properly.

peft.validate_state_dict_for_lora

Validate that the state dict keys for a LoRA model are as expected.

peft.disable_adapter

Temporarily disable the adapters in a neural network model.
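A hypothetical usage sketch, assuming a LoRA model built as in the sketch above and that disable_adapter is used as a context manager, as the description states.

```python
# Hypothetical usage -- builder name, argument, and import paths are assumptions.
import torch
from torchtune.modules.peft import disable_adapter
from torchtune.models.llama2 import lora_llama2_7b

model = lora_llama2_7b(lora_attn_modules=["q_proj", "v_proj"])
tokens = torch.randint(0, 32_000, (1, 8))

with disable_adapter(model):
    # Runs the model with LoRA adapters bypassed, e.g. to compute reference
    # (base-model) logits for DPO; adapters are re-enabled on exit.
    ref_logits = model(tokens)
```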

Module Utilities

These are utilities that are common to and can be used by all modules.

common_utils.reparametrize_as_dtype_state_dict_post_hook

A state_dict hook that replaces NF4 tensors with their restored higher-precision weight and optionally offloads the restored weight to CPU.

Loss

loss.DPOLoss

Direct Preference Optimization (DPO) Loss module: https://arxiv.org/abs/2305.18290.
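A minimal sketch of the DPO objective on per-sequence log-probabilities; the argument names follow the common convention from the paper and may not match torchtune's exact forward signature.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             reference_chosen_logps, reference_rejected_logps, beta=0.1):
    """Sketch of the DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = reference_chosen_logps - reference_rejected_logps
    logits = pi_logratios - ref_logratios
    # Push the policy to prefer chosen over rejected more strongly than the reference does.
    return -F.logsigmoid(beta * logits).mean()
```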
