torchtune.modules
Modeling Components and Building Blocks
CausalSelfAttention | Multi-headed grouped query self-attention (GQA) layer introduced in https://arxiv.org/abs/2305.13245v1.
FeedForward | This class implements the feed-forward network derived from Llama2.
KVCache | Standalone nn.Module containing a kv-cache to cache past key and values during inference.
get_cosine_schedule_with_warmup | Create a learning rate schedule that linearly increases the learning rate from 0.0 to lr over num_warmup_steps, then decreases to 0.0 on a cosine schedule over the remaining num_training_steps - num_warmup_steps steps (assuming num_cycles = 0.5).
RotaryPositionalEmbeddings | This class implements Rotary Positional Embeddings (RoPE) proposed in https://arxiv.org/abs/2104.09864.
RMSNorm | Implements Root Mean Square Normalization introduced in https://arxiv.org/abs/1910.07467.
TransformerDecoderLayer | Transformer layer derived from the Llama2 model.
TransformerDecoder | Transformer Decoder derived from the Llama2 architecture.
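As a quick illustration of how these building blocks fit together, the sketch below composes RotaryPositionalEmbeddings, RMSNorm, and get_cosine_schedule_with_warmup. It is a minimal example rather than a full attention layer, and it assumes the input shape [batch, seq_len, num_heads, head_dim] and the constructor arguments shown here match the class signatures:

    import torch
    from torchtune.modules import (
        RMSNorm,
        RotaryPositionalEmbeddings,
        get_cosine_schedule_with_warmup,
    )

    head_dim = 64
    rope = RotaryPositionalEmbeddings(dim=head_dim, max_seq_len=4096)
    norm = RMSNorm(dim=head_dim)

    # Query projection output: [batch=2, seq_len=16, num_heads=8, head_dim=64]
    q = torch.randn(2, 16, 8, head_dim)
    q = rope(q)   # rotate each head's features by token position
    q = norm(q)   # root-mean-square normalization over the last dimension

    # Linear warmup for 100 steps, then cosine decay to 0.0 over the rest.
    optimizer = torch.optim.AdamW(norm.parameters(), lr=3e-4)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=1000
    )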
Base Tokenizers
Base tokenizers are tokenizer models that perform the direct encoding of text into token IDs and decoding of token IDs into text. These are typically byte-pair encodings that underlie the model-specific tokenizers.
SentencePieceBaseTokenizer | A lightweight wrapper around SentencePieceProcessor that additionally handles trimming leading whitespaces.
TikTokenBaseTokenizer | A lightweight wrapper around tiktoken Encoding.
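A minimal sketch of using the base-tokenizer interface directly, assuming a SentencePiece model file is available (the path below is hypothetical) and that encode/decode accept the arguments shown:

    from torchtune.modules.tokenizers import SentencePieceBaseTokenizer

    tokenizer = SentencePieceBaseTokenizer("/path/to/spm.model")  # hypothetical path
    token_ids = tokenizer.encode("Hello world", add_bos=True, add_eos=True)
    round_trip = tokenizer.decode(token_ids)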
Tokenizer Utilities
These are helper methods that can be used by any tokenizer.
tokenize_messages_no_special_tokens | Tokenize a list of messages one at a time, then concatenate them, returning a list of tokens and a list of masks.
parse_hf_tokenizer_json | Parse the tokenizer.json file from a Hugging Face model to extract the special token str to id mapping.
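For example, parse_hf_tokenizer_json can be pointed at a downloaded Hugging Face tokenizer file; a small sketch, assuming (per its summary) that it returns a dict mapping special-token strings to their ids (the path and token name below are hypothetical):

    from torchtune.modules.tokenizers import parse_hf_tokenizer_json

    special_tokens = parse_hf_tokenizer_json("/path/to/tokenizer.json")  # hypothetical path
    # e.g. look up one special token's id (token name is illustrative)
    print(special_tokens.get("<|begin_of_text|>"))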
PEFT Components
LoRALinear | LoRA linear layer as introduced in LoRA: Low-Rank Adaptation of Large Language Models.
AdapterModule | Interface for an nn.Module containing adapter weights.
get_adapter_params | Return the subset of parameters from a model that correspond to an adapter.
set_trainable_params | Set trainable parameters for an nn.Module based on a state dict of adapter parameters.
validate_missing_and_unexpected_for_lora | A more memory-efficient way to validate that LoRA state dict loading was done properly.
validate_state_dict_for_lora | Validate that the state dict keys for a LoRA model are as expected.
disable_adapter | Temporarily disable the adapters in a neural network model.
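A minimal sketch of the LoRA fine-tuning workflow these utilities support: mark only adapter weights as trainable, then temporarily bypass the adapters (e.g. to score with the frozen base model). The toy model below stands in for a real transformer with LoRALinear layers swapped in:

    import torch.nn as nn
    from torchtune.modules.peft import (
        LoRALinear,
        disable_adapter,
        get_adapter_params,
        set_trainable_params,
    )

    # Toy stand-in for a model whose linear layers use LoRA.
    model = nn.Sequential(LoRALinear(in_dim=512, out_dim=512, rank=8, alpha=16.0))

    # Freeze everything except the adapter (lora_a / lora_b) parameters.
    adapter_params = get_adapter_params(model)
    set_trainable_params(model, adapter_params)

    # Run with adapters disabled, e.g. to get frozen-base-model outputs.
    with disable_adapter(model):
        ...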
Module Utilities
These are utilities that are common to and can be used by all modules.
reparametrize_as_dtype_state_dict_post_hook | A state_dict hook that replaces NF4 tensors with their restored higher-precision weight and optionally offloads the restored weight to CPU.
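A sketch of registering this hook so that state_dict() returns plain higher-precision tensors; the registration pattern mirrors typical QLoRA-style usage, and the plain nn.Linear below merely stands in for a module holding NF4-quantized weights (the hook is a no-op on regular tensors):

    from functools import partial

    import torch
    import torch.nn as nn
    from torchtune.modules import reparametrize_as_dtype_state_dict_post_hook

    model = nn.Linear(4, 4)  # stand-in; in practice a module with NF4 tensors
    model._register_state_dict_hook(
        partial(
            reparametrize_as_dtype_state_dict_post_hook,
            dtype=torch.bfloat16,
            offload_to_cpu=True,
        )
    )
    sd = model.state_dict()  # NF4 weights would come back as bf16 on CPU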
Loss
DPOLoss | Direct Preference Optimization (DPO) Loss module: https://arxiv.org/abs/2305.18290.
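A minimal sketch of computing the DPO loss on dummy per-sequence log-probabilities; the forward arguments (policy and reference log-probs for chosen and rejected responses) are assumed from the standard DPO formulation in the paper linked above:

    import torch
    from torchtune.modules.loss import DPOLoss

    loss_fn = DPOLoss(beta=0.1)
    batch = 4
    policy_chosen_logps = torch.randn(batch)
    policy_rejected_logps = torch.randn(batch)
    reference_chosen_logps = torch.randn(batch)
    reference_rejected_logps = torch.randn(batch)

    losses, chosen_rewards, rejected_rewards = loss_fn(
        policy_chosen_logps,
        policy_rejected_logps,
        reference_chosen_logps,
        reference_rejected_logps,
    )
    print(losses.mean())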