torchtune.modules¶
Modeling Components and Building Blocks¶
Multi-headed grouped query self-attention (GQA) layer introduced in https://arxiv.org/pdf/2305.13245v1.pdf.
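The sketch below illustrates the core idea of grouped-query attention, where several query heads share each key/value head; it is a simplified stand-in for the torchtune layer, and the sizes and attribute names (num_kv_heads, q_proj, and so on) are illustrative rather than the actual signature.

```python
import torch.nn.functional as F
from torch import nn

class GroupedQueryAttention(nn.Module):
    """Illustrative GQA sketch: 8 query heads sharing 2 key/value heads."""

    def __init__(self, embed_dim=512, num_heads=8, num_kv_heads=2):
        super().__init__()
        self.num_heads, self.num_kv_heads = num_heads, num_kv_heads
        self.head_dim = embed_dim // num_heads
        self.q_proj = nn.Linear(embed_dim, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(embed_dim, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(embed_dim, num_kv_heads * self.head_dim, bias=False)
        self.out_proj = nn.Linear(embed_dim, embed_dim, bias=False)

    def forward(self, x):
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, s, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, s, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Each group of query heads attends to the same shared key/value head.
        k = k.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
        v = v.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(out.transpose(1, 2).reshape(b, s, -1))
```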
This class implements the feed-forward network derived from Llama2.
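For reference, a minimal version of the Llama2-style gated feed-forward (SwiGLU) looks like the sketch below; the w1/w2/w3 naming and the hidden dimension are illustrative, not necessarily the exact torchtune signature.

```python
import torch.nn.functional as F
from torch import nn

class FeedForward(nn.Module):
    """Minimal sketch of a Llama2-style gated feed-forward block."""

    def __init__(self, dim=512, hidden_dim=2048):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x):
        # SwiGLU: gate with SiLU, multiply elementwise, project back down.
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```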
Standalone nn.Module containing a kv-cache to cache past keys and values during inference.
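A minimal kv-cache of this kind can be sketched as below; the buffer layout and the update() signature are assumptions for illustration, not the torchtune API.

```python
import torch
from torch import nn

class KVCache(nn.Module):
    """Sketch of a standalone kv-cache: preallocated buffers plus a write position."""

    def __init__(self, batch_size, max_seq_len, num_kv_heads, head_dim, dtype=torch.float32):
        super().__init__()
        shape = (batch_size, num_kv_heads, max_seq_len, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape, dtype=dtype), persistent=False)
        self.register_buffer("v_cache", torch.zeros(shape, dtype=dtype), persistent=False)
        self.cache_pos = 0

    def update(self, k, v):
        # k, v: [batch, num_kv_heads, new_seq_len, head_dim] for the tokens just processed.
        seq_len = k.shape[2]
        self.k_cache[:, :, self.cache_pos : self.cache_pos + seq_len] = k
        self.v_cache[:, :, self.cache_pos : self.cache_pos + seq_len] = v
        self.cache_pos += seq_len
        # Return everything cached so far, so attention covers past and current tokens.
        return self.k_cache[:, :, : self.cache_pos], self.v_cache[:, :, : self.cache_pos]
```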
Create a learning rate schedule that linearly increases the learning rate from 0.0 to lr over num_warmup_steps, then decreases it to 0.0 on a cosine schedule over the remaining num_training_steps - num_warmup_steps steps (assuming num_cycles = 0.5).
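The described schedule can be reproduced with a plain LambdaLR as in this sketch; the argument names mirror the description above, but the actual torchtune helper may differ in details.

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def cosine_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_cycles=0.5):
    def lr_lambda(step):
        if step < num_warmup_steps:
            # Linear warmup from 0.0 up to the optimizer's base lr.
            return step / max(1, num_warmup_steps)
        # Cosine decay from the base lr down to 0.0 over the remaining steps.
        progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))

    return LambdaLR(optimizer, lr_lambda)
```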
This class implements Rotary Positional Embeddings (RoPE) proposed in https://arxiv.org/abs/2104.09864.
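The rotation RoPE applies can be sketched as follows on a [batch, seq_len, num_heads, head_dim] tensor; this paraphrases the math from the paper rather than reproducing the torchtune module.

```python
import torch

def apply_rope(x, base=10000):
    # x: [batch, seq_len, num_heads, head_dim]
    b, s, h, d = x.shape
    theta = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))  # [d/2]
    pos = torch.arange(s, dtype=torch.float32)
    freqs = torch.outer(pos, theta)  # per-position, per-pair rotation angles: [s, d/2]
    cos = freqs.cos()[None, :, None, :]
    sin = freqs.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each (even, odd) pair of features by the position-dependent angle.
    out = torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return out.flatten(-2).type_as(x)
```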
Implements Root Mean Square Normalization introduced in https://arxiv.org/pdf/1910.07467.pdf.
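A minimal implementation of the formula is shown below; the eps default and the learnable scale follow common practice and are not necessarily the exact torchtune defaults.

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    """Sketch of RMSNorm: scale by the inverse root mean square of the features."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Unlike LayerNorm, there is no mean subtraction, only RMS rescaling.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.scale
```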
Transformer layer derived from the Llama2 model.
Transformer Decoder derived from the Llama2 architecture.
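The two components above fit together roughly as in this sketch: each layer wraps attention and a feed-forward block in pre-norm residual connections, and the decoder stacks layers between a token embedding and an output projection. Class and attribute names here are illustrative, not the torchtune classes.

```python
from torch import nn

class DecoderLayer(nn.Module):
    """One pre-norm transformer block: attention and MLP, each with a residual connection."""

    def __init__(self, dim, attn, mlp, norm_cls):
        super().__init__()
        self.attn, self.mlp = attn, mlp
        self.attn_norm, self.mlp_norm = norm_cls(dim), norm_cls(dim)

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))  # pre-norm attention + residual
        x = x + self.mlp(self.mlp_norm(x))    # pre-norm feed-forward + residual
        return x

class Decoder(nn.Module):
    """Token embedding -> stacked layers -> final norm -> vocabulary projection."""

    def __init__(self, vocab_size, dim, layers, norm):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, dim)
        self.layers = nn.ModuleList(layers)
        self.norm = norm
        self.output = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, tokens):
        h = self.tok_embeddings(tokens)
        for layer in self.layers:
            h = layer(h)
        return self.output(self.norm(h))  # logits over the vocabulary
```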
Tokenizers¶
A wrapper around SentencePieceProcessor.
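For context, the underlying SentencePieceProcessor can be used directly as below; "tokenizer.model" is a placeholder path, not a file shipped with torchtune.

```python
from sentencepiece import SentencePieceProcessor

# Load a trained SentencePiece model and round-trip a string through it.
spm = SentencePieceProcessor(model_file="tokenizer.model")
ids = spm.encode("Hello world", out_type=int)  # list of token ids
text = spm.decode(ids)                         # back to a string
```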
A wrapper around tiktoken Encoding.
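Similarly, a tiktoken Encoding can be used directly; "cl100k_base" is simply a readily available example encoding, not necessarily the one a given model uses.

```python
import tiktoken

# Load an encoding and round-trip a string through it.
enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Hello world")  # list of token ids
text = enc.decode(ids)           # back to a string
```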
PEFT Components¶
LoRA linear layer as introduced in LoRA: Low-Rank Adaptation of Large Language Models.
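A minimal LoRA linear following the paper looks like the sketch below: a frozen dense weight plus a trainable low-rank update scaled by alpha / rank. The attribute names and initialization choices are illustrative rather than the exact torchtune implementation.

```python
import math
import torch
from torch import nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA linear: frozen base weight plus trainable low-rank A/B factors."""

    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)
        self.linear.weight.requires_grad_(False)  # frozen pretrained weight
        self.lora_a = nn.Parameter(torch.empty(rank, in_dim))
        self.lora_b = nn.Parameter(torch.zeros(out_dim, rank))  # zero-init so the update starts at 0
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.linear(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```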
Interface for an nn.Module containing adapter weights.
Return the subset of parameters from a model that correspond to an adapter.
Set trainable parameters for an nn.Module based on a state dict of adapter parameters.
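The adapter interface and the two helpers described above can be sketched together as follows; the adapter_params() method name and the helpers' exact signatures are assumptions for illustration, not the torchtune API.

```python
import torch
from torch import nn

class LoRAAdapter(nn.Module):
    """An nn.Module whose adapter_params() names the parameters that belong to the adapter."""

    def __init__(self, in_dim, out_dim, rank=8):
        super().__init__()
        self.lora_a = nn.Parameter(torch.zeros(rank, in_dim))
        self.lora_b = nn.Parameter(torch.zeros(out_dim, rank))

    def adapter_params(self):
        return ["lora_a", "lora_b"]

def get_adapter_params(model: nn.Module) -> dict:
    """Return the subset of model parameters owned by adapter modules, keyed by full name."""
    params = {}
    for module_name, module in model.named_modules():
        if hasattr(module, "adapter_params"):
            local = dict(module.named_parameters(recurse=False))
            for param_name in module.adapter_params():
                full_name = f"{module_name}.{param_name}" if module_name else param_name
                params[full_name] = local[param_name]
    return params

def set_trainable_params(model: nn.Module, adapter_params: dict) -> None:
    """Freeze every parameter except those present in adapter_params."""
    for name, param in model.named_parameters():
        param.requires_grad_(name in adapter_params)
```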
Module Utilities¶
These utilities are common to all modules and can be used by any of them.
A state_dict hook that replaces NF4 tensors with their restored higher-precision weight and optionally offloads the restored weight to CPU.
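A hedged sketch of such a hook is shown below. The NF4 detection and dequantization are approximated with a plain isinstance check and .to() call, and the registration uses the private nn.Module._register_state_dict_hook; none of this is the exact torchtune/torchao call sequence.

```python
from functools import partial

import torch
from torch import nn

def reparametrize_state_dict_hook(module, state_dict, prefix, local_metadata,
                                  dtype=torch.bfloat16, offload_to_cpu=True):
    # Walk the populated state_dict and swap each (quantized) tensor for a
    # higher-precision copy, optionally moving the restored copy to CPU.
    for key, value in state_dict.items():
        if isinstance(value, torch.Tensor):     # stand-in for "is this an NF4 tensor?"
            restored = value.to(dtype)          # stand-in for dequantizing to higher precision
            state_dict[key] = restored.cpu() if offload_to_cpu else restored
    return state_dict

# Usage sketch: attach the hook so calls to model.state_dict() return restored weights.
model = nn.Linear(4, 4)
model._register_state_dict_hook(partial(reparametrize_state_dict_hook, dtype=torch.float32))
sd = model.state_dict()
```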