DoRALinear
- class torchtune.modules.peft.DoRALinear(in_dim: int, out_dim: int, rank: int, alpha: float, dropout: float = 0.0, use_bias: bool = False, quantize_base: bool = False, **quantization_kwargs)[source]
DoRA linear layer as introduced in DoRA: Weight-Decomposed Low-Rank Adaptation of Large Language Models.
DoRA (Weight-Decomposed Low-Rank Adaptation) fine-tunes a layer by decomposing the pre-trained weight into two components: magnitude and direction. The magnitude component is a learnable vector with one scale per output channel, while the direction component, updated via LoRA, adjusts the orientation of the weight matrix. By scaling the LoRA update \(BAx\) with the magnitude vector, DoRA can apply a distinct scaling adjustment to each output dimension.
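Below is a minimal sketch of the decomposition in plain PyTorch on toy shapes; the tensor names (w0, lora_a, lora_b, magnitude) and the alpha/rank scaling are illustrative assumptions rather than torchtune's internal layout.

```python
import torch

# Toy shapes; these names are illustrative, not torchtune's internals.
in_dim, out_dim, rank, alpha = 16, 32, 4, 8.0

w0 = torch.randn(out_dim, in_dim)            # frozen pre-trained weight
lora_a = torch.randn(rank, in_dim) * 0.01    # LoRA "A" (down-projection)
lora_b = torch.zeros(out_dim, rank)          # LoRA "B" (up-projection, zero-init)
magnitude = torch.linalg.norm(w0, dim=1)     # one learnable scale per output channel

# Direction: pre-trained weight plus the scaled low-rank update, normalized per output channel.
adapted = w0 + (alpha / rank) * (lora_b @ lora_a)
direction = adapted / torch.linalg.norm(adapted, dim=1, keepdim=True)

x = torch.randn(2, in_dim)
y = x @ (magnitude.unsqueeze(-1) * direction).T   # shape (2, out_dim)
```

Because lora_b starts at zero and the magnitude starts as the per-channel norm of the pre-trained weight, y initially equals the frozen layer's output, which is the condition initialize_dora_magnitude() (documented below) establishes.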
- Parameters:
in_dim (int) – input dimension
out_dim (int) – output dimension
rank (int) – rank of the low-rank approximation
alpha (float) – scaling factor for the low-rank approximation
dropout (float) – dropout probability. Default: 0.0
use_bias (bool) – whether to include bias in the original linear layer. Default: False
quantize_base (bool) – Whether to quantize base linear weight or not. Default: False
**quantization_kwargs – Keyword arguments to pass to to_nf4 when quantizing the base linear weight. Examples of valid arguments are block_size and scaler_block_size, which control the granularity of weight quantization and scaler quantization respectively. This is only used if quantize_base is True. Default: None
- Raises:
ValueError – If quantize_base is False, but quantization kwargs are provided.
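A hedged construction sketch (sizes and hyperparameter values below are illustrative):

```python
from torchtune.modules.peft import DoRALinear

# Illustrative sizes and hyperparameters.
dora = DoRALinear(in_dim=1024, out_dim=1024, rank=8, alpha=16.0, dropout=0.05)

# Quantization kwargs (e.g. block_size) are only accepted with quantize_base=True;
# passing them while quantize_base is False raises ValueError:
# DoRALinear(in_dim=1024, out_dim=1024, rank=8, alpha=16.0, block_size=64)  # ValueError
```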
- adapter_params() → List[str] [source]
Return a list of strings corresponding to the names of the nn.Parameters in the model coming from the adapter. For DoRA this means lora_a.weight, lora_b.weight, and magnitude.
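For example, the returned names can be used to freeze everything except the DoRA parameters (a sketch; sizes are illustrative):

```python
from torchtune.modules.peft import DoRALinear

dora = DoRALinear(in_dim=256, out_dim=256, rank=4, alpha=8.0)
adapter_names = set(dora.adapter_params())   # expected: lora_a.weight, lora_b.weight, magnitude

# Keep only the adapter parameters trainable; the base weight stays frozen.
for name, param in dora.named_parameters():
    param.requires_grad_(name in adapter_names)
```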
- forward(x: Tensor) → Tensor [source]
- Parameters:
x (torch.Tensor) – input tensor with shape (..., in_dim)
- Returns:
output tensor with shape (..., out_dim)
- Return type:
Tensor
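Leading dimensions are preserved; only the last dimension changes from in_dim to out_dim. A small sketch with illustrative sizes:

```python
import torch
from torchtune.modules.peft import DoRALinear

dora = DoRALinear(in_dim=512, out_dim=2048, rank=8, alpha=16.0)
dora.initialize_dora_magnitude()   # see the next entry for when this call is required

x = torch.randn(4, 128, 512)       # (batch, seq_len, in_dim)
print(dora(x).shape)               # torch.Size([4, 128, 2048]), i.e. (..., out_dim)
```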
- initialize_dora_magnitude()[source]
DoRA initializes the magnitude vector such that its outputs are initially identical to standard LoRA’s outputs.
This must be called after loading/initializing the base model and LoRA params.
- Raises:
RuntimeError – If base or LoRA parameters are still on meta device.
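A sketch of the expected ordering (the placeholder comment stands in for copying real checkpoint weights):

```python
from torchtune.modules.peft import DoRALinear

dora = DoRALinear(in_dim=512, out_dim=512, rank=8, alpha=16.0)
# ... load or copy the pre-trained base weight and LoRA params into the module ...
dora.initialize_dora_magnitude()   # afterwards DoRA's output matches plain LoRA's at init

# If the parameters were still on the meta device (e.g. under deferred init),
# the same call would raise RuntimeError; materialize storage first (see to_empty below).
```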
- to_empty(*, device: Optional[Union[str, device, int]], recurse: bool = True)[source]
Move the parameters and buffers to the specified device without copying storage.
- Parameters:
device (torch.device) – The desired device of the parameters and buffers in this module.
recurse (bool) – Whether parameters and buffers of submodules should be recursively moved to the specified device.
- Returns:
self
- Return type:
Module
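A sketch of deferred initialization with the meta device, where to_empty allocates real storage before weights are loaded (sizes are illustrative):

```python
import torch
from torchtune.modules.peft import DoRALinear

# Parameters created under the meta device have shapes but no real storage.
with torch.device("meta"):
    dora = DoRALinear(in_dim=256, out_dim=256, rank=4, alpha=8.0)

dora = dora.to_empty(device="cpu")   # returns self; storage is allocated but uninitialized
# ... load real weights into the module, then call dora.initialize_dora_magnitude() ...
```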