DoRALinear
- class torchtune.modules.peft.DoRALinear(in_dim: int, out_dim: int, rank: int, alpha: float, dropout: float = 0.0, use_bias: bool = False, quantize_base: bool = False, **quantization_kwargs)[source]
DoRA linear layer as introduced in DoRA: Weight-Decomposed Low-Rank Adaptation of Large Language Models.
DoRA (Weight-Decomposed Low-Rank Adaptation) fine-tunes a layer by decomposing the pre-trained weight into two components: magnitude and direction. The magnitude component is a learnable vector with one scale per output channel, while the direction component, updated via LoRA, adjusts the orientation of the weight matrix. By scaling the LoRA update \(BAx\) with the magnitude vector, DoRA can apply a distinct scaling adjustment to each output dimension.
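Below is a minimal sketch of the decomposition in plain PyTorch on toy shapes; the tensor names (w0, lora_a, lora_b, magnitude) and the alpha/rank scaling are illustrative assumptions rather than torchtune's internal layout.

```python
import torch

# Toy shapes; these names are illustrative, not torchtune's internals.
in_dim, out_dim, rank, alpha = 16, 32, 4, 8.0

w0 = torch.randn(out_dim, in_dim)            # frozen pre-trained weight
lora_a = torch.randn(rank, in_dim) * 0.01    # LoRA "A" (down-projection)
lora_b = torch.zeros(out_dim, rank)          # LoRA "B" (up-projection, zero-init)
magnitude = torch.linalg.norm(w0, dim=1)     # one learnable scale per output channel

# Direction: pre-trained weight plus the scaled low-rank update, normalized per output channel.
adapted = w0 + (alpha / rank) * (lora_b @ lora_a)
direction = adapted / torch.linalg.norm(adapted, dim=1, keepdim=True)

x = torch.randn(2, in_dim)
y = x @ (magnitude.unsqueeze(-1) * direction).T   # shape (2, out_dim)
```

Because lora_b starts at zero and the magnitude starts as the per-channel norm of the pre-trained weight, y initially equals the frozen layer's output, which is the condition initialize_dora_magnitude() (documented below) establishes.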
- Parameters:
in_dim (int) – input dimension
out_dim (int) – output dimension
rank (int) – rank of the low-rank approximation
alpha (float) – scaling factor for the low-rank approximation
dropout (float) – dropout probability. Default: 0.0
use_bias (bool) – whether to include bias in the original linear layer. Default: False
quantize_base (bool) – Whether to quantize base linear weight or not. Default: False
**quantization_kwargs – Keyword arguments to pass to to_nf4 when quantizing the base linear weight. Examples of valid arguments are block_size and scaler_block_size, which control the granularity of weight quantization and scaler quantization respectively. This is only used if quantize_base is True. Default: None
- Raises:
ValueError – If quantize_base is False, but quantization kwargs are provided.
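A hedged construction sketch (sizes and hyperparameter values below are illustrative):

```python
from torchtune.modules.peft import DoRALinear

# Illustrative sizes and hyperparameters.
dora = DoRALinear(in_dim=1024, out_dim=1024, rank=8, alpha=16.0, dropout=0.05)

# Quantization kwargs (e.g. block_size) are only accepted with quantize_base=True;
# passing them while quantize_base is False raises ValueError:
# DoRALinear(in_dim=1024, out_dim=1024, rank=8, alpha=16.0, block_size=64)  # ValueError
```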
- adapter_params() → List[str] [source]
Return a list of strings corresponding to the names of the nn.Parameters in the model coming from the adapter. For DoRA this means lora_a.weight, lora_b.weight, and magnitude.
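For example, the returned names can be used to freeze everything except the DoRA parameters (a sketch; sizes are illustrative):

```python
from torchtune.modules.peft import DoRALinear

dora = DoRALinear(in_dim=256, out_dim=256, rank=4, alpha=8.0)
adapter_names = set(dora.adapter_params())   # expected: lora_a.weight, lora_b.weight, magnitude

# Keep only the adapter parameters trainable; the base weight stays frozen.
for name, param in dora.named_parameters():
    param.requires_grad_(name in adapter_names)
```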
- forward(x: Tensor) → Tensor [source]
- Parameters:
x (torch.Tensor) – input tensor with shape (..., in_dim)
- Returns:
output tensor with shape (..., out_dim)
- Return type:
Tensor
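Leading dimensions are preserved; only the last dimension changes from in_dim to out_dim. A small sketch with illustrative sizes:

```python
import torch
from torchtune.modules.peft import DoRALinear

dora = DoRALinear(in_dim=512, out_dim=2048, rank=8, alpha=16.0)
dora.initialize_dora_magnitude()   # see the next entry for when this call is required

x = torch.randn(4, 128, 512)       # (batch, seq_len, in_dim)
print(dora(x).shape)               # torch.Size([4, 128, 2048]), i.e. (..., out_dim)
```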
- initialize_dora_magnitude()[source]
DoRA initializes the magnitude vector such that its outputs are initially identical to standard LoRA’s outputs.
This must be called after loading/initializing the base model and LoRA params.
- Raises:
RuntimeError – If base or LoRA parameters are still on meta device.
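A sketch of the expected ordering (the placeholder comment stands in for copying real checkpoint weights):

```python
from torchtune.modules.peft import DoRALinear

dora = DoRALinear(in_dim=512, out_dim=512, rank=8, alpha=16.0)
# ... load or copy the pre-trained base weight and LoRA params into the module ...
dora.initialize_dora_magnitude()   # afterwards DoRA's output matches plain LoRA's at init

# If the parameters were still on the meta device (e.g. under deferred init),
# the same call would raise RuntimeError; materialize storage first (see to_empty below).
```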
- to_empty(*, device: Optional[Union[str, device, int]], recurse: bool = True)[source]
Move the parameters and buffers to the specified device without copying storage.
- Parameters:
device (torch.device) – The desired device of the parameters and buffers in this module.
recurse (bool) – Whether parameters and buffers of submodules should be recursively moved to the specified device.
- Returns:
self
- Return type:
Module
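A sketch of deferred initialization with the meta device, where to_empty allocates real storage before weights are loaded (sizes are illustrative):

```python
import torch
from torchtune.modules.peft import DoRALinear

# Parameters created under the meta device have shapes but no real storage.
with torch.device("meta"):
    dora = DoRALinear(in_dim=256, out_dim=256, rank=4, alpha=8.0)

dora = dora.to_empty(device="cpu")   # returns self; storage is allocated but uninitialized
# ... load real weights into the module, then call dora.initialize_dora_magnitude() ...
```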