RotaryPositionalEmbeddings¶
- class torchtune.modules.RotaryPositionalEmbeddings(dim: int, max_seq_len: int = 4096, base: int = 10000)[source]¶
This class implements Rotary Positional Embeddings (RoPE) proposed in https://arxiv.org/abs/2104.09864.
Reference implementation (used for correctness verification) can be found here: https://github.com/facebookresearch/llama/blob/main/llama/model.py#L450
In this implementation, we cache the embeddings for each position up to ``max_seq_len`` by computing them during init.
- Parameters:
dim (int) – Embedding dimension. This is usually set to the dim of each head in the attention module, computed as ``embed_dim // num_heads``
max_seq_len (int) – Maximum expected sequence length for the model; if exceeded, the cached freqs will be recomputed
base (int) – The base for the geometric progression used to compute the rotation angles
- forward(x: Tensor, input_pos: Optional[Tensor] = None) → Tensor[source]¶
- Parameters:
x (Tensor) – input tensor with shape [bsz, seq_len, num_heads, head_dim]
input_pos (Optional[Tensor]) – Optional tensor which contains the position of the current token. This is only used during inference. Default is None
- Returns:
output tensor with RoPE applied
- Return type:
Tensor
- Notation used for tensor shapes:
b: batch size
s: sequence length
n_h: num heads
h_d: head dim
TODO: The implementation below can be made more efficient for inference.
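To make the cached computation concrete, here is a self-contained sketch of the same idea in plain PyTorch: inverse frequencies form a geometric progression in `base`, angles for every position up to `max_seq_len` are precomputed once, and `forward` rotates each consecutive pair of channels in the `h_d` dimension. This is an illustration of the technique, not torchtune's actual code; the helper names are hypothetical.

```python
import torch

def build_rope_cache(head_dim: int, max_seq_len: int = 4096, base: int = 10000) -> torch.Tensor:
    # Inverse frequencies as a geometric progression: base^(-2i / head_dim)
    theta = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Angle for every (position, frequency) pair -> [max_seq_len, head_dim // 2]
    pos = torch.arange(max_seq_len).float()
    angles = torch.outer(pos, theta)
    # Cache cos/sin per position -> [max_seq_len, head_dim // 2, 2]
    return torch.stack([angles.cos(), angles.sin()], dim=-1)

def apply_rope(x: torch.Tensor, cache: torch.Tensor) -> torch.Tensor:
    # x: [b, s, n_h, h_d] using the notation above
    b, s, n_h, h_d = x.shape
    # View the head dim as h_d // 2 pairs of channels
    xshaped = x.float().reshape(b, s, n_h, h_d // 2, 2)
    # Broadcast the cached cos/sin over batch and heads
    rope = cache[:s].view(1, s, 1, h_d // 2, 2)
    # 2D rotation of each pair (x1, x2) by its position-dependent angle
    x_out = torch.stack(
        [
            xshaped[..., 0] * rope[..., 0] - xshaped[..., 1] * rope[..., 1],
            xshaped[..., 1] * rope[..., 0] + xshaped[..., 0] * rope[..., 1],
        ],
        dim=-1,
    )
    return x_out.flatten(-2).type_as(x)
```

At position 0 every angle is zero (cos = 1, sin = 0), so the first token's embeddings pass through unchanged, which is a convenient sanity check on any RoPE implementation.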