RotaryPositionalEmbeddings

class torchtune.modules.RotaryPositionalEmbeddings(dim: int, max_seq_len: int = 4096, base: int = 10000)[source]

This class implements Rotary Positional Embeddings (RoPE) proposed in https://arxiv.org/abs/2104.09864.

A reference implementation (used for correctness verification) can be found here: https://github.com/meta-llama/llama/blob/main/llama/model.py#L80

In this implementation, we cache the embeddings for each position up to max_seq_len by computing them during init.
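The cached-frequency idea can be sketched as follows. This is an illustration of the general approach, not torchtune's exact code: for each position, the cosine and sine of position-dependent rotation angles are precomputed once, so forward passes only index into the cache.

```python
import torch

# Hypothetical sizes for illustration
dim, max_seq_len, base = 64, 4096, 10000

# Rotation frequencies: a geometric progression over pairs of dims
theta = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))  # [dim / 2]

# Angle for every (position, frequency) pair
seq_idx = torch.arange(max_seq_len).float()   # [max_seq_len]
idx_theta = torch.outer(seq_idx, theta)       # [max_seq_len, dim / 2]

# Cache cos and sin so forward only needs a lookup per position
cache = torch.stack(
    [torch.cos(idx_theta), torch.sin(idx_theta)], dim=-1
)  # [max_seq_len, dim / 2, 2]
```

Precomputing this cache trades a small amount of memory (here 4096 × 32 × 2 floats) for avoiding recomputation of the trigonometric terms on every forward call.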

Parameters:
  • dim (int) – Embedding dimension. This is usually set to the dim of each head in the attention module, computed as ``embed_dim // num_heads``.

  • max_seq_len (int) – Maximum expected sequence length for the model; if exceeded, the cached freqs will be recomputed.

  • base (int) – The base for the geometric progression used to compute the rotation angles

forward(x: Tensor, *, input_pos: Optional[Tensor] = None) Tensor[source]
Parameters:
  • x (Tensor) – input tensor with shape [b, s, n_h, h_d]

  • input_pos (Optional[Tensor]) – Optional tensor which contains the position ids of each token. During training, this is used to indicate the positions of each token relative to its sample when packed, shape [b, s]. During inference, this indicates the position of the current token. If none, assume the index of the token is its position id. Default is None.

Returns:

output tensor with RoPE applied

Return type:

Tensor

Notation used for tensor shapes:
  • b: batch size

  • s: sequence length

  • n_h: num heads

  • h_d: head dim

TODO: The implementation can be made more efficient for inference.
