

class torchtune.modules.RotaryPositionalEmbeddings(dim: int, max_seq_len: int = 4096, base: int = 10000)

This class implements Rotary Positional Embeddings (RoPE) proposed in

The reference implementation (used for correctness verification) can be found here:

In this implementation, the embeddings for each position up to max_seq_len are computed once during init and cached.
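The caching idea can be sketched in plain Python. This is an illustration only, not the library's tensor code: the helper name build_rope_cache and the list-of-tuples layout are hypothetical, and dim is assumed to be even.

```python
import math

def build_rope_cache(dim, max_seq_len=4096, base=10000):
    """Hypothetical helper: precompute (cos, sin) per position and per pair of dims."""
    # One rotation frequency per pair of dimensions (assumes dim is even).
    theta = [base ** (-2 * i / dim) for i in range(dim // 2)]
    # cache[pos][i] holds (cos(pos * theta_i), sin(pos * theta_i)).
    return [
        [(math.cos(pos * t), math.sin(pos * t)) for t in theta]
        for pos in range(max_seq_len)
    ]

# Built once up front; later lookups only index into the table by position.
cache = build_rope_cache(dim=8, max_seq_len=16)
cos_sin_at_pos_3 = cache[3]  # dim // 2 = 4 (cos, sin) pairs for position 3
```

Building the table once trades memory, on the order of max_seq_len * dim values, for not recomputing sines and cosines on every call.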

  • dim (int) – Embedding dimension. This is usually set to the per-head dimension of the attention module, computed as ``embed_dim // num_heads``

  • max_seq_len (int) – Maximum expected sequence length for the model. If this length is exceeded, the cached freqs are recomputed

  • base (int) – The base for the geometric progression used to compute the rotation angles
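As an illustration of the geometric progression that base controls, the standard RoPE formulation uses the angles below. The symbols are notation introduced here, not names from the library: d stands for dim, p for a token position, and i indexes the pairs of dimensions.

```latex
\theta_i = \mathrm{base}^{-2i/d}, \qquad i = 0, 1, \dots, \tfrac{d}{2} - 1,
\qquad \mathrm{angle}(p, i) = p \, \theta_i
```

For example, with dim = 8 and base = 10000 the frequencies are 1, 0.1, 0.01, 0.001, a geometric progression with common ratio base^(-2/d) = 0.1. A larger base makes the later pairs rotate more slowly.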

forward(x: Tensor, *, input_pos: Optional[Tensor] = None) → Tensor
  • x (Tensor) – input tensor with shape [b, s, n_h, h_d]

  • input_pos (Optional[Tensor]) – Optional tensor which contains the position ids of each token. During training, this is used to indicate the position of each token relative to its sample when packed, shape [b, s]. During inference, this indicates the position of the current token. If None, the index of each token is assumed to be its position id. Default is None.


Returns:

output tensor with RoPE applied

Return type:

Tensor

Notation used for tensor shapes:
  • b: batch size

  • s: sequence length

  • n_h: num heads

  • h_d: head dim
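The forward semantics described above, including the role of input_pos, can be sketched in plain Python over nested lists of shape [b][s][n_h][h_d]. This is a sketch under stated assumptions rather than the library's code: the function name apply_rope is hypothetical, adjacent elements of the head dimension are assumed to form each rotated pair, and no cache is used.

```python
import math

def apply_rope(x, input_pos=None, base=10000):
    """Hypothetical sketch: rotate x (nested lists, [b][s][n_h][h_d]) by position."""
    h_d = len(x[0][0][0])
    theta = [base ** (-2 * i / h_d) for i in range(h_d // 2)]
    out = []
    for b, sample in enumerate(x):
        out_sample = []
        for s, token in enumerate(sample):
            # Without input_pos, a token's index in the sequence is its position id.
            pos = s if input_pos is None else input_pos[b][s]
            out_token = []
            for head in token:
                rotated = []
                for i, t in enumerate(theta):
                    # Assumed pairing: elements 2i and 2i + 1 are rotated together.
                    x0, x1 = head[2 * i], head[2 * i + 1]
                    c, sn = math.cos(pos * t), math.sin(pos * t)
                    rotated += [x0 * c - x1 * sn, x1 * c + x0 * sn]
                out_token.append(rotated)
            out_sample.append(out_token)
        out.append(out_sample)
    return out

# b=1, s=4, n_h=1, h_d=2: one sample that packs two sequences of length 2.
x = [[[[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]]]]
default = apply_rope(x)                           # positions 0, 1, 2, 3
packed = apply_rope(x, input_pos=[[0, 1, 0, 1]])  # positions restart per packed sequence
```

In the packed call, the third token is rotated as position 0 again, so it matches the first token's output, and the output keeps the input shape in both cases.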

TODO: The implementation below can be made more efficient for inference.


