
gemma2

torchtune.models.gemma2.gemma2(vocab_size: int, num_layers: int, num_heads: int, head_dim: int, num_kv_heads: int, embed_dim: int, intermediate_dim: int, max_seq_len: int, attn_dropout: float = 0.0, norm_eps: float = 1e-06, rope_base: int = 10000, hidden_capping_value: float = 50.0, final_capping_value: float = 30.0, sliding_window_size: int = 4096, query_pre_attn_scalar: Optional[int] = None) → TransformerDecoder

Build the decoder associated with the gemma2 model. This includes:

  • Token embeddings

  • num_layers TransformerSelfAttentionLayer blocks

  • An RMS Norm layer applied to the output of the transformer

  • A final projection into token space
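The final RMS Norm layer (whose epsilon is set by norm_eps) can be illustrated with a dependency-free sketch of the standard RMSNorm formula, x / sqrt(mean(x²) + eps) * weight. It works on a single vector stored as a plain Python list rather than on tensors, and the name rms_norm is hypothetical, not a torchtune function. torchtune's Gemma-specific norm may apply the learnable weight differently, so treat this as an illustration of the normalization step only.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standard RMSNorm over one embedding vector (hypothetical helper).

    Divides each element by the root-mean-square of the vector (with eps
    added for numerical stability), then multiplies by a per-dimension weight.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


# A 4-dimensional "embedding" normalized with unit weights.
out = rms_norm([1.0, -2.0, 3.0, -4.0], [1.0] * 4)
```

With unit weights, the output has a mean square of (almost exactly) 1 while keeping the signs and relative sizes of the input.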

Parameters:
  • vocab_size (int) – number of tokens in vocabulary.

  • num_layers (int) – number of layers in the transformer decoder.

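  • num_heads (int) – number of query heads. For MHA this is also the number of heads for key and value.

  • head_dim (int) – dimension of each attention head.

  • num_kv_heads (int) – number of key and value heads. num_heads must be divisible by num_kv_heads. For standard MHA set num_kv_heads == num_heads, for GQA set num_kv_heads < num_heads, and for MQA set num_kv_heads == 1.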

  • embed_dim (int) – embedding dimension for self-attention

  • intermediate_dim (int) – intermediate dimension for MLP

  • max_seq_len (int) – maximum sequence length the model will be run with.

  • attn_dropout (float) – dropout value passed to scaled_dot_product_attention. Default: 0.0

  • norm_eps (float) – epsilon in RMS norms. Default: 1e-6

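  • rope_base (int) – base for the rotary positional embeddings. Default: 10_000

  • hidden_capping_value (float) – soft-capping value applied to the attention logits. Default: 50.0

  • final_capping_value (float) – soft-capping value applied to the final output logits. Default: 30.0

  • sliding_window_size (int) – size of the sliding window used by the local-attention layers. Default: 4096

  • query_pre_attn_scalar (Optional[int]) – value used to scale the queries before attention; when None, the scaling is derived from head_dim. Default: None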
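Together, num_heads, num_kv_heads, and head_dim fix the widths of the attention projections. The sketch below computes those widths under grouped-query attention, where each key/value head is shared by num_heads // num_kv_heads query heads. The AttentionShapes class and attention_shapes helper are hypothetical, and the query-scaling rule (query_pre_attn_scalar ** -0.5 when it is set, head_dim ** -0.5 otherwise) is an assumption about how query_pre_attn_scalar is used, not a quote of torchtune's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionShapes:
    q_proj_out: int           # output width of the query projection
    kv_proj_out: int          # output width of each of the key/value projections
    queries_per_kv_head: int  # number of query heads sharing one key/value head
    query_scale: float        # multiplier applied to queries before q @ k^T


def attention_shapes(
    num_heads: int,
    num_kv_heads: int,
    head_dim: int,
    query_pre_attn_scalar: Optional[int] = None,
) -> AttentionShapes:
    """Derive projection widths for GQA (hypothetical helper)."""
    # Every key/value head must serve the same number of query heads.
    if num_kv_heads <= 0 or num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a positive multiple of num_kv_heads")
    # Assumption: scale by query_pre_attn_scalar ** -0.5 when it is set,
    # otherwise fall back to the usual head_dim ** -0.5.
    scalar = query_pre_attn_scalar if query_pre_attn_scalar is not None else head_dim
    return AttentionShapes(
        q_proj_out=num_heads * head_dim,
        kv_proj_out=num_kv_heads * head_dim,
        queries_per_kv_head=num_heads // num_kv_heads,
        query_scale=scalar ** -0.5,
    )


# Hypothetical configuration, chosen only for illustration.
shapes = attention_shapes(num_heads=8, num_kv_heads=4, head_dim=256)
```

Setting num_kv_heads equal to num_heads gives one query head per key/value head (standard MHA), and num_kv_heads = 1 shares a single key/value head across all query heads (MQA).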

Returns:

Instantiation of the Gemma 2 model.

Return type:

TransformerDecoder
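The signature's hidden_capping_value, final_capping_value, and sliding_window_size parameters refer to two Gemma 2 mechanisms: logit soft-capping, which squashes a logit with cap * tanh(logit / cap), and sliding-window (local) attention, where each query only sees a bounded window of recent keys. The pure-Python sketch below illustrates both on plain lists. The helper names soft_cap and sliding_window_mask are hypothetical, and the exact window boundary (a query sees itself plus the window - 1 preceding positions) is an assumption that may differ from torchtune's implementation.

```python
import math
from typing import List


def soft_cap(logit: float, cap: float) -> float:
    """Smoothly squash a logit into the open interval (-cap, cap)."""
    return cap * math.tanh(logit / cap)


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Assumption: causal attention restricted to the `window` most recent
    positions, including the current one (i - window < j <= i).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Final-logit capping with the default final_capping_value of 30.0.
capped = [soft_cap(x, 30.0) for x in (0.0, 10.0, 100.0)]

# A tiny causal sliding-window mask with a window of 2 positions.
mask = sliding_window_mask(seq_len=5, window=2)
```

Soft-capping is close to the identity for small logits (10.0 maps to roughly 9.65) but never lets a logit reach the cap, while the mask shows each position attending only to itself and its immediate predecessor.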
