ForwardKLLoss

class torchtune.modules.loss.ForwardKLLoss(ignore_index: int = -100)

Parameters:

ignore_index (int) – Target value that is ignored and does not contribute to the input gradient. The loss is averaged over the non-ignored targets only. Default: -100.

forward(student_logits: Tensor, teacher_logits: Tensor, labels: Tensor, normalize: bool = True) → Tensor

Parameters:

student_logits (torch.Tensor) – Logits from the student model, of shape (batch_size*num_tokens, vocab_size).
teacher_logits (torch.Tensor) – Logits from the teacher model, of shape (batch_size*num_tokens, vocab_size).
labels (torch.Tensor) – Ground truth labels, one per token (batch_size*num_tokens elements in total). Positions equal to ignore_index are masked out of the loss.
normalize (bool) – Whether to normalize the loss by the number of unmasked elements. Default: True.

Returns:

KL divergence loss of shape (1,).

Return type:

torch.Tensor
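
Example:

The following is a minimal plain-Python sketch of the quantity described above, not the torchtune implementation. It assumes the textbook forward KL divergence, KL(teacher || student), computed per token from the two sets of logits, with tokens whose label equals ignore_index skipped and, when normalize is true, the sum divided by the number of remaining tokens. The helper names forward_kl and log_softmax are hypothetical, and returning 0.0 when every token is masked is a choice made for this sketch. The library may differ in detail, for example by dropping the teacher-entropy term, which is constant with respect to the student.

```python
import math

IGNORE_INDEX = -100  # matches the documented default for ignore_index


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def forward_kl(student_logits, teacher_logits, labels,
               ignore_index=IGNORE_INDEX, normalize=True):
    """Textbook forward KL, KL(teacher || student), over non-ignored tokens.

    student_logits, teacher_logits: one row of vocab_size floats per token.
    labels: one int per token; rows labelled ignore_index are skipped.
    """
    total, kept = 0.0, 0
    for s_row, t_row, label in zip(student_logits, teacher_logits, labels):
        if label == ignore_index:
            continue  # masked token: contributes nothing to the loss
        s_logp = log_softmax(s_row)
        t_logp = log_softmax(t_row)
        # sum_v p_teacher(v) * (log p_teacher(v) - log p_student(v))
        total += sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))
        kept += 1
    if not normalize:
        return total
    # Sketch-level assumption: an all-masked batch yields 0.0.
    return total / kept if kept else 0.0


# Three tokens, vocabulary of size 2; the last token is masked out.
student = [[0.0, 0.0], [2.0, 0.0], [5.0, -5.0]]
teacher = [[0.0, 0.0], [0.0, 2.0], [-5.0, 5.0]]
labels = [1, 0, IGNORE_INDEX]

loss_mean = forward_kl(student, teacher, labels)
loss_sum = forward_kl(student, teacher, labels, normalize=False)
```

In this toy batch the first token contributes zero because student and teacher agree, the third token is ignored despite the large disagreement, and loss_mean is loss_sum divided by the two unmasked tokens.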
