KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)
The Kullback-Leibler divergence loss.
For tensors of the same shape y_pred, y_true, where y_pred is the input and y_true is the target, we define the pointwise KL-divergence as

L(y_{\text{pred}},\ y_{\text{true}}) = y_{\text{true}} \cdot \log \frac{y_{\text{true}}}{y_{\text{pred}}} = y_{\text{true}} \cdot (\log y_{\text{true}} - \log y_{\text{pred}})
To avoid underflow issues when computing this quantity, this loss expects the argument input in the log-space. The argument target may also be provided in the log-space if log_target is set to True.
To summarise, this function is roughly equivalent to computing
if not log_target:  # default
    loss_pointwise = target * (target.log() - input)
else:
    loss_pointwise = target.exp() * (target - input)
and then reducing this result depending on the argument reduction as
if reduction == "mean":  # default
    loss = loss_pointwise.mean()
elif reduction == "batchmean":  # mathematically correct
    loss = loss_pointwise.sum() / input.size(0)
elif reduction == "sum":
    loss = loss_pointwise.sum()
else:  # reduction == "none"
    loss = loss_pointwise
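As an illustrative check (the tensor shapes and seed below are arbitrary, not part of the API), the following sketch reproduces the pseudocode above and confirms that passing the same target either as probabilities or in log-space with log_target=True gives the same loss up to floating-point error.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only to make this sketch reproducible
input = F.log_softmax(torch.randn(2, 4), dim=1)   # log-probabilities
target = F.softmax(torch.rand(2, 4), dim=1)       # probabilities

# Manual pointwise computation, summed (the log_target=False branch above)
manual = (target * (target.log() - input)).sum()

# log_target=False (default): target is given as probabilities
loss_prob = nn.KLDivLoss(reduction="sum")(input, target)
# log_target=True: the same target, but given in log-space
loss_log = nn.KLDivLoss(reduction="sum", log_target=True)(input, target.log())

print(torch.allclose(loss_prob, manual), torch.allclose(loss_log, manual))  # True True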
As with all other losses in PyTorch, this function expects the first argument, input, to be the output of the model (e.g. the neural network) and the second, target, to be the observations in the dataset. This differs from the standard mathematical notation KL(P || Q), where P denotes the distribution of the observations and Q denotes the model.
Note: reduction="mean" doesn't return the true KL divergence value; please use reduction="batchmean", which aligns with the mathematical definition. In a future release, "mean" will be changed to be the same as "batchmean".
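For example, a minimal sketch (with an arbitrary 3 x 5 batch) of the difference: "mean" divides the summed pointwise loss by the total number of elements (15 here), while "batchmean" divides only by the batch size (3), which is the reduction that matches the mathematical definition for a batch of distributions.

import torch
import torch.nn as nn
import torch.nn.functional as F

input = F.log_softmax(torch.randn(3, 5), dim=1)
target = F.softmax(torch.rand(3, 5), dim=1)

pointwise = target * (target.log() - input)
# "mean" averages over all 3 * 5 elements
print(torch.allclose(nn.KLDivLoss(reduction="mean")(input, target), pointwise.mean()))
# "batchmean" sums and divides by the batch size only
print(torch.allclose(nn.KLDivLoss(reduction="batchmean")(input, target), pointwise.sum() / 3))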
Parameters:
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) – Specifies the reduction to apply to the output: "none" | "batchmean" | "sum" | "mean". Default: "mean"
log_target (bool, optional) – Specifies whether target is passed in the log space. Default: False
Shape:
Input: (*), where * means any number of dimensions.
Target: (*), same shape as the input.
Output: scalar by default. If reduction is 'none', then (*), same shape as the input.
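For instance, a short sketch (using an arbitrary 3 x 5 input) showing that reduction="none" keeps the input shape:

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> loss = nn.KLDivLoss(reduction="none")(
...     F.log_softmax(torch.randn(3, 5), dim=1),
...     F.softmax(torch.rand(3, 5), dim=1),
... )
>>> loss.shape
torch.Size([3, 5])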
Examples:
>>> kl_loss = nn.KLDivLoss(reduction="batchmean")
>>> # input should be a distribution in the log space
>>> input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)
>>> # Sample a batch of distributions. Usually this would come from the dataset
>>> target = F.softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, target)

>>> kl_loss = nn.KLDivLoss(reduction="batchmean", log_target=True)
>>> log_target = F.log_softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, log_target)
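If you prefer the functional interface, the same computation is available as torch.nn.functional.kl_div, which accepts the same reduction and log_target arguments; for example, reusing the tensors from the snippet above:

>>> output = F.kl_div(input, target, reduction="batchmean")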