class torch.nn.GaussianNLLLoss(*, full=False, eps=1e-06, reduction='mean')
Gaussian negative log likelihood loss.
The targets are treated as samples from Gaussian distributions with expectations and variances predicted by the neural network. For a target tensor modelled as having a Gaussian distribution with a tensor of expectations input and a tensor of positive variances var, the loss is:

$$\text{loss} = \frac{1}{2}\left(\log\left(\max\left(\text{var},\ \text{eps}\right)\right) + \frac{\left(\text{input} - \text{target}\right)^2}{\max\left(\text{var},\ \text{eps}\right)}\right) + \text{const.}$$

where eps is used for stability. By default, the constant term of the loss function (½ log(2π)) is omitted unless full is True. If var is not the same size as input (due to a homoscedastic assumption), it must either have a final dimension of 1 or have one fewer dimension (with all other sizes being the same) for correct broadcasting.
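Per element, the loss is 0.5 * (log(max(var, eps)) + (input - target)^2 / max(var, eps)), plus 0.5 * log(2π) when full=True. The following is a minimal pure-Python sketch of that formula for flat lists of numbers. The helper name gaussian_nll is hypothetical and not part of PyTorch; it mirrors the clamping, the optional constant and the three reduction modes, but performs no input validation or broadcasting. With full=True each term equals the exact negative log density of a normal distribution, which makes a convenient sanity check.

```python
import math


def gaussian_nll(inputs, targets, variances, eps=1e-6, full=False, reduction="mean"):
    """Hypothetical pure-Python sketch of the Gaussian NLL formula (not a PyTorch API)."""
    losses = []
    for mu, y, var in zip(inputs, targets, variances):
        v = max(var, eps)  # clamp the variance from below for numerical stability
        loss = 0.5 * (math.log(v) + (mu - y) ** 2 / v)
        if full:
            loss += 0.5 * math.log(2 * math.pi)  # the constant term omitted by default
        losses.append(loss)
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    return losses  # reduction == "none": one loss per element


# Mean 2, variance 4, observed target 3: compare with -log N(3; 2, 4) written out by hand,
# i.e. log(sigma) + 0.5 * log(2 * pi) + (y - mu)^2 / (2 * sigma^2) with sigma = 2.
nll = gaussian_nll([2.0], [3.0], [4.0], full=True)
by_hand = math.log(2.0) + 0.5 * math.log(2 * math.pi) + (3.0 - 2.0) ** 2 / (2 * 4.0)
print(abs(nll - by_hand) < 1e-12)  # True
```

A zero variance does not raise here because it is clamped to eps before the logarithm and the division.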
full (bool, optional) – include the constant term in the loss calculation. Default: False.
eps (float, optional) – value used to clamp var (see note below), for stability. Default: 1e-6.
reduction (str, optional) – specifies the reduction to apply to the output:
'none': no reduction will be applied,
'mean': the output is the average of all batch member losses,
'sum': the output is the sum of all batch member losses. Default: 'mean'.
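A small check of how full and reduction interact, using nn.GaussianNLLLoss itself on random data. It assumes PyTorch is installed; the tensor names and sizes are arbitrary choices for illustration. The constant 0.5 * log(2π) is the amount that full=True adds to every element's loss.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)
pred = torch.randn(4, 3)          # predicted expectations
target = torch.randn(4, 3)
var = torch.rand(4, 3) + 0.5      # strictly positive predicted variances

# reduction='none' returns one loss per element, with the input's shape.
per_element = nn.GaussianNLLLoss(reduction="none")(pred, target, var)

# 'mean' and 'sum' reduce those per-element losses.
mean_loss = nn.GaussianNLLLoss(reduction="mean")(pred, target, var)
sum_loss = nn.GaussianNLLLoss(reduction="sum")(pred, target, var)

# full=True shifts every per-element loss by the same constant.
full_per_element = nn.GaussianNLLLoss(full=True, reduction="none")(pred, target, var)
shift = full_per_element - per_element
constant = torch.full_like(shift, 0.5 * math.log(2 * math.pi))

print(per_element.shape)                                  # torch.Size([4, 3])
print(torch.allclose(mean_loss, per_element.mean()))      # True
print(torch.allclose(sum_loss, per_element.sum()))        # True
print(torch.allclose(shift, constant, atol=1e-4))         # True
```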
Shape:
Input: (N, *) or (*), where * means any number of additional dimensions
Target: (N, *) or (*), same shape as the input, or same shape as the input but with one dimension equal to 1 (to allow for broadcasting)
Var: (N, *) or (*), same shape as the input, or same shape as the input but with one dimension equal to 1, or same shape as the input but with one fewer dimension (to allow for broadcasting)
Output: scalar if reduction is 'mean' (default) or 'sum'. If reduction is 'none', then (N, *), same shape as the input.
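The variance shapes accepted for broadcasting can be checked directly. This sketch, assuming PyTorch is installed and using arbitrary sizes, builds the same constant variance in three layouts (same shape as the input, final dimension of 1, and one fewer dimension) and confirms they yield the same per-element losses, each with the input's shape.

```python
import torch
from torch import nn

torch.manual_seed(0)
loss_fn = nn.GaussianNLLLoss(reduction="none")
pred = torch.randn(10, 2, 3)
target = torch.randn(10, 2, 3)

var_full = torch.full((10, 2, 3), 0.5)   # one variance per element (heteroscedastic)
var_last1 = torch.full((10, 2, 1), 0.5)  # final dimension of 1 (homoscedastic)
var_short = torch.full((10, 2), 0.5)     # one fewer dimension (homoscedastic)

out_full = loss_fn(pred, target, var_full)
out_last1 = loss_fn(pred, target, var_last1)
out_short = loss_fn(pred, target, var_short)

# With reduction='none' the output has the input's shape in every case,
# and the three layouts describe the same variances.
print(out_full.shape)                        # torch.Size([10, 2, 3])
print(torch.allclose(out_full, out_last1))   # True
print(torch.allclose(out_full, out_short))   # True
```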
Examples:
>>> import torch
>>> from torch import nn
>>> loss = nn.GaussianNLLLoss()
>>> input = torch.randn(5, 2, requires_grad=True)
>>> target = torch.randn(5, 2)
>>> var = torch.ones(5, 2, requires_grad=True)  # heteroscedastic
>>> output = loss(input, target, var)
>>> output.backward()

>>> loss = nn.GaussianNLLLoss()
>>> input = torch.randn(5, 2, requires_grad=True)
>>> target = torch.randn(5, 2)
>>> var = torch.ones(5, 1, requires_grad=True)  # homoscedastic
>>> output = loss(input, target, var)
>>> output.backward()
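In practice the expectations and variances usually come from the same network. The sketch below is a hypothetical setup, not something this page prescribes: a single nn.Linear layer produces two outputs per sample, and F.softplus (one common choice among several) maps the second output to a positive variance before the loss is computed and one SGD step is taken. It assumes PyTorch is installed.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Hypothetical toy regressor: 4 input features -> (mean, raw variance).
model = nn.Linear(4, 2)
loss_fn = nn.GaussianNLLLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(16, 4)
y = torch.randn(16, 1)

out = model(x)                  # shape (16, 2)
mean = out[:, :1]               # predicted expectations, shape (16, 1)
var = F.softplus(out[:, 1:])    # positive variances, shape (16, 1)

loss = loss_fn(mean, y, var)    # one variance per sample (heteroscedastic)
optimizer.zero_grad()
loss.backward()                 # gradients flow into both the mean and variance outputs
optimizer.step()
```

Softplus keeps the variance positive; any value that underflows towards zero is still clamped to eps inside the loss.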
Note: The clamping of var is ignored with respect to autograd, and so the gradients are unaffected by it.
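The effect of the note can be seen with a variance below eps. In this sketch (assuming PyTorch is installed), the forward pass uses eps in place of var, yet var still receives the derivative of 0.5 * (log(v) + (input - target)^2 / v) evaluated at v = eps, which is 0.5 / eps when input equals target. For contrast, differentiating through a plain torch.clamp gives a zero gradient for the clamped entry.

```python
import math

import torch
from torch import nn

eps = 1e-6
loss_fn = nn.GaussianNLLLoss(eps=eps)

pred = torch.zeros(1)
target = torch.zeros(1)
var = torch.tensor([1e-8], requires_grad=True)  # smaller than eps, so it is clamped

loss = loss_fn(pred, target, var)
loss.backward()
# Forward pass: max(var, eps) = eps, so the loss is 0.5 * log(eps).
print(abs(loss.item() - 0.5 * math.log(eps)) < 1e-4)  # True
# Backward pass: the clamp is invisible to autograd, so var.grad is
# 0.5 * (1/v - (pred - target)**2 / v**2) evaluated at v = eps.
print(var.grad)

# For contrast: torch.clamp zeroes the gradient of entries below the minimum.
var2 = torch.tensor([1e-8], requires_grad=True)
naive = 0.5 * torch.log(torch.clamp(var2, min=eps))
naive.backward()
print(var2.grad)  # tensor([0.])
```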