torchaudio.prototype.rnnt_loss
Note
The RNN transducer loss is a prototype feature. It is only available in the nightly builds and must be imported explicitly: from torchaudio.prototype.rnnt_loss import rnnt_loss, RNNTLoss
rnnt_loss

torchaudio.prototype.rnnt_loss.rnnt_loss(logits: torch.Tensor, targets: torch.Tensor, logit_lengths: torch.Tensor, target_lengths: torch.Tensor, blank: int = -1, clamp: float = -1, fused_log_softmax: bool = True, reuse_logits_for_grads: bool = True, reduction: str = 'mean')
Compute the RNN Transducer loss from Sequence Transduction with Recurrent Neural Networks [1].
The RNN Transducer loss extends the CTC loss by defining a distribution over output sequences of all lengths, and by jointly modelling both input-output and output-output dependencies.
Parameters
logits (Tensor) – Tensor of dimension (batch, time, target, class) containing the output of the joiner
targets (Tensor) – Tensor of dimension (batch, max target length) containing the zero-padded targets
logit_lengths (Tensor) – Tensor of dimension (batch) containing the length of each sequence from the encoder
target_lengths (Tensor) – Tensor of dimension (batch) containing the length of the targets for each sequence
blank (int, optional) – blank label (Default: -1)
clamp (float) – clamp for gradients (Default: -1)
fused_log_softmax (bool) – set to False if calling log_softmax outside the loss (Default: True)
reuse_logits_for_grads (bool) – whether to save memory by reusing the logits memory for the gradients (Default: True)
reduction (string, optional) – Specifies the reduction to apply to the output: 'none', 'mean', or 'sum' (Default: 'mean')
Returns
Loss with the reduction option applied. If reduction is 'none', the result has size (batch); otherwise it is a scalar.
Return type
Tensor
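To make concrete what this loss measures, the following is a minimal pure-Python sketch of the forward recursion described in [1], for a single utterance. It is not torchaudio's implementation: the names rnnt_nll_reference and log_add are hypothetical, the input is assumed to be already log-softmaxed, and the lattice is assumed to have len(target) + 1 columns along the target axis, as in the standard formulation. Batching, clamping, and reduction are left out.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_nll_reference(log_probs, target, blank):
    """Negative log-likelihood of `target` for one utterance (hypothetical helper).

    log_probs[t][u][k] is the log-probability of emitting label k at time
    step t after u target labels have been emitted, so the lattice has
    T rows and U + 1 columns, where U = len(target).
    """
    T = len(log_probs)
    U = len(target)
    # alpha[t][u]: log-probability of reaching node (t, u) of the lattice.
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            from_blank = NEG_INF
            if t > 0:
                # Arrive by emitting blank at (t - 1, u): one step in time.
                from_blank = alpha[t - 1][u] + log_probs[t - 1][u][blank]
            from_label = NEG_INF
            if u > 0:
                # Arrive by emitting target[u - 1] at (t, u - 1): one step in the target.
                from_label = alpha[t][u - 1] + log_probs[t][u - 1][target[u - 1]]
            alpha[t][u] = log_add(from_blank, from_label)
    # A final blank emitted at (T - 1, U) terminates every alignment.
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])


# Usage: uniform distribution over 3 classes, T = 2 frames, one target label.
uniform = math.log(1.0 / 3.0)
log_probs = [[[uniform] * 3 for _ in range(2)] for _ in range(2)]
loss = rnnt_nll_reference(log_probs, target=[1], blank=0)
```

In this toy case there are two valid alignments, each made of three emissions with probability 1/3, so the total probability is 2/27 and loss equals -log(2/27).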
RNNTLoss

class torchaudio.prototype.rnnt_loss.RNNTLoss(blank: int = -1, clamp: float = -1.0, fused_log_softmax: bool = True, reuse_logits_for_grads: bool = True, reduction: str = 'mean')
Compute the RNN Transducer loss from Sequence Transduction with Recurrent Neural Networks [1].
The RNN Transducer loss extends the CTC loss by defining a distribution over output sequences of all lengths, and by jointly modelling both input-output and output-output dependencies.
Parameters
blank (int, optional) – blank label (Default: -1)
clamp (float) – clamp for gradients (Default: -1)
fused_log_softmax (bool) – set to False if calling log_softmax outside the loss (Default: True)
reuse_logits_for_grads (bool) – whether to save memory by reusing the logits memory for the gradients (Default: True)
reduction (string, optional) – Specifies the reduction to apply to the output: 'none', 'mean', or 'sum' (Default: 'mean')

forward(logits, targets, logit_lengths, target_lengths)
Parameters
logits (Tensor) – Tensor of dimension (batch, time, target, class) containing the output of the joiner
targets (Tensor) – Tensor of dimension (batch, max target length) containing the zero-padded targets
logit_lengths (Tensor) – Tensor of dimension (batch) containing the length of each sequence from the encoder
target_lengths (Tensor) – Tensor of dimension (batch) containing the length of the targets for each sequence
Returns
Loss with the reduction option applied. If reduction is 'none', the result has size (batch); otherwise it is a scalar.
Return type
Tensor
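The targets and target_lengths arguments above describe a zero-padded batch layout. As an illustration only, the sketch below builds that layout from variable-length label sequences using plain Python lists. The helper pad_targets is hypothetical and not part of torchaudio; the resulting lists would still need to be converted to tensors before being passed to forward, and the integer dtype the loss expects is not stated on this page, so check the implementation for it.

```python
def pad_targets(sequences, pad_value=0):
    """Return (padded, lengths) for a batch of label sequences (hypothetical helper).

    padded has shape (batch, max target length); shorter sequences are
    right-padded with pad_value. lengths holds the true length of each one.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths) if lengths else 0
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


# Usage: three label sequences of lengths 3, 1, and 2.
targets, target_lengths = pad_targets([[3, 1, 2], [2], [1, 4]])
# targets        -> [[3, 1, 2], [2, 0, 0], [1, 4, 0]]
# target_lengths -> [3, 1, 2]
```

Because the padding value is also a valid integer label, target_lengths is what tells the loss where each real target ends.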
References
[1] Alex Graves. Sequence transduction with recurrent neural networks. 2012. arXiv:1211.3711.