
torchaudio.prototype.rnnt_loss

Note

The RNN transducer loss is a prototype feature; see here to learn more about the nomenclature. It is only available in the nightly builds, and it must be imported explicitly:

    from torchaudio.prototype.rnnt_loss import rnnt_loss, RNNTLoss

rnnt_loss

torchaudio.prototype.rnnt_loss.rnnt_loss(logits: torch.Tensor, targets: torch.Tensor, logit_lengths: torch.Tensor, target_lengths: torch.Tensor, blank: int = -1, clamp: float = -1, fused_log_softmax: bool = True, reuse_logits_for_grads: bool = True, reduction: str = 'mean')[source]

Compute the RNN Transducer loss from Sequence Transduction with Recurrent Neural Networks [1].

The RNN Transducer loss extends the CTC loss by defining a distribution over output sequences of all lengths, and by jointly modelling both input-output and output-output dependencies.

Parameters
  • logits (Tensor) – Tensor of dimension (batch, time, target, class) containing the output from the joiner

  • targets (Tensor) – Tensor of dimension (batch, max target length) containing the zero-padded targets

  • logit_lengths (Tensor) – Tensor of dimension (batch) containing the lengths of each sequence from the encoder

  • target_lengths (Tensor) – Tensor of dimension (batch) containing the lengths of the targets for each sequence

  • blank (int, optional) – blank label (Default: -1)

  • clamp (float, optional) – clamp value for gradients (Default: -1)

  • fused_log_softmax (bool, optional) – set to False if log_softmax is applied outside the loss (Default: True)

  • reuse_logits_for_grads (bool, optional) – whether to save memory by reusing the logits memory for the gradients (Default: True)

  • reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. (Default: 'mean')

Returns

Loss with the reduction option applied. If reduction is 'none', then the output has size (batch); otherwise it is a scalar.

Return type

Tensor
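
Example (a minimal, illustrative sketch; the shapes, dtypes, and the choice of blank=0 below are assumptions for demonstration, not taken from this documentation):

    >>> import torch
    >>> from torchaudio.prototype.rnnt_loss import rnnt_loss
    >>> # Joiner output of shape (batch, time, target, class); here batch=2, time=10,
    >>> # target=5, class=20, assuming the target dimension is max target length + 1.
    >>> logits = torch.rand(2, 10, 5, 20, requires_grad=True)
    >>> # Zero-padded targets; blank=0 is chosen for this sketch, so labels start at 1.
    >>> targets = torch.randint(1, 20, (2, 4), dtype=torch.int32)
    >>> logit_lengths = torch.tensor([10, 8], dtype=torch.int32)
    >>> target_lengths = torch.tensor([4, 3], dtype=torch.int32)
    >>> loss = rnnt_loss(logits, targets, logit_lengths, target_lengths, blank=0)
    >>> loss.backward()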

RNNTLoss

class torchaudio.prototype.rnnt_loss.RNNTLoss(blank: int = -1, clamp: float = -1.0, fused_log_softmax: bool = True, reuse_logits_for_grads: bool = True, reduction: str = 'mean')[source]

Compute the RNN Transducer loss from Sequence Transduction with Recurrent Neural Networks [1].

The RNN Transducer loss extends the CTC loss by defining a distribution over output sequences of all lengths, and by jointly modelling both input-output and output-output dependencies.

Parameters
  • blank (int, optional) – blank label (Default: -1)

  • clamp (float, optional) – clamp value for gradients (Default: -1)

  • fused_log_softmax (bool, optional) – set to False if log_softmax is applied outside the loss (Default: True)

  • reuse_logits_for_grads (bool, optional) – whether to save memory by reusing the logits memory for the gradients (Default: True)

  • reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. (Default: 'mean')

forward(logits, targets, logit_lengths, target_lengths)[source]
Parameters
  • logits (Tensor) – Tensor of dimension (batch, time, target, class) containing the output from the joiner

  • targets (Tensor) – Tensor of dimension (batch, max target length) containing the zero-padded targets

  • logit_lengths (Tensor) – Tensor of dimension (batch) containing the lengths of each sequence from the encoder

  • target_lengths (Tensor) – Tensor of dimension (batch) containing the lengths of the targets for each sequence

Returns

Loss with the reduction option applied. If reduction is 'none', then the output has size (batch); otherwise it is a scalar.

Return type

Tensor
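
Example (an illustrative sketch using the same assumed shapes and blank=0 convention as the rnnt_loss example above):

    >>> import torch
    >>> from torchaudio.prototype.rnnt_loss import RNNTLoss
    >>> criterion = RNNTLoss(blank=0)  # blank=0 is an assumption for this sketch
    >>> logits = torch.rand(2, 10, 5, 20, requires_grad=True)
    >>> targets = torch.randint(1, 20, (2, 4), dtype=torch.int32)
    >>> logit_lengths = torch.tensor([10, 8], dtype=torch.int32)
    >>> target_lengths = torch.tensor([4, 3], dtype=torch.int32)
    >>> loss = criterion(logits, targets, logit_lengths, target_lengths)
    >>> loss.backward()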

References

[1] Alex Graves. Sequence Transduction with Recurrent Neural Networks. 2012. arXiv:1211.3711.
