RNNTBeamSearch

class torchaudio.models.RNNTBeamSearch(model: RNNT, blank: int, temperature: float = 1.0, hypo_sort_key: Optional[Callable[[Tuple[List[int], Tensor, List[List[Tensor]], float]], float]] = None, step_max_tokens: int = 100)

Beam search decoder for RNN-T model.

Parameters:
  • model (RNNT) – RNN-T model to use.

  • blank (int) – index of blank token in vocabulary.

  • temperature (float, optional) – temperature to apply to joint network output. Larger values yield more uniform samples. (Default: 1.0)

  • hypo_sort_key (Callable[[Hypothesis], float] or None, optional) – callable that computes a score for a given hypothesis to rank hypotheses by. If None, defaults to callable that returns hypothesis score normalized by token sequence length. (Default: None)

  • step_max_tokens (int, optional) – maximum number of tokens to emit per input time step. (Default: 100)
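The temperature and hypo_sort_key parameters above can be illustrated without a model. The sketch below is not the library's implementation: it assumes the temperature divides the joint network logits before a log-softmax, and that higher sort keys rank first. The names log_softmax_with_temperature and length_normalized_key are hypothetical, the +1 in the denominator is this sketch's own guard against empty token lists, and the tensor fields of Hypothesis are replaced by None placeholders so the code runs without torch.

```python
import math
from typing import Any, List, Tuple

# Stand-in for torchaudio.models.Hypothesis; tensor fields are typed as Any
# so this sketch runs without torch installed.
Hypothesis = Tuple[List[int], Any, List[List[Any]], float]


def log_softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Divide logits by the temperature, then apply a log-softmax (assumed order)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def length_normalized_key(hypo: Hypothesis) -> float:
    """Hypothetical sort key: score divided by token count (+1 avoids division by zero)."""
    tokens, _pred_out, _pred_state, score = hypo
    return score / (len(tokens) + 1)


# A low temperature sharpens the distribution; a high one flattens it.
sharp = log_softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.5)
flat = log_softmax_with_temperature([2.0, 1.0, 0.0], temperature=4.0)

# Placeholder hypotheses: (tokens, prediction output, prediction state, score).
short_hypo: Hypothesis = ([0, 7], None, [], -3.0)
long_hypo: Hypothesis = ([0, 7, 9, 4], None, [], -4.0)

# Assumption: hypotheses with a higher key are ranked first.
ranked = sorted([short_hypo, long_hypo], key=length_normalized_key, reverse=True)
```

Here the longer hypothesis ranks first even though its raw score is lower, because its per-token score is better. A callable with this shape could be passed as RNNTBeamSearch(model, blank, hypo_sort_key=length_normalized_key).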

Tutorials using RNNTBeamSearch:
Online ASR with Emformer RNN-T

Methods

forward

RNNTBeamSearch.forward(input: Tensor, length: Tensor, beam_width: int) → List[Tuple[List[int], Tensor, List[List[Tensor]], float]]

Performs beam search for the given input sequence.

T: number of frames; D: feature dimension of each frame.

Parameters:
  • input (torch.Tensor) – sequence of input frames, with shape (T, D) or (1, T, D).

  • length (torch.Tensor) – number of valid frames in input sequence, with shape () or (1,).

  • beam_width (int) – beam size to use during search.

Returns:

top-beam_width hypotheses found by beam search.

Return type:

List[Hypothesis]

infer

RNNTBeamSearch.infer(input: Tensor, length: Tensor, beam_width: int, state: Optional[List[List[Tensor]]] = None, hypothesis: Optional[List[Tuple[List[int], Tensor, List[List[Tensor]], float]]] = None) → Tuple[List[Tuple[List[int], Tensor, List[List[Tensor]], float]], List[List[Tensor]]]

Performs beam search for the given input sequence in streaming mode.

T: number of frames; D: feature dimension of each frame.

Parameters:
  • input (torch.Tensor) – sequence of input frames, with shape (T, D) or (1, T, D).

  • length (torch.Tensor) – number of valid frames in input sequence, with shape () or (1,).

  • beam_width (int) – beam size to use during search.

  • state (List[List[torch.Tensor]] or None, optional) – list of lists of tensors representing the transcription network internal state generated by the preceding invocation. (Default: None)

  • hypothesis (List[Hypothesis] or None, optional) – hypotheses from the preceding invocation to seed the search with. (Default: None)

Returns:
  • List[Hypothesis] – top-beam_width hypotheses found by beam search.

  • List[List[torch.Tensor]] – list of lists of tensors representing transcription network internal state generated in current invocation.

Return type:

(List[Hypothesis], List[List[torch.Tensor]])
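Because infer accepts the state and hypotheses from a preceding invocation and returns updated ones, streaming use amounts to a loop that carries both values forward. The following is a minimal sketch of that loop, not code from the library: stream_decode is a hypothetical helper, and decoder can be any object exposing an infer method with the signature documented above. Each chunk is assumed to be an (input, length) pair already shaped as infer expects.

```python
from typing import Any, Iterable, List, Optional, Tuple


def stream_decode(decoder: Any, chunks: Iterable[Tuple[Any, Any]], beam_width: int) -> List[Any]:
    """Feed (input, length) chunks to decoder.infer, carrying results forward."""
    state: Optional[Any] = None  # no transcription network state before the first chunk
    hypotheses: Optional[List[Any]] = None  # no seed hypotheses before the first chunk
    for chunk, length in chunks:
        # Each call continues from where the previous one stopped.
        hypotheses, state = decoder.infer(
            chunk, length, beam_width, state=state, hypothesis=hypotheses
        )
    # With no chunks at all, there is nothing to return.
    return hypotheses if hypotheses is not None else []
```

Passing None for both arguments on the first call matches their documented defaults, so the first chunk needs no special handling.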

Support Structures

Hypothesis

torchaudio.models.Hypothesis

Hypothesis generated by RNN-T beam search decoder, represented as tuple of (tokens, prediction network output, prediction network state, score).

alias of Tuple[List[int], Tensor, List[List[Tensor]], float]
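Since Hypothesis is a plain tuple, its fields are read by position. The sketch below shows one way to pull the token sequence out of a decoding result. It makes two assumptions that the text above does not state: that the returned list is ordered best first, and that blank tokens should be dropped before converting tokens to text. best_tokens is a hypothetical helper, and the tensor fields are replaced by None placeholders so the code runs without torch.

```python
from typing import Any, List, Tuple

# Stand-in for torchaudio.models.Hypothesis with tensor fields typed as Any.
Hypothesis = Tuple[List[int], Any, List[List[Any]], float]


def best_tokens(hypotheses: List[Hypothesis], blank: int) -> List[int]:
    """Return the tokens of the first hypothesis with blank tokens removed."""
    # Field order: tokens, prediction network output, prediction network state, score.
    tokens, _pred_out, _pred_state, _score = hypotheses[0]
    return [t for t in tokens if t != blank]


# Placeholder result in which token index 0 plays the role of blank.
example: List[Hypothesis] = [([0, 12, 5], None, [], -1.5), ([0, 12], None, [], -2.0)]
tokens = best_tokens(example, blank=0)
```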
