RNNTBeamSearch

class torchaudio.models.RNNTBeamSearch(model: RNNT, blank: int, temperature: float = 1.0, hypo_sort_key: Optional[Callable[[Tuple[List[int], Tensor, List[List[Tensor]], float]], float]] = None, step_max_tokens: int = 100)

Beam search decoder for an RNN-T model.

Methods

RNNTBeamSearch.forward(input: Tensor, length: Tensor, beam_width: int) → List[Tuple[List[int], Tensor, List[List[Tensor]], float]]

Performs beam search for the given input sequence.

T: number of frames; D: feature dimension of each frame.

Parameters:

input (torch.Tensor) – sequence of input frames, with shape (T, D) or (1, T, D).
length (torch.Tensor) – number of valid frames in input sequence, with shape () or (1,).
beam_width (int) – beam size to use during search.

Returns:

top-beam_width hypotheses found by beam search.

Return type:

List[Hypothesis]
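
A minimal sketch of a non-streaming call, assuming `decoder` is an already constructed RNNTBeamSearch. The helper name `decode_utterance` is hypothetical. The sketch also assumes the returned list is ordered best-first, which this reference does not state explicitly. It relies only on the documented call signature and the Hypothesis tuple layout.

```python
from typing import Any, List


def decode_utterance(decoder: Any, features: Any, length: Any, beam_width: int = 10) -> List[int]:
    """Run full-utterance beam search and return the token IDs of the first hypothesis.

    `decoder` is expected to behave like RNNTBeamSearch: calling it with
    (input, length, beam_width) returns a list of Hypothesis tuples
    (tokens, prediction network output, prediction network state, score).
    """
    hypotheses = decoder(features, length, beam_width)
    if not hypotheses:
        return []
    # Assumption: the first entry is the best-ranked hypothesis.
    tokens, _pred_out, _pred_state, _score = hypotheses[0]
    return list(tokens)
```

Here `features` would be a tensor of shape (T, D) or (1, T, D) and `length` a tensor holding the number of valid frames. The returned token IDs still have to be mapped to text by the tokenizer the model was trained with.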

RNNTBeamSearch.infer(input: Tensor, length: Tensor, beam_width: int, state: Optional[List[List[Tensor]]] = None, hypothesis: Optional[Tuple[List[int], Tensor, List[List[Tensor]], float]] = None) → Tuple[List[Tuple[List[int], Tensor, List[List[Tensor]], float]], List[List[Tensor]]]

Performs beam search for the given input sequence in streaming mode.

T: number of frames; D: feature dimension of each frame.

Parameters:

input (torch.Tensor) – sequence of input frames, with shape (T, D) or (1, T, D).
length (torch.Tensor) – number of valid frames in input sequence, with shape () or (1,).
beam_width (int) – beam size to use during search.
state (List[List[torch.Tensor]] or None, optional) – list of lists of tensors representing transcription network internal state generated in preceding invocation. (Default: None)
hypothesis (Hypothesis or None, optional) – hypothesis from preceding invocation to seed search with. (Default: None)

Returns:

List[Hypothesis]: top-beam_width hypotheses found by beam search.
List[List[torch.Tensor]]: list of lists of tensors representing transcription network internal state generated in current invocation.

Return type:

(List[Hypothesis], List[List[torch.Tensor]])
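
The streaming contract above (pass back the `state` and a hypothesis from the preceding invocation) can be sketched as a loop over input segments. The helper name `stream_decode` is hypothetical. The sketch assumes the first returned hypothesis is the best one, and that a seeded hypothesis carries the tokens emitted so far. `decoder` stands for any object exposing `infer` with the signature documented above.

```python
from typing import Any, Iterable, List, Optional, Tuple


def stream_decode(decoder: Any, segments: Iterable[Tuple[Any, Any]], beam_width: int = 10) -> List[int]:
    """Feed (segment, length) pairs to `decoder.infer`, carrying state between calls."""
    state: Optional[Any] = None       # transcription network state from the previous call
    hypothesis: Optional[Any] = None  # hypothesis used to seed the next call
    for segment, length in segments:
        hypotheses, state = decoder.infer(
            segment, length, beam_width, state=state, hypothesis=hypothesis
        )
        if hypotheses:
            # Assumption: the first entry is the best-ranked hypothesis.
            hypothesis = hypotheses[0]
    # Tokens are the first field of a Hypothesis tuple.
    return list(hypothesis[0]) if hypothesis is not None else []
```

On the first segment both `state` and `hypothesis` are None, matching the documented defaults. Every later call reuses the values returned by the previous one.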

torchaudio.models.Hypothesis

Hypothesis generated by the RNN-T beam search decoder, represented as a tuple of (tokens, prediction network output, prediction network state, score).
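
To make the tuple layout concrete, the sketch below names the four positions and defines a ranking function of the kind the `hypo_sort_key` constructor argument accepts (a callable from a hypothesis to a float). The index constants, the helper names, and the length-normalized scoring rule are illustrative assumptions, not part of torchaudio's public API.

```python
from typing import Any, List, Tuple

# Positions within a Hypothesis tuple, in the documented order.
TOKENS, PRED_OUT, PRED_STATE, SCORE = range(4)

# Stand-in type alias; the real tuple holds tensors in positions 1 and 2.
HypothesisLike = Tuple[List[int], Any, Any, float]


def hypothesis_tokens(hypo: HypothesisLike) -> List[int]:
    """Return the token IDs of a hypothesis."""
    return hypo[TOKENS]


def length_normalized_score(hypo: HypothesisLike) -> float:
    """Score divided by (token count + 1), one possible ranking rule."""
    return hypo[SCORE] / (len(hypo[TOKENS]) + 1)


example: HypothesisLike = ([0, 12, 7], None, [], -6.0)
print(hypothesis_tokens(example))        # [0, 12, 7]
print(length_normalized_score(example))  # -1.5
```

A function shaped like `length_normalized_score` could be passed as `hypo_sort_key` when constructing the decoder to control how hypotheses are ranked.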