CTCDecoder

class torchaudio.models.decoder.CTCDecoder[source]

CTC beam search decoder from Flashlight [Kahn et al., 2022].

This feature supports the following devices: CPU

Note

To build the decoder, please use the factory function ctc_decoder().

Tutorials using CTCDecoder:
ASR Inference with CTC Decoder

Methods

__call__

CTCDecoder.__call__(emissions: FloatTensor, lengths: Optional[Tensor] = None) List[List[CTCHypothesis]][source]

Performs batched offline decoding.

Note

This method performs offline decoding in one go. To perform incremental decoding, please refer to decode_step().

Parameters:
  • emissions (torch.FloatTensor) – CPU tensor of shape (batch, frame, num_tokens) storing sequences of probability distributions over labels; output of the acoustic model.

  • lengths (Tensor or None, optional) – CPU tensor of shape (batch, ) storing the valid length along the time axis of each sequence in the batch.

Returns:

List of sorted best hypotheses for each audio sequence in the batch.

Return type:

List[List[CTCHypothesis]]
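Putting the above together, a minimal batched-decoding sketch (the ctc_decoder() arguments and the acoustic model are placeholders, not a specific recipe):

```
>>> decoder = torchaudio.models.decoder.ctc_decoder(...)
>>> emissions = acoustic_model(waveforms)   # (batch, frame, num_tokens)
>>> hypotheses = decoder(emissions)         # List[List[CTCHypothesis]]
>>> best = hypotheses[0][0]                 # top hypothesis for the first utterance
```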

decode_begin

CTCDecoder.decode_begin()[source]

Initialize the internal state of the decoder.

See decode_step() for the usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

decode_end

CTCDecoder.decode_end()[source]

Finalize the internal state of the decoder.

See decode_step() for the usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

decode_step

CTCDecoder.decode_step(emissions: FloatTensor)[source]

Perform incremental decoding on top of the current internal state.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

Parameters:

emissions (torch.FloatTensor) – CPU tensor of shape (frame, num_tokens) storing sequences of probability distribution over labels; output of acoustic model.

Example

>>> decoder = torchaudio.models.decoder.ctc_decoder(...)
>>> decoder.decode_begin()
>>> decoder.decode_step(emission1)
>>> decoder.decode_step(emission2)
>>> decoder.decode_end()
>>> result = decoder.get_final_hypothesis()

get_final_hypothesis

CTCDecoder.get_final_hypothesis() List[CTCHypothesis][source]

Get the final hypotheses.

Returns:

List of sorted best hypotheses.

Return type:

List[CTCHypothesis]

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

idxs_to_tokens

CTCDecoder.idxs_to_tokens(idxs: LongTensor) List[source]

Map raw token IDs to their corresponding tokens.

Parameters:

idxs (LongTensor) – raw token IDs generated from decoder

Returns:

tokens corresponding to the input IDs

Return type:

List
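For lexicon-free decoding, the raw token IDs of a hypothesis can be mapped back to tokens with this method; a sketch (the "|" word-boundary token is an assumption that depends on the token set used to build the decoder):

```
>>> hyps = decoder(emissions)
>>> tokens = decoder.idxs_to_tokens(hyps[0][0].tokens)
>>> transcript = "".join(tokens).replace("|", " ").strip()
```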

Support Structures

CTCHypothesis

class torchaudio.models.decoder.CTCHypothesis(tokens: torch.LongTensor, words: List[str], score: float, timesteps: torch.IntTensor)[source]

Represents hypothesis generated by CTC beam search decoder CTCDecoder.

Tutorials using CTCHypothesis:
ASR Inference with CTC Decoder

tokens: LongTensor

Predicted sequence of token IDs. Shape (L, ), where L is the length of the output sequence.

words: List[str]

List of predicted words.

Note

This attribute is only applicable if a lexicon is provided to the decoder. If decoding without a lexicon, it will be empty. Please refer to tokens and idxs_to_tokens() instead.

score: float

Score corresponding to the hypothesis.

timesteps: IntTensor

Timesteps corresponding to the tokens. Shape (L, ), where L is the length of the output sequence.
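A sketch of reading these attributes from a decoded hypothesis (assumes a decoder built with a lexicon, so that words is populated):

```
>>> hyp = decoder(emissions)[0][0]
>>> " ".join(hyp.words)   # transcript, when decoding with a lexicon
>>> hyp.score             # hypothesis score
>>> hyp.timesteps         # frame indices aligned with hyp.tokens
```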

CTCDecoderLM

class torchaudio.models.decoder.CTCDecoderLM[source]

Language model base class for creating custom language models to use with the decoder.

Tutorials using CTCDecoderLM:
ASR Inference with CTC Decoder

abstract start(start_with_nothing: bool) CTCDecoderLMState[source]

Initialize or reset the language model.

Parameters:

start_with_nothing (bool) – whether to start the sentence with the sil token.

Returns:

starting state

Return type:

CTCDecoderLMState

abstract score(state: CTCDecoderLMState, usr_token_idx: int) Tuple[CTCDecoderLMState, float][source]

Evaluate the language model based on the current LM state and new word.

Parameters:
  • state (CTCDecoderLMState) – current LM state

  • usr_token_idx (int) – index of the user token

Returns:

(CTCDecoderLMState, float)
  CTCDecoderLMState: new LM state
  float: score

abstract finish(state: CTCDecoderLMState) Tuple[CTCDecoderLMState, float][source]

Evaluate the language model at sentence end, based on the current LM state.

Parameters:

state (CTCDecoderLMState) – current LM state

Returns:

(CTCDecoderLMState, float)
  CTCDecoderLMState: new LM state
  float: score
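A custom language model subclasses CTCDecoderLM and implements the three abstract methods above. A hedged sketch, where lm_backend and its score/end_score methods are hypothetical stand-ins for an external LM:

```
>>> from torchaudio.models.decoder import CTCDecoderLM, CTCDecoderLMState
>>>
>>> class CustomLM(CTCDecoderLM):
...     def __init__(self, lm_backend):   # lm_backend is a hypothetical external model
...         CTCDecoderLM.__init__(self)
...         self.backend = lm_backend
...     def start(self, start_with_nothing: bool) -> CTCDecoderLMState:
...         return CTCDecoderLMState()
...     def score(self, state: CTCDecoderLMState, usr_token_idx: int):
...         new_state = state.child(usr_token_idx)
...         return new_state, self.backend.score(usr_token_idx)   # hypothetical call
...     def finish(self, state: CTCDecoderLMState):
...         return state, self.backend.end_score()                # hypothetical call
```

The resulting model is typically passed to the ctc_decoder() factory via its lm argument.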

CTCDecoderLMState

class torchaudio.models.decoder.CTCDecoderLMState[source]

Language model state.

Tutorials using CTCDecoderLMState:
ASR Inference with CTC Decoder

property children: Dict[int, CTCDecoderLMState]

Map of indices to LM states

child(usr_index: int) CTCDecoderLMState[source]

Returns child corresponding to usr_index, or creates and returns a new state if input index is not found.

Parameters:

usr_index (int) – index corresponding to child state

Returns:

child state corresponding to usr_index

Return type:

CTCDecoderLMState
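The child()/children pair implies a trie of LM states keyed by token index. A toy pure-Python illustration of that contract (not torchaudio's implementation), assuming only the behavior documented above:

```python
class ToyLMState:
    """Illustrative stand-in for the CTCDecoderLMState trie contract."""

    def __init__(self):
        self._children = {}  # token index -> child state

    @property
    def children(self):
        # Map of indices to the LM states seen so far.
        return self._children

    def child(self, usr_index):
        # Return the existing child for usr_index, or create and
        # return a new state if the index has not been seen.
        if usr_index not in self._children:
            self._children[usr_index] = ToyLMState()
        return self._children[usr_index]

root = ToyLMState()
first = root.child(5)        # index 5 unseen: a new state is created
again = root.child(5)        # same index: the same state is returned
print(first is again)        # True
print(list(root.children))   # [5]
```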

compare(state: CTCDecoderLMState) CTCDecoderLMState[source]

Compare two language model states.

Parameters:

state (CTCDecoderLMState) – LM state to compare against

Returns:

0 if the states are the same, -1 if self is less, +1 if self is greater.

Return type:

int
