CTCDecoder

class torchaudio.models.decoder.CTCDecoder

CTC beam search decoder from Flashlight [Kahn et al., 2022].

Note

To build the decoder, please use the factory function ctc_decoder().

Tutorials using CTCDecoder:: ASR Inference with CTC Decoder

Methods

__call__

CTCDecoder.__call__(emissions: FloatTensor, lengths: Optional[Tensor] = None) → List[List[CTCHypothesis]]

Performs batched offline decoding.

Note

This method performs offline decoding in one go. To perform incremental decoding, please refer to decode_step().

Parameters:

emissions (torch.FloatTensor) – CPU tensor of shape (batch, frame, num_tokens) storing sequences of probability distributions over labels; output of the acoustic model. The tensor must be float32 and contiguous.
lengths (Tensor or None, optional) – CPU tensor of shape (batch, ) storing the number of valid frames along the time axis for each sequence in the batch. If None, every sequence is treated as having frame valid frames.

Returns:

List of sorted best hypotheses for each audio sequence in the batch.

Return type:

List[List[CTCHypothesis]]
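The stdlib sketch below illustrates how lengths selects the valid frames of a padded batch, including the default of using every frame when lengths is None. It works on nested Python lists rather than tensors so that it runs without torchaudio; the helper valid_frames is hypothetical and not part of the library, which itself expects a float32, contiguous CPU torch.FloatTensor.

```python
from typing import List, Optional, Sequence

Frame = List[float]  # one row of num_tokens scores


def valid_frames(
    emissions: Sequence[Sequence[Frame]],
    lengths: Optional[Sequence[int]] = None,
) -> List[List[Frame]]:
    """Hypothetical helper: keep only the valid frames of each sequence.

    emissions has shape (batch, frame, num_tokens); lengths has shape (batch,).
    """
    batch_size = len(emissions)
    num_frames = len(emissions[0]) if batch_size else 0
    if lengths is None:
        # No lengths given: every sequence is assumed to use all frames.
        lengths = [num_frames] * batch_size
    if len(lengths) != batch_size:
        raise ValueError("lengths must have one entry per sequence in the batch")
    trimmed = []
    for sequence, n in zip(emissions, lengths):
        if not 0 <= n <= num_frames:
            raise ValueError(f"invalid length {n} for {num_frames} frames")
        # Frames at index >= n are treated as padding and dropped.
        trimmed.append([list(frame) for frame in sequence[:n]])
    return trimmed


# Batch of 2 sequences padded to 3 frames with 2 tokens each; the second has 2 valid frames.
padded = [
    [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    [[0.7, 0.3], [0.4, 0.6], [0.0, 0.0]],  # last frame is padding
]
print([len(s) for s in valid_frames(padded, [3, 2])])  # [3, 2]
print([len(s) for s in valid_frames(padded)])          # [3, 3]
```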

decode_begin

CTCDecoder.decode_begin()

Initialize the internal state of the decoder.

See decode_step() for the usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

decode_end

CTCDecoder.decode_end()

Finalize the internal state of the decoder.

See decode_step() for the usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

decode_step

CTCDecoder.decode_step(emissions: FloatTensor)

Perform incremental decoding on top of the current internal state.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

Parameters:: emissions (torch.FloatTensor) – CPU tensor of shape (frame, num_tokens) storing sequences of probability distributions over labels; output of the acoustic model.

Example

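>>> import torchaudio
>>> decoder = torchaudio.models.decoder.ctc_decoder(...)
>>> decoder.decode_begin()  # initialize the internal state
>>> decoder.decode_step(emission1)  # first chunk, shape (frame, num_tokens)
>>> decoder.decode_step(emission2)  # next chunk, continues from the current state
>>> decoder.decode_end()  # finalize the internal state
>>> result = decoder.get_final_hypothesis()  # List[CTCHypothesis]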
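The online methods follow a fixed order: decode_begin() once, decode_step() for each chunk, decode_end(), then get_final_hypothesis(). The sketch below wraps that sequence in a hypothetical decode_online() helper that slices a (frame, num_tokens) emission along the frame axis. RecordingDecoder is a stand-in used only to show the call order without torchaudio; with a real decoder built by ctc_decoder(), emission would be a float32 CPU tensor, and slicing its first dimension keeps each chunk contiguous.

```python
from typing import Any, Iterator, List, Sequence


def iter_chunks(emission: Sequence[Any], chunk_size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most chunk_size frames along the frame axis."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(emission), chunk_size):
        yield emission[start:start + chunk_size]


def decode_online(decoder: Any, emission: Sequence[Any], chunk_size: int) -> List[Any]:
    """Hypothetical driver: feed a (frame, num_tokens) emission to the decoder in chunks."""
    decoder.decode_begin()
    for chunk in iter_chunks(emission, chunk_size):
        decoder.decode_step(chunk)
    decoder.decode_end()
    return decoder.get_final_hypothesis()


class RecordingDecoder:
    """Stand-in that only records calls, so the call order can be checked without torchaudio."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def decode_begin(self) -> None:
        self.calls.append("begin")

    def decode_step(self, chunk: Sequence[Any]) -> None:
        self.calls.append(f"step({len(chunk)})")

    def decode_end(self) -> None:
        self.calls.append("end")

    def get_final_hypothesis(self) -> List[Any]:
        self.calls.append("final")
        return []


stub = RecordingDecoder()
decode_online(stub, [[0.0, 1.0]] * 5, chunk_size=2)
print(stub.calls)  # ['begin', 'step(2)', 'step(2)', 'step(1)', 'end', 'final']
```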

get_final_hypothesis

CTCDecoder.get_final_hypothesis() → List[CTCHypothesis]

Get the final hypotheses.

Returns:: List of sorted best hypotheses.
Return type:: List[CTCHypothesis]

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

idxs_to_tokens

CTCDecoder.idxs_to_tokens(idxs: LongTensor) → List

Map raw token IDs to their corresponding tokens.

Parameters:: idxs (LongTensor) – raw token IDs generated by the decoder
Returns:: tokens corresponding to the input IDs
Return type:: List
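Without a lexicon, a hypothesis carries no words, so a transcript is built from the strings returned by idxs_to_tokens(). A minimal sketch, assuming a token inventory in which "|" marks word boundaries (the delimiter depends on your tokens file, so it is a parameter here) and IDs that contain no blanks or CTC repeats: lookup_tokens is a plain-list analogue of idxs_to_tokens(), and both it and tokens_to_transcript are hypothetical helpers, not part of torchaudio.

```python
from typing import List, Sequence


def lookup_tokens(idxs: Sequence[int], token_list: Sequence[str]) -> List[str]:
    """Plain-Python analogue of idxs_to_tokens: map each ID to its token string."""
    return [token_list[i] for i in idxs]


def tokens_to_transcript(tokens: Sequence[str], word_delimiter: str = "|") -> str:
    """Join token strings, treating word_delimiter as a space (assumed convention)."""
    text = "".join(tokens).replace(word_delimiter, " ")
    return " ".join(text.split())  # collapse repeated, leading and trailing spaces


# Hypothetical token inventory: index 0 is the blank, 1 is the word delimiter.
token_list = ["-", "|", "h", "i", "t", "e", "r"]
ids = [1, 2, 3, 1, 4, 2, 5, 6, 5, 1]
print(tokens_to_transcript(lookup_tokens(ids, token_list)))  # hi there
```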

Support Structures

CTCHypothesis

class torchaudio.models.decoder.CTCHypothesis(tokens: torch.LongTensor, words: List[str], score: float, timesteps: torch.IntTensor)

Represents a hypothesis generated by the CTC beam search decoder CTCDecoder.

Tutorials using CTCHypothesis:: ASR Inference with CTC Decoder

tokens: LongTensor: Predicted sequence of token IDs. Shape (L, ), where L is the length of the output sequence.

words: List[str]: List of predicted words.

Note

This attribute is only applicable if a lexicon is provided to the decoder. If decoding without a lexicon, it will be empty. Please refer to tokens and idxs_to_tokens() instead.

score: float: Score corresponding to the hypothesis.

timesteps: IntTensor: Timesteps corresponding to the tokens. Shape (L, ), where L is the length of the output sequence.
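Each entry of timesteps is a frame index into the emission, so converting it to seconds needs the acoustic model's frame duration, which this page does not specify. The sketch below assumes 20 ms per frame (320 samples at 16 kHz) and uses Hyp, a hypothetical stand-in with the same field names as CTCHypothesis but plain lists instead of tensors. Since the decoder returns hypotheses sorted, the sketch takes the first one as the best.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hyp:
    """Stand-in with the same field names as CTCHypothesis (plain lists instead of tensors)."""
    tokens: List[int]
    words: List[str]
    score: float
    timesteps: List[int]


def token_times(hyp: Hyp, seconds_per_frame: float) -> List[Tuple[int, float]]:
    """Pair each token ID with the time (in seconds) of the frame it was emitted at."""
    if len(hyp.tokens) != len(hyp.timesteps):
        raise ValueError("tokens and timesteps must both have length L")
    return [(tok, step * seconds_per_frame) for tok, step in zip(hyp.tokens, hyp.timesteps)]


# Assumed frame duration: 320 samples per frame at 16 kHz.
SECONDS_PER_FRAME = 320 / 16000

# Hypotheses for one utterance, already sorted, so index 0 is the best (words empty: no lexicon).
hyps = [
    Hyp(tokens=[2, 3], words=[], score=-1.5, timesteps=[4, 10]),
    Hyp(tokens=[2], words=[], score=-3.0, timesteps=[4]),
]
best = hyps[0]
print(token_times(best, SECONDS_PER_FRAME))
```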

CTCDecoderLM

class torchaudio.models.decoder.CTCDecoderLM

Language model base class for creating custom language models to use with the decoder.

Tutorials using CTCDecoderLM:: ASR Inference with CTC Decoder

abstract start(start_with_nothing: bool) → CTCDecoderLMState

Initialize or reset the language model.

Parameters:: start_with_nothing (bool) – whether to start the sentence with the sil (silence) token.
Returns:: starting state
Return type:: CTCDecoderLMState

abstract score(state: CTCDecoderLMState, usr_token_idx: int) → Tuple[CTCDecoderLMState, float]

Evaluate the language model based on the current LM state and a new word.

Parameters:

state (CTCDecoderLMState) – current LM state
usr_token_idx (int) – index of the word

Returns:

(CTCDecoderLMState, float)

CTCDecoderLMState:: new LM state
float:: score

abstract finish(state: CTCDecoderLMState) → Tuple[CTCDecoderLMState, float]

Evaluate the end of the sentence for the language model based on the current LM state.

Parameters:

state (CTCDecoderLMState) – current LM state

Returns:

(CTCDecoderLMState, float)

CTCDecoderLMState:: new LM state
float:: score
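The three abstract methods form a small contract: start() returns an initial state, score() returns a (new state, score) pair for a word index, and finish() returns a (state, score) pair for the end of the sentence. The sketch below runs without torchaudio by using hypothetical stand-ins, ToyState and UnigramLM; a real implementation would subclass CTCDecoderLM and build states with CTCDecoderLMState, whose child() plays the role of ToyState.child() here. The probabilities and the silence index are made-up illustrative values.

```python
import math
from typing import Dict, Tuple


class ToyState:
    """Stand-in for CTCDecoderLMState: a trie node whose children are keyed by word index."""

    def __init__(self) -> None:
        self.children: Dict[int, "ToyState"] = {}

    def child(self, usr_index: int) -> "ToyState":
        # Return the existing child for this index, or create and store a new one.
        if usr_index not in self.children:
            self.children[usr_index] = ToyState()
        return self.children[usr_index]


class UnigramLM:
    """Hypothetical unigram LM exposing the start/score/finish contract."""

    def __init__(self, word_probs: Dict[int, float], sil_idx: int, unk_prob: float = 1e-6) -> None:
        self.log_probs = {w: math.log(p) for w, p in word_probs.items()}
        self.sil_idx = sil_idx
        self.unk_log_prob = math.log(unk_prob)
        self.cache: Dict[ToyState, float] = {}

    def start(self, start_with_nothing: bool) -> ToyState:
        # A unigram model has no context, so the flag does not change the state here.
        return ToyState()

    def score(self, state: ToyState, usr_token_idx: int) -> Tuple[ToyState, float]:
        out = state.child(usr_token_idx)
        # Cache per state object; child() returns the same object for a repeated index.
        if out not in self.cache:
            self.cache[out] = self.log_probs.get(usr_token_idx, self.unk_log_prob)
        return out, self.cache[out]

    def finish(self, state: ToyState) -> Tuple[ToyState, float]:
        # In this sketch, the end of the sentence is scored as the silence index.
        return self.score(state, self.sil_idx)


lm = UnigramLM({0: 0.5, 1: 0.25, 2: 0.25}, sil_idx=2)
s0 = lm.start(start_with_nothing=False)
s1, word_score = lm.score(s0, 1)
s1_again, _ = lm.score(s0, 1)
_, end_score = lm.finish(s1)
print(s1 is s1_again)  # True
```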

CTCDecoderLMState

class torchaudio.models.decoder.CTCDecoderLMState

Language model state.

Tutorials using CTCDecoderLMState:: ASR Inference with CTC Decoder

property children: Dict[int, CTCDecoderLMState]: Map of indices to LM states.

child(usr_index: int) → CTCDecoderLMState

Return the child state corresponding to usr_index, or create and return a new state if the index is not found.

Parameters:: usr_index (int) – index corresponding to child state
Returns:: child state corresponding to usr_index
Return type:: CTCDecoderLMState

compare(state: CTCDecoderLMState) → CTCDecoderLMState

Compare two language model states.

Parameters:: state (CTCDecoderLMState) – LM state to compare against
Returns:: 0 if the states are the same, -1 if self is less, +1 if self is greater.
Return type:: int
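compare() follows the usual three-way convention (0 if equal, -1 if self is less, +1 if self is greater), so objects exposing it can be ordered in Python with functools.cmp_to_key. The sketch below uses a hypothetical PathState that orders states by the sequence of word indices leading to them; this ordering rule is an assumption for illustration only, since the page does not say how CTCDecoderLMState.compare orders states.

```python
from functools import cmp_to_key
from typing import List, Tuple


class PathState:
    """Hypothetical LM state identified by the word indices that led to it."""

    def __init__(self, path: Tuple[int, ...]) -> None:
        self.path = path

    def compare(self, state: "PathState") -> int:
        # Three-way comparison: 0 if same, -1 if self is less, +1 if self is greater.
        if self.path == state.path:
            return 0
        return -1 if self.path < state.path else 1


states: List[PathState] = [PathState((2, 1)), PathState((1,)), PathState((1, 3))]
ordered = sorted(states, key=cmp_to_key(lambda a, b: a.compare(b)))
print([s.path for s in ordered])  # [(1,), (1, 3), (2, 1)]
```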
