CTCDecoder¶
- class torchaudio.models.decoder.CTCDecoder[source]¶
CTC beam search decoder from Flashlight [Kahn et al., 2022].
Note
To build the decoder, please use the factory function ctc_decoder().
- Tutorials using CTCDecoder: ASR Inference with CTC Decoder
Methods¶
__call__¶
- CTCDecoder.__call__(emissions: FloatTensor, lengths: Optional[Tensor] = None) → List[List[CTCHypothesis]] [source]¶
- Parameters:
emissions (torch.FloatTensor) – CPU tensor of shape (batch, frame, num_tokens) storing sequences of probability distributions over labels; the output of the acoustic model.
lengths (Tensor or None, optional) – CPU tensor of shape (batch, ) storing the valid length along the time axis of each emission sequence in the batch.
- Returns:
List of sorted best hypotheses for each audio sequence in the batch.
- Return type:
List[List[CTCHypothesis]]
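The nested return value described above can be unpacked roughly as follows. This is a minimal sketch, assuming each inner list is sorted best-first as the Returns entry states. `Hyp` is a hypothetical stand-in that mirrors the fields of CTCHypothesis with plain Python lists instead of tensors, so the snippet runs without torchaudio; `best_transcripts` and the toy data are illustrative only and not part of the library.

```python
from typing import List, NamedTuple


class Hyp(NamedTuple):
    """Hypothetical stand-in mirroring the fields of CTCHypothesis."""
    tokens: List[int]
    words: List[str]
    score: float
    timesteps: List[int]


def best_transcripts(results: List[List[Hyp]]) -> List[str]:
    """Return the top hypothesis of each batch item as a space-joined string."""
    transcripts = []
    for hyps in results:      # one list of hypotheses per audio sequence
        if not hyps:          # guard against an empty hypothesis list
            transcripts.append("")
            continue
        best = hyps[0]        # assumed to be sorted best-first
        transcripts.append(" ".join(best.words))
    return transcripts


# Toy data standing in for the result of `decoder(emissions, lengths)`.
results = [
    [Hyp([3, 4], ["hello", "world"], -1.2, [1, 5]),
     Hyp([3, 5], ["hello", "word"], -2.7, [1, 5])],
    [Hyp([6], ["goodbye"], -0.4, [2])],
]
print(best_transcripts(results))
```

With a real decoder, the same loop would iterate over the list returned by calling the decoder on a batch of emissions.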
idxs_to_tokens¶
Support Structures¶
CTCHypothesis¶
- class torchaudio.models.decoder.CTCHypothesis(tokens: torch.LongTensor, words: List[str], score: float, timesteps: torch.IntTensor)[source]¶
Represents a hypothesis generated by the CTC beam search decoder CTCDecoder.
- Tutorials using CTCHypothesis: ASR Inference with CTC Decoder
- tokens: LongTensor¶
Predicted sequence of token IDs. Shape (L, ), where L is the length of the output sequence
- words: List[str]¶
List of predicted words.
Note
This attribute is only applicable if a lexicon is provided to the decoder. If decoding without a lexicon, it will be blank. Please refer to tokens and idxs_to_tokens() instead.
- timesteps: IntTensor¶
Timesteps corresponding to the tokens. Shape (L, ), where L is the length of the output sequence
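The note on `words` suggests falling back to `tokens` when no lexicon is used. The sketch below illustrates one way to do that under stated assumptions: `idxs_to_tokens_sketch` is a hypothetical stand-in for the decoder's `idxs_to_tokens()`, the `vocab` mapping is toy data, and `"|"` is assumed to be the word-boundary token (your token set may use a different one).

```python
from typing import Dict, List


def idxs_to_tokens_sketch(idxs: List[int], vocab: Dict[int, str]) -> List[str]:
    """Hypothetical stand-in for the decoder's idxs_to_tokens()."""
    return [vocab[i] for i in idxs]


def transcript(words: List[str], tokens: List[int], vocab: Dict[int, str],
               word_sep: str = "|") -> str:
    """Prefer `words`; fall back to joining tokens when `words` is empty."""
    if words:
        return " ".join(words)
    chars = "".join(idxs_to_tokens_sketch(tokens, vocab))
    # Turn the assumed boundary token into spaces and normalise whitespace.
    return " ".join(chars.replace(word_sep, " ").split())


# Toy vocabulary; a real one comes from the token file given to the decoder.
vocab = {0: "|", 1: "h", 2: "i", 3: "y", 4: "o", 5: "u"}
print(transcript([], [0, 1, 2, 0, 3, 4, 5, 0], vocab))
print(transcript(["hi", "you"], [], vocab))
```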
CTCDecoderLM¶
- class torchaudio.models.decoder.CTCDecoderLM[source]¶
Language model base class for creating custom language models to use with the decoder.
- Tutorials using CTCDecoderLM: ASR Inference with CTC Decoder
- abstract start(start_with_nothing: bool) → CTCDecoderLMState [source]¶
Initialize or reset the language model.
- Parameters:
start_with_nothing (bool) – whether to start the sentence with the sil token.
- Returns:
starting state
- Return type:
CTCDecoderLMState
- abstract score(state: CTCDecoderLMState, usr_token_idx: int) → Tuple[CTCDecoderLMState, float] [source]¶
Evaluate the language model based on the current LM state and new word.
- Parameters:
state (CTCDecoderLMState) – current LM state
usr_token_idx (int) – index of the word
- Returns:
(CTCDecoderLMState, float)
- CTCDecoderLMState: new LM state
- float: score
- abstract finish(state: CTCDecoderLMState) → Tuple[CTCDecoderLMState, float] [source]¶
Evaluate the end of the sequence for the language model based on the current LM state.
- Parameters:
state (CTCDecoderLMState) – current LM state
- Returns:
(CTCDecoderLMState, float)
- CTCDecoderLMState: new LM state
- float: score
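The `start` / `score` / `finish` contract can be illustrated with a tiny unigram model. This is a sketch of the interface shape only: `StateSketch` and `UnigramLMSketch` are hypothetical pure-Python stand-ins so the snippet runs without torchaudio, and the score table, `eos_idx`, and `unk_score` are made-up values. A real custom language model would presumably subclass CTCDecoderLM and return CTCDecoderLMState objects instead.

```python
from typing import Dict, Tuple


class StateSketch:
    """Hypothetical stand-in for CTCDecoderLMState (a node keyed by index)."""

    def __init__(self) -> None:
        self.children: Dict[int, "StateSketch"] = {}

    def child(self, usr_index: int) -> "StateSketch":
        # Return the existing child, or create one if the index is new.
        if usr_index not in self.children:
            self.children[usr_index] = StateSketch()
        return self.children[usr_index]


class UnigramLMSketch:
    """Mirrors the start/score/finish interface with a fixed score table."""

    def __init__(self, log_probs: Dict[int, float], eos_idx: int,
                 unk_score: float = -10.0) -> None:
        self.log_probs = log_probs
        self.eos_idx = eos_idx      # assumed end-of-sentence index
        self.unk_score = unk_score  # assumed score for unseen indices

    def start(self, start_with_nothing: bool) -> StateSketch:
        # A unigram model keeps no history, so the flag is ignored here.
        return StateSketch()

    def score(self, state: StateSketch,
              usr_token_idx: int) -> Tuple[StateSketch, float]:
        # Advance to the child state and look up the score for this index.
        new_state = state.child(usr_token_idx)
        return new_state, self.log_probs.get(usr_token_idx, self.unk_score)

    def finish(self, state: StateSketch) -> Tuple[StateSketch, float]:
        # Ending a sequence is scored like emitting the end-of-sentence index.
        return self.score(state, self.eos_idx)


lm = UnigramLMSketch({0: -0.5, 1: -1.5, 2: -3.0}, eos_idx=2)
state = lm.start(start_with_nothing=False)
total = 0.0
for idx in [0, 1, 0]:
    state, s = lm.score(state, idx)
    total += s
state, s = lm.finish(state)
total += s
print(total)
```

The loop plays the role the decoder would play: it threads the state returned by each call into the next one and accumulates the scores.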
CTCDecoderLMState¶
- class torchaudio.models.decoder.CTCDecoderLMState[source]¶
Language model state.
- Tutorials using CTCDecoderLMState: ASR Inference with CTC Decoder
- property children: Dict[int, CTCDecoderLMState]¶
Map of indices to LM states
- child(usr_index: int) → CTCDecoderLMState [source]¶
Returns child corresponding to usr_index, or creates and returns a new state if input index is not found.
- Parameters:
usr_index (int) – index corresponding to child state
- Returns:
child state corresponding to usr_index
- Return type:
CTCDecoderLMState
- compare(state: CTCDecoderLMState) → CTCDecoderLMState [source]¶
Compare two language model states.
- Parameters:
state (CTCDecoderLMState) – LM state to compare against
- Returns:
0 if the states are the same, -1 if self is less, +1 if self is greater.
- Return type:
int
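The get-or-create behaviour of `child()` and the three-way result of `compare()` can be sketched in plain Python. `TrieState` is a hypothetical stand-in, not the library class. Ordering states by object identity is an assumption made purely for illustration, since the text above does not say how "less" and "greater" are defined.

```python
from typing import Dict


class TrieState:
    """Hypothetical stand-in for CTCDecoderLMState."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieState"] = {}

    def child(self, usr_index: int) -> "TrieState":
        # Create the child on first access, then always return the same one.
        if usr_index not in self.children:
            self.children[usr_index] = TrieState()
        return self.children[usr_index]

    def compare(self, state: "TrieState") -> int:
        # Assumption: order by object identity; yields -1, 0, or +1.
        a, b = id(self), id(state)
        return (a > b) - (a < b)


root = TrieState()
first = root.child(7)
again = root.child(7)   # same index: the existing child is returned
other = root.child(9)   # new index: a new state is created

print(first is again)            # the two lookups share one state
print(sorted(root.children))     # indices registered under the root
print(root.compare(root))        # a state compared with itself
```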