torchaudio.models.decoder

Decoder Class

CTCDecoder

class torchaudio.models.decoder.CTCDecoder(nbest: int, lexicon: Optional[Dict], word_dict: torchaudio.flashlight_lib_text_dictionary.Dictionary, tokens_dict: torchaudio.flashlight_lib_text_dictionary.Dictionary, lm: torchaudio.flashlight_lib_text_decoder.LM, decoder_options: Union[torchaudio.flashlight_lib_text_decoder.LexiconDecoderOptions, torchaudio.flashlight_lib_text_decoder.LexiconFreeDecoderOptions], blank_token: str, sil_token: str, unk_word: str)[source]
This feature supports the following devices: CPU

CTC beam search decoder from Flashlight [1].

Note

To build the decoder, please use the factory function ctc_decoder().

Parameters
  • nbest (int) – number of best decodings to return

  • lexicon (Dict or None) – lexicon mapping of words to spellings, or None for lexicon-free decoder

  • word_dict (_Dictionary) – dictionary of words

  • tokens_dict (_Dictionary) – dictionary of tokens

  • lm (CTCDecoderLM) – language model. If using a lexicon, only word-level LMs are currently supported

  • decoder_options (_LexiconDecoderOptions or _LexiconFreeDecoderOptions) – parameters used for beam search decoding

  • blank_token (str) – token corresponding to blank

  • sil_token (str) – token corresponding to silence

  • unk_word (str) – word corresponding to unknown

Tutorials using CTCDecoder:
ASR Inference with CTC Decoder
__call__(self, emissions: torch.FloatTensor, lengths: Optional[torch.Tensor] = None) → List[List[torchaudio.models.decoder.CTCHypothesis]][source]
Parameters
  • emissions (torch.FloatTensor) – CPU tensor of shape (batch, frame, num_tokens) storing sequences of probability distributions over labels; output of the acoustic model.

  • lengths (Tensor or None, optional) – CPU tensor of shape (batch, ) storing the valid length along the time axis of each sequence in the batch.

Returns

List of sorted best hypotheses for each audio sequence in the batch.

Return type

List[List[CTCHypothesis]]
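
The nested return value can be unpacked as in the following minimal sketch. It assumes the inner lists are ordered best-first, so index 0 is the top hypothesis for each utterance. `FakeHypothesis` and `best_transcripts` are hypothetical names introduced here for illustration; `FakeHypothesis` only mirrors the field names of CTCHypothesis so the snippet runs without torchaudio.

```python
from collections import namedtuple
from typing import List, Sequence

# Hypothetical stand-in that mirrors the CTCHypothesis field names.
FakeHypothesis = namedtuple("FakeHypothesis", ["tokens", "words", "score", "timesteps"])


def best_transcripts(results: Sequence[Sequence]) -> List[str]:
    """Return one transcript per utterance from a List[List[hypothesis]] result."""
    transcripts = []
    for nbest_list in results:
        if not nbest_list:
            # No hypothesis was returned for this utterance.
            transcripts.append("")
            continue
        top = nbest_list[0]  # assumed to be the best-scoring hypothesis
        transcripts.append(" ".join(top.words).strip())
    return transcripts


# Two utterances; the first has nbest=2 hypotheses, the second has one.
results = [
    [
        FakeHypothesis([1, 2], ["hello", "world"], -3.2, [0, 4]),
        FakeHypothesis([1], ["hello"], -7.5, [0]),
    ],
    [FakeHypothesis([3], ["goodbye"], -1.1, [2])],
]
print(best_transcripts(results))
```

With a real decoder, the same helper would be called as `best_transcripts(decoder(emissions, lengths))`. Note that `words` is only populated when a lexicon is used.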

idxs_to_tokens(idxs: torch.LongTensor) → List[source]

Map raw token IDs into corresponding tokens

Parameters

idxs (LongTensor) – raw token IDs generated from decoder

Returns

tokens corresponding to the input IDs

Return type

List
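
When decoding without a lexicon, a transcript has to be assembled from the token list. The sketch below shows one way to do that, assuming single-character tokens, `"-"` as the blank token and `"|"` as the silence (word boundary) token, which are the ctc_decoder defaults. `tokens_to_text` is a hypothetical helper, not part of torchaudio.

```python
from typing import Iterable


def tokens_to_text(tokens: Iterable[str], blank_token: str = "-", sil_token: str = "|") -> str:
    """Join decoded tokens into a transcript, treating sil_token as a word boundary."""
    # Drop any blank tokens, then concatenate the remaining characters.
    chars = [t for t in tokens if t != blank_token]
    text = "".join(chars).replace(sil_token, " ")
    # Collapse repeated or leading/trailing boundaries into single spaces.
    return " ".join(text.split())


print(tokens_to_text(["|", "h", "i", "|", "y", "o", "u", "|"]))
```

With a real decoder the input would typically be `decoder.idxs_to_tokens(hypothesis.tokens)`.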

CTCHypothesis

class torchaudio.models.decoder.CTCHypothesis(tokens: torch.LongTensor, words: List[str], score: float, timesteps: torch.IntTensor)[source]

Represents a hypothesis generated by the CTC beam search decoder CTCDecoder.

Note

The words field is only applicable if a lexicon is provided to the decoder. If decoding without a lexicon, it will be blank. Please refer to tokens and idxs_to_tokens instead.

Variables
  • tokens (torch.LongTensor) – Predicted sequence of token IDs. Shape (L, ), where L is the length of the output sequence

  • words (List[str]) – List of predicted words

  • score (float) – Score corresponding to hypothesis

  • timesteps (torch.IntTensor) – Timesteps corresponding to the tokens. Shape (L, ), where L is the length of the output sequence
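
The timesteps are frame indices into the emission sequence, so turning them into times requires knowing how many audio samples each frame covers. The sketch below assumes frames are evenly spaced over the waveform. `timesteps_to_seconds` and its parameters are hypothetical names used for illustration only.

```python
from typing import Iterable, List


def timesteps_to_seconds(
    timesteps: Iterable[int], num_frames: int, num_samples: int, sample_rate: int
) -> List[float]:
    """Convert emission frame indices into offsets in seconds.

    Assumes the num_frames emission frames evenly cover num_samples audio samples.
    """
    samples_per_frame = num_samples / num_frames
    return [int(t) * samples_per_frame / sample_rate for t in timesteps]


# 2 seconds of 16 kHz audio mapped to 100 emission frames (320 samples per frame).
print(timesteps_to_seconds([0, 50], num_frames=100, num_samples=32000, sample_rate=16000))
```

With a real hypothesis, `hypothesis.timesteps.tolist()` could be passed as the first argument.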

Tutorials using CTCHypothesis:
ASR Inference with CTC Decoder

Factory Function

ctc_decoder

torchaudio.models.decoder.ctc_decoder(lexicon: Optional[str], tokens: Union[str, List[str]], lm: Optional[Union[str, torchaudio.flashlight_lib_text_decoder.LM]] = None, lm_dict: Optional[str] = None, nbest: int = 1, beam_size: int = 50, beam_size_token: Optional[int] = None, beam_threshold: float = 50, lm_weight: float = 2, word_score: float = 0, unk_score: float = -inf, sil_score: float = 0, log_add: bool = False, blank_token: str = '-', sil_token: str = '|', unk_word: str = '<unk>')[source]

Builds CTC beam search decoder from Flashlight [1].

Parameters
  • lexicon (str or None) – lexicon file containing the possible words and corresponding spellings. Each line consists of a word and its space separated spelling. If None, uses lexicon-free decoding.

  • tokens (str or List[str]) – file or list containing valid tokens. If using a file, the expected format is for tokens mapping to the same index to be on the same line

  • lm (str, CTCDecoderLM, or None, optional) – either a path to a KenLM language model, a custom language model of type CTCDecoderLM, or None if not using a language model

  • lm_dict (str or None, optional) – file consisting of the dictionary used for the LM, with a word per line sorted by LM index. If decoding with a lexicon, entries in lm_dict must also occur in the lexicon file. If None, dictionary for LM is constructed using the lexicon file. (Default: None)

  • nbest (int, optional) – number of best decodings to return (Default: 1)

  • beam_size (int, optional) – max number of hypotheses to hold after each decode step (Default: 50)

  • beam_size_token (int, optional) – max number of tokens to consider at each decode step. If None, it is set to the total number of tokens (Default: None)

  • beam_threshold (float, optional) – threshold for pruning hypotheses (Default: 50)

  • lm_weight (float, optional) – weight of language model (Default: 2)

  • word_score (float, optional) – word insertion score (Default: 0)

  • unk_score (float, optional) – unknown word insertion score (Default: -inf)

  • sil_score (float, optional) – silence insertion score (Default: 0)

  • log_add (bool, optional) – whether or not to use logadd when merging hypotheses (Default: False)

  • blank_token (str, optional) – token corresponding to blank (Default: “-”)

  • sil_token (str, optional) – token corresponding to silence (Default: “|”)

  • unk_word (str, optional) – word corresponding to unknown (Default: “<unk>”)

Returns

decoder

Return type

CTCDecoder
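
To give some intuition for `beam_size`, `beam_threshold` and `log_add`, here is a simplified pure-Python sketch of how a beam search might prune and merge hypotheses. It is a toy model under stated assumptions, not the Flashlight implementation: hypotheses are plain `(text, score)` pairs with log-domain scores, and `prune` and `merge_scores` are hypothetical helpers.

```python
import math
from typing import List, Tuple

Hyp = Tuple[str, float]  # (text, log-domain score); toy representation


def merge_scores(a: float, b: float, log_add: bool = False) -> float:
    """Combine the scores of two hypotheses that represent the same text."""
    if not log_add:
        return max(a, b)  # keep only the better path
    hi, lo = max(a, b), min(a, b)
    if hi == -math.inf:
        return -math.inf
    # Numerically stable log(exp(a) + exp(b)): sum the probability of both paths.
    return hi + math.log1p(math.exp(lo - hi))


def prune(hyps: List[Hyp], beam_size: int = 50, beam_threshold: float = 50.0) -> List[Hyp]:
    """Drop hypotheses far below the best score, then keep the top beam_size."""
    if not hyps:
        return []
    best = max(score for _, score in hyps)
    kept = [(text, score) for text, score in hyps if score >= best - beam_threshold]
    kept.sort(key=lambda h: h[1], reverse=True)
    return kept[:beam_size]


hyps = [("a", -1.0), ("b", -3.0), ("c", -60.0)]
print(prune(hyps, beam_size=5, beam_threshold=50.0))  # "c" falls outside the threshold
print(prune(hyps, beam_size=1, beam_threshold=50.0))  # only the best survives
```

In this model, a smaller `beam_size` or `beam_threshold` prunes more aggressively, trading accuracy for speed, while `log_add=True` lets several alignments of the same text reinforce each other.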

Example
>>> from torchaudio.models.decoder import ctc_decoder
>>> decoder = ctc_decoder(
...     lexicon="lexicon.txt",
...     tokens="tokens.txt",
...     lm="kenlm.bin",
... )
>>> results = decoder(emissions)  # nested list of CTCHypothesis with shape (B, nbest)
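
The lexicon and tokens inputs follow the formats described in the parameter list: one word plus its space-separated spelling per lexicon line, and one token index per tokens line. The sketch below generates such lines for a letter-based token set. It assumes each spelling ends with the silence token `"|"` as a word boundary, which is a convention of this example, not something stated above. `make_lexicon_lines` and `make_token_lines` are hypothetical helpers.

```python
from typing import Iterable, List


def make_lexicon_lines(words: Iterable[str], sil_token: str = "|") -> List[str]:
    """Build lexicon lines of the form '<word> <space-separated spelling>'."""
    lines = []
    for word in sorted(set(words)):
        # Spell the word letter by letter; the trailing sil_token is an assumption.
        spelling = " ".join(list(word) + [sil_token])
        lines.append(f"{word} {spelling}")
    return lines


def make_token_lines(words: Iterable[str], blank_token: str = "-", sil_token: str = "|") -> List[str]:
    """Build a token list with one token per index (one per line if written to a file)."""
    letters = sorted({ch for word in words for ch in word})
    # Illustrative ordering only: the real order must match the acoustic model's labels.
    return [blank_token, sil_token] + letters


vocab = ["hello", "a"]
print("\n".join(make_lexicon_lines(vocab)))
print(make_token_lines(vocab))
```

The resulting lines could be written to `lexicon.txt` and `tokens.txt`, or the token list could be passed directly as the `tokens` argument.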
Tutorials using ctc_decoder:
ASR Inference with CTC Decoder

Utility Function

download_pretrained_files

torchaudio.models.decoder.download_pretrained_files(model: str)[source]

Retrieves pretrained data files used for CTC decoder.

Parameters

model (str) – pretrained language model to download. Options: [“librispeech-3-gram”, “librispeech-4-gram”, “librispeech”]

Returns

Object with the following attributes:
  • lm – path corresponding to downloaded language model, or None if the model is not associated with an lm

  • lexicon – path corresponding to downloaded lexicon file

  • tokens – path corresponding to downloaded tokens file
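
The returned paths can be passed straight to ctc_decoder, as in the sketch below. `build_pretrained_decoder` and `PRETRAINED_MODELS` are hypothetical names introduced here; the model names are the options listed above. The torchaudio import is done lazily inside the function so that the name check still works where torchaudio is not installed, and calling the function with a valid name needs network access to fetch the files.

```python
PRETRAINED_MODELS = ("librispeech-3-gram", "librispeech-4-gram", "librispeech")


def build_pretrained_decoder(model: str = "librispeech-4-gram", nbest: int = 1):
    """Download pretrained decoder files and build a CTCDecoder from them."""
    if model not in PRETRAINED_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {PRETRAINED_MODELS}")

    # Lazy import: only needed once the model name has been validated.
    from torchaudio.models.decoder import ctc_decoder, download_pretrained_files

    files = download_pretrained_files(model)
    return ctc_decoder(
        lexicon=files.lexicon,
        tokens=files.tokens,
        lm=files.lm,  # may be None if the model has no associated LM
        nbest=nbest,
    )
```

A call such as `decoder = build_pretrained_decoder("librispeech-4-gram", nbest=3)` would then return a decoder ready to be applied to emissions.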

Tutorials using download_pretrained_files:
ASR Inference with CTC Decoder

References

[1] Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, and others. Flashlight: enabling innovation in tools for machine learning. arXiv preprint arXiv:2201.12465, 2022.
