ctc_decoder

torchaudio.models.decoder.ctc_decoder(lexicon: Optional[str], tokens: Union[str, List[str]], lm: Optional[Union[str, CTCDecoderLM]] = None, lm_dict: Optional[str] = None, nbest: int = 1, beam_size: int = 50, beam_size_token: Optional[int] = None, beam_threshold: float = 50, lm_weight: float = 2, word_score: float = 0, unk_score: float = -inf, sil_score: float = 0, log_add: bool = False, blank_token: str = '-', sil_token: str = '|', unk_word: str = '<unk>') → CTCDecoder

Builds an instance of CTCDecoder.

Parameters:
  • lexicon (str or None) – lexicon file containing the possible words and corresponding spellings. Each line consists of a word and its space separated spelling. If None, uses lexicon-free decoding.

  • tokens (str or List[str]) – file or list containing valid tokens. If using a file, tokens that map to the same index are expected to be on the same line.

  • lm (str, CTCDecoderLM, or None, optional) – path to a KenLM language model, a custom language model of type CTCDecoderLM, or None if not using a language model (Default: None)

  • lm_dict (str or None, optional) – file consisting of the dictionary used for the LM, with a word per line sorted by LM index. If decoding with a lexicon, entries in lm_dict must also occur in the lexicon file. If None, dictionary for LM is constructed using the lexicon file. (Default: None)

  • nbest (int, optional) – number of best decodings to return (Default: 1)

  • beam_size (int, optional) – max number of hypotheses to hold after each decode step (Default: 50)

  • beam_size_token (int, optional) – max number of tokens to consider at each decode step. If None, it is set to the total number of tokens (Default: None)

  • beam_threshold (float, optional) – threshold for pruning hypotheses (Default: 50)

  • lm_weight (float, optional) – weight of language model (Default: 2)

  • word_score (float, optional) – word insertion score (Default: 0)

  • unk_score (float, optional) – unknown word insertion score (Default: -inf)

  • sil_score (float, optional) – silence insertion score (Default: 0)

  • log_add (bool, optional) – whether or not to use logadd when merging hypotheses (Default: False)

  • blank_token (str, optional) – token corresponding to blank (Default: “-”)

  • sil_token (str, optional) – token corresponding to silence (Default: “|”)

  • unk_word (str, optional) – word corresponding to unknown (Default: “<unk>”)
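The file formats described for lexicon and tokens can be sketched as follows. This is only an illustration: the words, spellings, and file names are made up, writing one token per line assumes that no two tokens share an index, and ending each spelling with the silence token “|” is an assumption about how word boundaries are marked.

```python
import tempfile
from pathlib import Path

# Hypothetical vocabulary: blank, silence (word boundary), then letters.
TOKENS = ["-", "|", "a", "c", "t"]

# Hypothetical lexicon: word -> spelling as a sequence of tokens.
LEXICON = {
    "cat": ["c", "a", "t", "|"],
    "act": ["a", "c", "t", "|"],
}


def write_decoder_files(directory):
    """Write tokens.txt and lexicon.txt in the formats described above."""
    directory = Path(directory)
    tokens_path = directory / "tokens.txt"
    lexicon_path = directory / "lexicon.txt"

    # One token per line (assumption: each token has its own index).
    tokens_path.write_text("\n".join(TOKENS) + "\n", encoding="utf-8")

    # One word per line, followed by its space-separated spelling.
    lines = [f"{word} {' '.join(spelling)}" for word, spelling in LEXICON.items()]
    lexicon_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return tokens_path, lexicon_path


def check_lexicon(lexicon_path, tokens):
    """Return the spelling symbols in the lexicon that are not valid tokens."""
    known = set(tokens)
    missing = set()
    for line in Path(lexicon_path).read_text(encoding="utf-8").splitlines():
        word, *spelling = line.split()
        missing.update(s for s in spelling if s not in known)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    tokens_file, lexicon_file = write_decoder_files(tmp)
    lexicon_text = lexicon_file.read_text(encoding="utf-8")
    missing_tokens = check_lexicon(lexicon_file, TOKENS)
```

Checking that every spelling symbol is a valid token before building the decoder is a cheap sanity check; the helper names here are hypothetical and not part of torchaudio.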

Returns:
  decoder

Return type:
  CTCDecoder
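The log_add parameter above chooses how the scores of merged hypotheses are combined. The following is a minimal sketch of the two rules on plain log-domain scores, assuming that the alternative to logadd is keeping the maximum; merge_scores is a hypothetical helper, not the decoder's actual code.

```python
import math


def merge_scores(score_a, score_b, log_add):
    """Combine the log-domain scores of two paths that yield the same hypothesis."""
    if not log_add:
        # Keep only the better of the two paths.
        return max(score_a, score_b)
    # log(exp(a) + exp(b)), computed stably by factoring out the larger score.
    hi, lo = max(score_a, score_b), min(score_a, score_b)
    if hi == -math.inf:
        return -math.inf
    return hi + math.log1p(math.exp(lo - hi))


max_merged = merge_scores(-1.0, -1.0, log_add=False)
sum_merged = merge_scores(-1.0, -1.0, log_add=True)
```

With two equally likely paths, logadd raises the merged score by log(2) relative to the maximum, since the probability mass of both paths is counted.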

Example
>>> from torchaudio.models.decoder import ctc_decoder
>>> decoder = ctc_decoder(
...     lexicon="lexicon.txt",
...     tokens="tokens.txt",
...     lm="kenlm.bin",
... )
>>> results = decoder(emissions)  # nested list of shape (B, nbest) of hypotheses

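The nested (B, nbest) result layout can be walked as sketched below. The sketch uses a made-up stand-in class and made-up data so that it runs without a model; it assumes each hypothesis exposes words and score attributes, which this page does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeHypothesis:
    """Hypothetical stand-in for a decoder hypothesis (not a torchaudio class)."""
    words: List[str]
    score: float


def best_transcripts(results):
    """Return the highest-scoring transcript for each utterance in the batch."""
    transcripts = []
    for hypotheses in results:  # outer list: one entry per utterance (B)
        best = max(hypotheses, key=lambda h: h.score)  # inner list: nbest entries
        transcripts.append(" ".join(best.words))
    return transcripts


# Made-up results for a batch of 2 utterances decoded with nbest=2.
results = [
    [FakeHypothesis(["the", "cat"], -3.2), FakeHypothesis(["the", "cap"], -4.1)],
    [FakeHypothesis(["hello"], -1.5), FakeHypothesis(["hollow"], -2.0)],
]
transcripts = best_transcripts(results)
```

Selecting by maximum score rather than by position avoids relying on any particular ordering of the inner list.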
Tutorials using ctc_decoder:
ASR Inference with CTC Decoder

