ctc_decoder

torchaudio.models.decoder.ctc_decoder(lexicon: Optional[str], tokens: Union[str, List[str]], lm: Optional[Union[str, CTCDecoderLM]] = None, lm_dict: Optional[str] = None, nbest: int = 1, beam_size: int = 50, beam_size_token: Optional[int] = None, beam_threshold: float = 50, lm_weight: float = 2, word_score: float = 0, unk_score: float = -inf, sil_score: float = 0, log_add: bool = False, blank_token: str = '-', sil_token: str = '|', unk_word: str = '<unk>') → CTCDecoder

Builds an instance of CTCDecoder.

Parameters:

lexicon (str or None) – lexicon file containing the possible words and their corresponding spellings. Each line consists of a word followed by its space-separated spelling. If None, lexicon-free decoding is used.
tokens (str or List[str]) – file or list containing the valid tokens. If a file is used, tokens that map to the same index are expected to be on the same line.
lm (str, CTCDecoderLM, or None, optional) – either a path to a KenLM language model, a custom language model of type CTCDecoderLM, or None if no language model is used (Default: None)
lm_dict (str or None, optional) – file consisting of the dictionary used for the LM, with a word per line sorted by LM index. If decoding with a lexicon, entries in lm_dict must also occur in the lexicon file. If None, dictionary for LM is constructed using the lexicon file. (Default: None)
nbest (int, optional) – number of best decodings to return (Default: 1)
beam_size (int, optional) – max number of hypotheses to hold after each decode step (Default: 50)
beam_size_token (int, optional) – max number of tokens to consider at each decode step. If None, it is set to the total number of tokens (Default: None)
beam_threshold (float, optional) – threshold for pruning hypotheses (Default: 50)
lm_weight (float, optional) – weight of language model (Default: 2)
word_score (float, optional) – word insertion score (Default: 0)
unk_score (float, optional) – unknown word insertion score (Default: -inf)
sil_score (float, optional) – silence insertion score (Default: 0)
log_add (bool, optional) – whether or not to use logadd when merging hypotheses (Default: False)
blank_token (str, optional) – token corresponding to blank (Default: “-”)
sil_token (str, optional) – token corresponding to silence (Default: “|”)
unk_word (str, optional) – word corresponding to unknown (Default: “<unk>”)
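
The file formats described for lexicon and tokens can be illustrated with a small sketch that writes both files and checks them against each other. The vocabulary, the file names, and the convention of ending each spelling with the silence token are assumptions made for illustration; the description above only requires a word followed by its space-separated spelling, and one line per token index.

```python
import tempfile
from pathlib import Path

# Hypothetical vocabulary, chosen only for illustration.
# "-" matches the default blank_token and "|" the default sil_token.
tokens = ["-", "|", "a", "c", "t"]
words = ["cat", "act", "at"]


def spell(word, sil_token="|"):
    """Space-separated spelling; the trailing sil_token is an assumed convention."""
    return " ".join(list(word) + [sil_token])


workdir = Path(tempfile.mkdtemp())
tokens_path = workdir / "tokens.txt"
lexicon_path = workdir / "lexicon.txt"

# One token per line, so each line corresponds to one token index.
tokens_path.write_text("\n".join(tokens) + "\n")

# One entry per line: the word, then its spelling.
lexicon_path.write_text("".join(f"{w} {spell(w)}\n" for w in words))

# Sanity check: every symbol used in a spelling should be a valid token.
lexicon = {}
for line in lexicon_path.read_text().splitlines():
    word, *spelling = line.split()
    lexicon[word] = spelling
missing = {symbol for sp in lexicon.values() for symbol in sp} - set(tokens)

print(lexicon_path.read_text())
print("symbols missing from tokens:", missing)
```

The resulting paths could then be passed as the lexicon and tokens arguments. Running a check like this before building the decoder may help catch spellings that use symbols the acoustic model never emits.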

Returns:

decoder

Return type:

CTCDecoder

Example

>>> decoder = ctc_decoder(
...     lexicon="lexicon.txt",
...     tokens="tokens.txt",
...     lm="kenlm.bin",
... )
>>> results = decoder(emissions) # List of shape (B, nbest) of Hypotheses
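
Since the result is a nested list of shape (B, nbest), a common next step is to pull one transcript per batch item. The sketch below is a minimal illustration, assuming each hypothesis exposes `words` (a list of strings) and `score` (a float) attributes; the `SimpleNamespace` objects are hypothetical stand-ins for real decoder output, and `best_transcripts` is a helper name introduced here, not part of torchaudio.

```python
from types import SimpleNamespace


def best_transcripts(results):
    """Return the highest-scoring transcript for each batch item.

    `results` is a list of length B whose items are lists of up to nbest
    hypotheses. The best hypothesis is selected by score rather than by
    position, so no ordering of the inner lists is assumed.
    """
    transcripts = []
    for hyps in results:
        if not hyps:
            transcripts.append("")
            continue
        best = max(hyps, key=lambda h: h.score)
        transcripts.append(" ".join(best.words))
    return transcripts


# Hypothetical stand-in for decoder output with B=2 and nbest=2.
fake_results = [
    [
        SimpleNamespace(words=["the", "cat"], score=-3.2),
        SimpleNamespace(words=["the", "cap"], score=-4.1),
    ],
    [
        SimpleNamespace(words=["hollow"], score=-2.5),
        SimpleNamespace(words=["hello"], score=-1.0),
    ],
]

print(best_transcripts(fake_results))
```

With real output, the same helper would be called as `best_transcripts(results)`.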

Tutorials using ctc_decoder:

ASR Inference with CTC Decoder
