torchaudio.functional.merge_tokens

torchaudio.functional.merge_tokens(tokens: Tensor, scores: Tensor, blank: int = 0) → List[TokenSpan]

Removes repeated tokens and blank tokens from the given CTC token sequence.

Parameters:

tokens (Tensor) – Alignment tokens (unbatched) returned from forced_align(). Shape: (time, ).
scores (Tensor) – Alignment scores (unbatched) returned from forced_align(). Shape: (time, ). The score of each merged token is the average of the frame-level scores over its time span.
blank (int, optional) – Index of the blank token. (Default: 0)

Returns:

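list of TokenSpan – One span per merged non-blank token, in time order. Each span holds the token index (token), the start frame (start, inclusive), the end frame (end, exclusive), and the averaged score (score).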
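The merging behavior can be illustrated without torchaudio. The following is a minimal pure-Python sketch of the idea, not the library's implementation: `Span` and `merge_tokens_sketch` are hypothetical names used only for illustration, and plain lists stand in for the tensors.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass
class Span:
    """Hypothetical stand-in for TokenSpan (illustration only)."""

    token: int
    start: int    # first frame of the run (inclusive)
    end: int      # one past the last frame of the run (exclusive)
    score: float  # mean of the frame-level scores over [start, end)


def merge_tokens_sketch(
    tokens: Sequence[int], scores: Sequence[float], blank: int = 0
) -> List[Span]:
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    spans = []
    start = 0
    # groupby yields one group per run of identical consecutive tokens.
    for token, run in groupby(tokens):
        length = len(list(run))
        end = start + length
        if token != blank:  # blank runs are skipped, but still advance time
            mean_score = sum(scores[start:end]) / length
            spans.append(Span(token, start, end, mean_score))
        start = end
    return spans


# Frame-level alignment: blank, token 3 for two frames, blank, token 5 for four frames, blank.
tokens = [0, 3, 3, 0, 5, 5, 5, 5, 0]
scores = [1.0, 0.75, 0.25, 1.0, 0.5, 0.5, 1.0, 1.0, 1.0]
spans = merge_tokens_sketch(tokens, scores)
for span in spans:
    print(span)
```

In this sketch, two runs of the same token separated by a blank stay as two separate spans, which is how a CTC alignment represents a token that is repeated in the target.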

Example

>>> aligned_tokens, scores = forced_align(emission, targets, input_lengths, target_lengths)
>>> token_spans = merge_tokens(aligned_tokens[0], scores[0])
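Span boundaries are frame indices along the time axis of the emission, so turning them into timestamps requires the duration of one frame. A minimal sketch, assuming a hypothetical model that emits one frame every 20 ms: `FRAME_DURATION` and `span_to_seconds` are illustrative names, not part of torchaudio, and the real frame duration depends on the acoustic model and the audio sample rate.

```python
from typing import Tuple

# Assumption for illustration: one emission frame covers 20 ms of audio.
FRAME_DURATION = 0.02


def span_to_seconds(
    start: int, end: int, frame_duration: float = FRAME_DURATION
) -> Tuple[float, float]:
    """Convert a [start, end) frame range into (start, end) times in seconds."""
    return start * frame_duration, end * frame_duration


# Example (start, end) frame pairs, as they might be read from span.start and span.end.
for start, end in [(1, 3), (4, 8)]:
    t0, t1 = span_to_seconds(start, end)
    print(f"frames [{start}, {end}) -> {t0:.2f}s to {t1:.2f}s")
```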

Tutorials using merge_tokens:

CTC forced alignment API tutorial

Forced alignment for multilingual data
