torchtext.data.metrics
bleu_score
- torchtext.data.metrics.bleu_score(candidate_corpus, references_corpus, max_n=4, weights=[0.25, 0.25, 0.25, 0.25])
Computes the BLEU score between a candidate translation corpus and a reference translation corpus. Based on "BLEU: a Method for Automatic Evaluation of Machine Translation" (https://www.aclweb.org/anthology/P02-1040.pdf)
- Parameters:
candidate_corpus – an iterable of candidate translations. Each translation is an iterable of tokens
references_corpus – an iterable of iterables of reference translations. Each translation is an iterable of tokens
max_n – the maximum n-gram order to use. E.g. if max_n=3, unigrams, bigrams and trigrams are counted
weights – a list of weights used for each n-gram category (uniform by default)
Examples
>>> from torchtext.data.metrics import bleu_score
>>> candidate_corpus = [['My', 'full', 'pytorch', 'test'], ['Another', 'Sentence']]
>>> references_corpus = [[['My', 'full', 'pytorch', 'test'], ['Completely', 'Different']], [['No', 'Match']]]
>>> bleu_score(candidate_corpus, references_corpus)
0.8408964276313782