class ignite.metrics.Bleu(ngram=4, smooth='no_smooth', output_transform=<function Bleu.<lambda>>, device=device(type='cpu'), average='macro')

Calculates the BLEU score.

\text{BLEU} = b_{p} \cdot \exp \left( \sum_{n=1}^{N} w_{n} \: \log p_{n} \right)

where N is the order of n-grams, b_{p} is a sentence brevity penalty, w_{n} are positive weights summing to one, and p_{n} are modified n-gram precisions.

More details can be found in Papineni et al. 2002.

In addition, a review of smoothing techniques can be found in Chen et al. 2014.
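The formula above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name bleu_from_precisions is hypothetical, uniform weights are assumed when none are given, and the brevity penalty uses the definition from Papineni et al. (1 when the hypothesis is longer than the reference, exp(1 - r/c) otherwise).

```python
import math


def bleu_from_precisions(precisions, hyp_len, ref_len, weights=None):
    """Combine modified n-gram precisions p_n into a BLEU score (hypothetical helper)."""
    n = len(precisions)
    if weights is None:
        weights = [1.0 / n] * n  # assumption: uniform weights summing to one
    if hyp_len == 0 or min(precisions) <= 0.0:
        return 0.0  # log(0) is undefined, so the unsmoothed score is zero
    # Brevity penalty b_p: penalize hypotheses that are not longer than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return bp * math.exp(log_sum)
```

With equal hypothesis and reference lengths the penalty is 1, so the score reduces to the weighted geometric mean of the precisions.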

  • update must receive output of the form (y_pred, y) or {'y_pred': y_pred, 'y': y}.

  • y_pred (list(list(str))) - a list of hypothesis sentences, each given as a list of tokens.

  • y (list(list(list(str)))) - a corpus of lists of reference sentences, one list of references per hypothesis.
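The nesting described above can be illustrated with a short sketch. The sentences are made up for illustration, and whitespace splitting stands in for whatever tokenizer is actually used.

```python
# Hypothetical raw strings: two hypotheses, each with its own list of references.
hypotheses = ["the cat sat on the mat", "a dog barked"]
references = [
    ["the cat is on the mat", "there is a cat on the mat"],
    ["a dog was barking"],
]

# y_pred: list(list(str)), one token list per hypothesis.
y_pred = [h.split() for h in hypotheses]

# y: list(list(list(str))), one list of tokenized references per hypothesis.
y = [[r.split() for r in refs] for refs in references]

# Both sequences are aligned: y[i] holds the references for y_pred[i].
assert len(y_pred) == len(y)
```

The number of references may differ from one hypothesis to the next, as it does here.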

Remark:

This implementation is inspired by nltk.

  • ngram (int) – order of n-grams.

  • smooth (str) – smoothing method. Valid values are no_smooth, smooth1, nltk_smooth2 or smooth2. Default: no_smooth.

  • output_transform (Callable) – a callable that is used to transform the Engine’s process_function’s output into the form expected by the metric. This can be useful if, for example, you have a multi-output model and you want to compute the metric with respect to one of the outputs. By default, metrics require the output as (y_pred, y) or {'y_pred': y_pred, 'y': y}.

  • device (Union[str, device]) – specifies which device updates are accumulated on. Setting the metric’s device to be the same as your update arguments ensures the update method is non-blocking. By default, CPU.

  • average (str) – specifies which type of averaging to use, macro or micro. Default: “macro”.
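The difference between the two averaging modes can be sketched on a single ratio. This assumes the usual definitions (macro averages per-sentence scores, micro pools counts over the whole corpus before dividing); which quantities Bleu pools internally is an assumption here, and the function names and counts are hypothetical.

```python
def macro_average(counts):
    """Mean of the per-sentence ratios."""
    ratios = [matched / total for matched, total in counts]
    return sum(ratios) / len(ratios)


def micro_average(counts):
    """Single ratio computed from counts pooled over all sentences."""
    matched = sum(m for m, _ in counts)
    total = sum(t for _, t in counts)
    return matched / total


# Hypothetical (clipped matches, hypothesis n-grams) pairs for two sentences.
counts = [(2, 7), (5, 5)]

macro = macro_average(counts)  # (2/7 + 5/5) / 2
micro = micro_average(counts)  # (2 + 5) / (7 + 5)
```

Macro averaging gives every sentence the same weight, while micro averaging lets longer sentences contribute more.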


For more information on how metrics work with Engine, visit Attach Engine API.
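As a rough sketch of what an output_transform callable might look like, the function below picks the (y_pred, y) pair out of a larger process-function output. The dictionary keys and the function name select_translation are hypothetical; such a function would be passed as Bleu(output_transform=select_translation).

```python
def select_translation(output):
    """Map a hypothetical engine output dict to the (y_pred, y) form."""
    return output["hypotheses"], output["references"]


# Hypothetical output of a process function that returns more than the metric needs.
engine_output = {
    "loss": 0.42,
    "hypotheses": [["the", "cat", "sat"]],
    "references": [[["the", "cat", "sat", "down"]]],
}

y_pred, y = select_translation(engine_output)
```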

from ignite.metrics.nlp import Bleu

m = Bleu(ngram=4, smooth="smooth1")

y_pred = "the the the the the the the"
y = ["the cat is on the mat", "there is a cat on the mat"]

m.update(([y_pred.split()], [[_y.split() for _y in y]]))

print(m.compute())

tensor(0.0393, dtype=torch.float64)
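To see why this example needs smoothing, the modified (clipped) n-gram precision from Papineni et al. can be computed by hand. The sketch below is independent of ignite, its helper names are hypothetical, and it does not attempt to reproduce the smoothed score printed above.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(hypothesis, references, n):
    """Return (clipped matches, total hypothesis n-grams) for order n."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    # For each n-gram, the largest count observed in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip each hypothesis count by its maximum reference count.
    clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
    return clipped, sum(hyp_counts.values())


hyp = "the the the the the the the".split()
refs = [r.split() for r in ["the cat is on the mat", "there is a cat on the mat"]]

unigram = modified_precision(hyp, refs, 1)  # "the" is clipped to its max reference count
bigram = modified_precision(hyp, refs, 2)   # ("the", "the") appears in no reference
```

The unigram precision is 2/7 and the bigram precision is 0/6, so without smoothing the logarithm of a zero precision would drive the whole score to zero.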

New in version 0.4.5.

Changed in version 0.4.7:

  • update method has changed and now works on a batch of inputs.

  • added average option to handle micro and macro averaging modes.



compute – Computes the metric based on its accumulated state.

reset – Resets the metric to its initial state.

update – Updates the metric's state using the passed batch output.


compute()

Computes the metric based on its accumulated state.

By default, this is called at the end of each epoch.


Returns

The actual quantity of interest. However, if a Mapping is returned, it will be (shallow) flattened into engine.state.metrics when completed() is called.

Return type



Raises

NotComputableError – raised when the metric cannot be computed.


reset()

Resets the metric to its initial state.

By default, this is called at the start of each epoch.

Return type



update(output)

Updates the metric’s state using the passed batch output.

By default, this is called once for each batch.


output (Tuple[Sequence[Sequence[Any]], Sequence[Sequence[Sequence[Any]]]]) – this is the output from the engine’s process function.

Return type
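The reset, update and compute lifecycle described above can be sketched with a toy accumulator. RunningMeanMetric is hypothetical and not part of ignite; it only mirrors the calling pattern (reset at the start of an epoch, update once per batch, compute at the end), and RuntimeError stands in for NotComputableError.

```python
class RunningMeanMetric:
    """Toy metric that averages per-example scores across batches."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Called at the start of each epoch: clear the accumulated state.
        self._sum = 0.0
        self._count = 0

    def update(self, output):
        # Called once per batch: fold the batch into the running state.
        self._sum += sum(output)
        self._count += len(output)

    def compute(self):
        # Called at the end of the epoch: derive the value from the state.
        if self._count == 0:
            raise RuntimeError("at least one example is required before compute()")
        return self._sum / self._count


metric = RunningMeanMetric()
metric.update([0.5, 1.0])  # first batch of hypothetical per-sentence scores
metric.update([0.0])       # second batch
epoch_value = metric.compute()
```

Keeping all state inside reset makes it safe to reuse the same metric object across epochs.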