class PrioritizedSampler(max_capacity: int, alpha: float, beta: float, eps: float = 1e-08, dtype: torch.dtype = torch.float32, reduction: str = 'max')

Prioritized sampler for replay buffer.

Presented in “Schaul, T.; Quan, J.; Antonoglou, I.; and Silver, D. 2015. Prioritized experience replay.” (https://arxiv.org/abs/1511.05952)

Parameters:

  • alpha (float) – exponent α determining how much prioritization is used; α = 0 corresponds to the uniform case.

  • beta (float) – importance-sampling negative exponent: sampled items receive correction weights proportional to (N · P(i))^(−β), so β = 1 fully compensates for the bias introduced by prioritized sampling.

  • eps (float, optional) – small constant added to the priorities to ensure that the buffer does not contain null priorities. Defaults to 1e-8.

  • dtype (torch.dtype, optional) – data type of the stored priorities. Defaults to torch.float32.

  • reduction (str, optional) – the reduction method for multidimensional tensordicts (i.e., stored trajectories). Can be one of “max”, “min”, “median” or “mean”.
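The roles of alpha, beta and eps can be illustrated with a short, self-contained sketch of the sampling math from the paper. This is plain NumPy, not torchrl's implementation, and the TD-error values are made up for illustration:

```python
import numpy as np

# Hypothetical TD errors for four stored transitions (illustrative values).
td_errors = np.array([2.0, 0.5, 0.0, 1.0])

alpha, beta, eps = 0.7, 0.9, 1e-8

# Priorities: |TD error| plus eps, so no transition ever has zero
# sampling probability.
priorities = np.abs(td_errors) + eps

# Sampling probabilities: P(i) = p_i^alpha / sum_k p_k^alpha.
# alpha = 0 recovers uniform sampling; alpha = 1 is full prioritization.
probs = priorities**alpha / (priorities**alpha).sum()

# Importance-sampling weights: w_i = (N * P(i))^(-beta), normalized by
# max(w) so the weights only ever scale the loss downward.
N = len(priorities)
weights = (N * probs) ** -beta
weights /= weights.max()
```

Note that the transition with the largest TD error gets the highest sampling probability and, correspondingly, the smallest importance-sampling weight.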

update_priority(index: Union[int, Tensor], priority: Union[float, Tensor]) → None

Updates the priority of the data pointed to by the index.

Parameters:

  • index (int or torch.Tensor) – indexes of the priorities to be updated.

  • priority (Number or torch.Tensor) – new priorities of the indexed elements.
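The semantics of updating priorities, including how the reduction collapses per-step priorities of a stored trajectory into one scalar, can be sketched as follows. This is an illustrative toy function, not torchrl's implementation; the store contents are made up:

```python
import numpy as np

# Toy priority store for five stored elements (hypothetical values).
priorities = np.ones(5)

eps = 1e-8

def update_priority(index, priority, reduction="max"):
    """Sketch: write new priorities at the given indexes.

    `priority` may hold per-step values for a stored trajectory; the
    reduction ("max", "min", "median" or "mean") collapses them into a
    single scalar per stored element, and eps keeps priorities non-null.
    """
    priority = np.atleast_2d(np.abs(priority)) + eps
    reduce = {"max": np.max, "min": np.min,
              "median": np.median, "mean": np.mean}[reduction]
    priorities[np.atleast_1d(index)] = reduce(priority, axis=-1)

# After re-evaluating TD errors for the sampled indexes 1 and 3, where
# each stored element is a three-step trajectory:
update_priority([1, 3], [[0.2, 2.5, 0.7], [0.1, 0.3, 0.2]])
```

With the default "max" reduction, element 1 now carries priority 2.5 (its largest per-step value) and element 3 carries 0.3, while untouched elements keep their previous priorities.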

