- class torchrl.data.replay_buffers.PrioritizedSampler(max_capacity: int, alpha: float, beta: float, eps: float = 1e-08, dtype: dtype = torch.float32, reduction: str = 'max')¶
Prioritized sampler for replay buffer.
- Presented in “Schaul, T.; Quan, J.; Antonoglou, I.; and Silver, D. 2015.
Prioritized experience replay.” (https://arxiv.org/abs/1511.05952)
alpha (float) – exponent α determines how much prioritization is used, with α = 0 corresponding to the uniform case.
beta (float) – importance sampling negative exponent.
eps (float, optional) – delta added to the priorities to ensure that the buffer does not contain null priorities. Defaults to 1e-8.
reduction (str, optional) – the reduction method for multidimensional tensordicts (ie stored trajectories). Can be one of “max”, “min”, “median” or “mean”.