torchrl.data package

Replay Buffers

Replay buffers are a central part of off-policy RL algorithms. TorchRL provides efficient implementations of a few widely used replay buffers:

ReplayBuffer(*[, storage, sampler, writer, ...])

A generic, composable replay buffer class.

PrioritizedReplayBuffer(*, alpha, beta[, ...])

Prioritized replay buffer.

TensorDictReplayBuffer(*[, priority_key])

TensorDict-specific wrapper around the ReplayBuffer class.

TensorDictPrioritizedReplayBuffer(*, alpha, beta)

TensorDict-specific wrapper around the PrioritizedReplayBuffer class.
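
The alpha and beta arguments of the prioritized buffers follow the usual prioritized experience replay convention: alpha sets how strongly priorities skew sampling (alpha = 0 is uniform), and beta is the exponent of the importance-sampling correction. The plain-Python sketch below illustrates those two formulas only. It is not the TorchRL implementation, and the function names are hypothetical.

```python
import random


def sampling_probabilities(priorities, alpha):
    """P(i) = p_i**alpha / sum_k p_k**alpha; alpha = 0 gives uniform sampling."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta):
    """w_i = (N * P(i))**(-beta), divided by the largest weight."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    largest = max(raw)
    return [w / largest for w in raw]


def sample_indices(probs, batch_size, rng):
    """Draw indices with replacement, proportionally to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Hypothetical priorities for four stored items.
priorities = [1.0, 2.0, 4.0, 1.0]
probs = sampling_probabilities(priorities, alpha=1.0)
weights = importance_weights(probs, beta=1.0)
batch = sample_indices(probs, batch_size=8, rng=random.Random(0))
```

Items with a high priority are drawn more often and receive a smaller weight, which compensates for their over-representation in the loss.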

Composable Replay Buffers

We also give users the ability to compose a replay buffer using the following components:


Sampler()

A generic sampler base class for composable Replay Buffers.

PrioritizedSampler(max_capacity, alpha, beta)

Prioritized sampler for replay buffer.


RandomSampler()

A uniformly random sampler for composable replay buffers.


SamplerWithoutReplacement([drop_last])

A data-consuming sampler that ensures that the same sample is not present in consecutive batches.


Storage(max_size)

A Storage is the container of a replay buffer.


ListStorage(max_size)

A storage stored in a list.

LazyTensorStorage(*args, **kwargs)

A pre-allocated tensor storage for tensors and tensordicts.

LazyMemmapStorage(*args, **kwargs)

A memory-mapped storage for tensors and tensordicts.


Writer()

A ReplayBuffer base Writer class.


RoundRobinWriter(**kw)

A RoundRobin Writer class for composable replay buffers.
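
How a storage, a sampler and a writer fit together can be sketched in plain Python. The classes below (ListStore, RoundRobin, UniformSampler, ComposedBuffer) are hypothetical stand-ins written for this illustration. They are not the TorchRL classes, and their method names are assumptions rather than the library's interface.

```python
import random


class ListStore:
    """Hypothetical storage: a Python list bounded by max_size."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.items = []

    def set(self, index, item):
        # Grow the list until it is full, then overwrite in place.
        if index == len(self.items):
            self.items.append(item)
        else:
            self.items[index] = item

    def get(self, index):
        return self.items[index]

    def __len__(self):
        return len(self.items)


class RoundRobin:
    """Hypothetical writer: cycles through slots, overwriting the oldest."""

    def __init__(self):
        self.cursor = 0

    def add(self, storage, item):
        index = self.cursor
        storage.set(index, item)
        self.cursor = (self.cursor + 1) % storage.max_size
        return index


class UniformSampler:
    """Hypothetical sampler: uniform indices, with replacement."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def sample(self, storage, batch_size):
        return [self.rng.randrange(len(storage)) for _ in range(batch_size)]


class ComposedBuffer:
    """Delegates writing, storing and sampling to its three components."""

    def __init__(self, storage, sampler, writer):
        self.storage = storage
        self.sampler = sampler
        self.writer = writer

    def add(self, item):
        return self.writer.add(self.storage, item)

    def sample(self, batch_size):
        indices = self.sampler.sample(self.storage, batch_size)
        return [self.storage.get(i) for i in indices]

    def __len__(self):
        return len(self.storage)


buffer = ComposedBuffer(ListStore(max_size=3), UniformSampler(seed=0), RoundRobin())
for step in range(5):
    buffer.add({"step": step})
batch = buffer.sample(4)
```

Because each component has a single responsibility, swapping the sampler (for instance for a prioritized one) does not require touching the storage or the writer.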

The choice of storage strongly affects replay buffer sampling latency, especially in distributed reinforcement learning settings with larger data volumes. LazyMemmapStorage is strongly recommended in distributed settings with shared storage, both because MemmapTensors have a lower serialisation cost and because file storage locations can be specified, which improves recovery from node failures. The following mean sampling latency improvements over using ListStorage were found from rough benchmarking in

Storage Type

Speed up







Storing trajectories

Trajectories can be stored in the replay buffer directly. One point to keep in mind is that the size of the replay buffer is always the size of the leading dimension of the storage: in other words, creating a replay buffer with a storage of size 1M when storing multidimensional data does not mean storing 1M frames but 1M trajectories.

When sampling trajectories, it may be desirable to sample sub-trajectories to diversify learning or make the sampling more efficient. To do this, we provide a custom Transform class named RandomCropTensorDict.
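
The two points above, that capacity counts trajectories and that a sub-trajectory is a random contiguous window, can be illustrated in plain Python. This sketch does not use RandomCropTensorDict or any TorchRL class. The random_crop helper and its sub_seq_len argument are hypothetical names chosen for the illustration.

```python
import random


def random_crop(trajectory, sub_seq_len, rng):
    """Return a random contiguous window of sub_seq_len steps."""
    if sub_seq_len > len(trajectory):
        raise ValueError("sub_seq_len exceeds the trajectory length")
    start = rng.randrange(len(trajectory) - sub_seq_len + 1)
    return trajectory[start:start + sub_seq_len]


rng = random.Random(0)

# Four trajectories of 50 steps each; every step is a (trajectory id, time) pair.
storage = [[(traj_id, t) for t in range(50)] for traj_id in range(4)]

# The leading dimension counts trajectories, not frames.
num_items = len(storage)
num_frames = sum(len(traj) for traj in storage)

# Sample 8 trajectories (with replacement) and crop each to 10 consecutive steps.
batch = [random_crop(rng.choice(storage), sub_seq_len=10, rng=rng) for _ in range(8)]
```

Here the storage holds 4 items even though it contains 200 frames, and every element of the batch is a 10-step window taken from a single trajectory.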


Datasets

TorchRL provides wrappers around offline RL datasets. These data are presented as ReplayBuffer instances, which means that they can be customized at will with transforms, samplers and storages. By default, datasets are stored as memory-mapped tensors, allowing them to be promptly sampled with virtually no memory footprint.


Installing dependencies is the responsibility of the user. For D4RL, a clone of the repository is needed as the latest wheels are not published on PyPI. For OpenML, scikit-learn and pandas are required.

D4RLExperienceReplay(name, batch_size[, ...])

An Experience replay class for D4RL.

OpenMLExperienceReplay(name, batch_size[, ...])

An experience replay for OpenML data.


TensorSpec

The TensorSpec parent class and subclasses define the basic properties of observations and actions in TorchRL, such as shape, device, dtype and domain. It is important that your environment specs match the inputs and outputs that the environment receives and sends, as ParallelEnv will create buffers from these specs to communicate with the spawned processes. Use the torchrl.envs.utils.check_env_specs function for a sanity check.

TensorSpec(shape, space[, device, dtype, domain])

Parent class of the tensor meta-data containers for observation, actions and rewards.

BinaryDiscreteTensorSpec(n[, shape, device, ...])

A binary discrete tensor spec.

BoundedTensorSpec(minimum, maximum[, shape, ...])

A bounded continuous tensor spec.

CompositeSpec(*args, **kwargs)

A composition of TensorSpecs.

DiscreteTensorSpec(n[, shape, device, dtype])

A discrete tensor spec.

MultiDiscreteTensorSpec(nvec[, shape, ...])

A concatenation of discrete tensor specs.

MultiOneHotDiscreteTensorSpec(nvec[, shape, ...])

A concatenation of one-hot discrete tensor specs.

OneHotDiscreteTensorSpec(n[, shape, device, ...])

A unidimensional, one-hot discrete tensor spec.

UnboundedContinuousTensorSpec([shape, ...])

An unbounded continuous tensor spec.

UnboundedDiscreteTensorSpec([shape, device, ...])

An unbounded discrete tensor spec.
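
What it means for a value to match a spec can be sketched without TorchRL. The two classes below are hypothetical, simplified stand-ins operating on flat Python lists. They are not the TorchRL specs, and the method names (is_in, project, rand, encode) are used here as assumptions for the illustration only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedSpec:
    """Hypothetical bounded continuous spec over a flat list of floats."""

    minimum: float
    maximum: float
    size: int

    def is_in(self, value):
        # The value must have the right length and stay within the bounds.
        return len(value) == self.size and all(
            self.minimum <= v <= self.maximum for v in value
        )

    def project(self, value):
        # Clamp every entry into [minimum, maximum].
        return [min(max(v, self.minimum), self.maximum) for v in value]

    def rand(self, rng):
        return [rng.uniform(self.minimum, self.maximum) for _ in range(self.size)]


@dataclass(frozen=True)
class OneHotSpec:
    """Hypothetical one-hot discrete spec with n categories."""

    n: int

    def is_in(self, value):
        # Exactly one entry equals 1 and all the others equal 0.
        return len(value) == self.n and sorted(value) == [0] * (self.n - 1) + [1]

    def encode(self, index):
        if not 0 <= index < self.n:
            raise ValueError("index out of range")
        return [1 if i == index else 0 for i in range(self.n)]


action_spec = BoundedSpec(minimum=-1.0, maximum=1.0, size=3)
clipped = action_spec.project([2.0, -3.0, 0.5])
random_action = action_spec.rand(random.Random(0))
one_hot = OneHotSpec(n=4).encode(2)
```

A check of this kind, applied to every observation and action an environment produces or consumes, is the sort of sanity test that catches a mismatch between declared specs and actual data.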


MultiStep(gamma, n_steps)

Multistep reward transform.
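
As background for the gamma and n_steps arguments, the standard n-step discounted reward sum can be sketched in plain Python. This illustrates the general technique within a single episode only. How MultiStep counts its steps and handles episode termination is not described here, so the function below is a hypothetical helper and not the MultiStep implementation.

```python
def n_step_returns(rewards, gamma, n_steps):
    """For each t, sum gamma**k * rewards[t + k] over k < n_steps.

    Windows that would run past the end of the episode are truncated.
    """
    returns = []
    for t in range(len(rewards)):
        window = rewards[t:t + n_steps]
        returns.append(sum(gamma ** k * r for k, r in enumerate(window)))
    return returns


# One episode with a constant reward of 1 at each of its four steps.
rewards = [1.0, 1.0, 1.0, 1.0]
returns = n_step_returns(rewards, gamma=0.5, n_steps=3)
```

With these inputs the first two entries accumulate three rewards (1 + 0.5 + 0.25), while the last two are truncated by the end of the episode.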

