Shortcuts package

Replay Buffers

Replay buffers are a central part of off-policy RL algorithms. TorchRL provides efficient implementations of a few widely used replay buffers:

ReplayBuffer(*[, storage, sampler, writer, ...])

A generic, composable replay buffer class.

PrioritizedReplayBuffer(*, alpha, beta[, ...])

Prioritized replay buffer.

TensorDictReplayBuffer(*[, priority_key])

TensorDict-specific wrapper around the ReplayBuffer class.

TensorDictPrioritizedReplayBuffer(*, alpha, beta)

TensorDict-specific wrapper around the PrioritizedReplayBuffer class.
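The alpha and beta arguments of the prioritized buffers can be read against the usual formulation of prioritized experience replay: an item with priority p is sampled with probability proportional to p ** alpha, and its importance weight is (N * P) ** (-beta), normalised by the largest weight. The following is a minimal plain-Python sketch of that arithmetic. The function names are hypothetical, and TorchRL's implementation may differ in details such as priority offsets.

```python
import random


def sampling_probabilities(priorities, alpha):
    """Probability of drawing each item: p_i ** alpha, normalised to sum to 1."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probabilities, beta):
    """Importance weights (N * P_i) ** (-beta), divided by the largest weight."""
    n = len(probabilities)
    weights = [(n * p) ** (-beta) for p in probabilities]
    max_weight = max(weights)
    return [w / max_weight for w in weights]


priorities = [1.0, 2.0, 4.0]
probs = sampling_probabilities(priorities, alpha=1.0)
weights = importance_weights(probs, beta=1.0)

# Draw a batch of indices according to the prioritized distribution.
rng = random.Random(0)
indices = rng.choices(range(len(priorities)), weights=probs, k=5)
```

With alpha set to 0 every item is equally likely, which recovers uniform sampling. With beta set to 0 all importance weights equal 1, so no correction is applied.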

Composable Replay Buffers

We also give users the ability to compose a replay buffer using the following components:


A generic sampler base class for composable Replay Buffers.

PrioritizedSampler(max_capacity, alpha, beta)

Prioritized sampler for replay buffer.


A uniformly random sampler for composable replay buffers.


A data-consuming sampler that ensures that the same sample is not present in consecutive batches.


A Storage is the container of a replay buffer.


A storage stored in a list.

LazyTensorStorage(*args, **kwargs)

A pre-allocated tensor storage for tensors and tensordicts.

LazyMemmapStorage(*args, **kwargs)

A memory-mapped storage for tensors and tensordicts.

TensorStorage(*args, **kwargs)

A storage for tensors and tensordicts.


A ReplayBuffer base Writer class.


A RoundRobin Writer class for composable replay buffers.


A RoundRobin Writer class for composable, tensordict-based replay buffers.


A Writer class for composable replay buffers that keeps the top elements based on some ranking key.
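To illustrate how a storage, a writer and a sampler divide the work, here is a minimal plain-Python sketch of a composed buffer. All class names (ListStore, RoundRobin, UniformSampler, MiniReplayBuffer) are hypothetical and only mimic the roles described above. They are not the TorchRL classes and do not reproduce their signatures.

```python
import random


class ListStore:
    """Hypothetical storage: holds at most `max_size` items in a list."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.items = []

    def set(self, index, item):
        if index == len(self.items):
            self.items.append(item)
        else:
            self.items[index] = item

    def get(self, index):
        return self.items[index]

    def __len__(self):
        return len(self.items)


class RoundRobin:
    """Hypothetical writer: overwrites the oldest slot once the storage is full."""

    def __init__(self):
        self.cursor = 0

    def add(self, storage, item):
        index = self.cursor
        storage.set(index, item)
        self.cursor = (self.cursor + 1) % storage.max_size
        return index


class UniformSampler:
    """Hypothetical sampler: draws indices uniformly, with replacement."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def sample(self, storage, batch_size):
        return [self.rng.randrange(len(storage)) for _ in range(batch_size)]


class MiniReplayBuffer:
    """Delegates writing, index selection and data access to its components."""

    def __init__(self, storage, sampler, writer):
        self.storage = storage
        self.sampler = sampler
        self.writer = writer

    def add(self, item):
        return self.writer.add(self.storage, item)

    def sample(self, batch_size):
        indices = self.sampler.sample(self.storage, batch_size)
        return [self.storage.get(i) for i in indices]

    def __len__(self):
        return len(self.storage)


buffer = MiniReplayBuffer(ListStore(3), UniformSampler(seed=0), RoundRobin())
for step in range(5):
    buffer.add(step)  # steps 3 and 4 overwrite steps 0 and 1
batch = buffer.sample(4)
```

Because each component only sees the storage through a small interface, swapping the sampler or the writer does not require touching the other parts.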

The choice of storage strongly affects replay buffer sampling latency, especially in distributed reinforcement learning settings with larger data volumes. LazyMemmapStorage is strongly recommended in distributed settings with shared storage, both because MemmapTensors are cheaper to serialise and because file storage locations can be specified, which improves recovery from node failures. The following mean sampling latency improvements over using ListStorage were found from rough benchmarking in

Storage Type

Speed up







Storing trajectories

Trajectories can be stored in the replay buffer directly. One point to keep in mind is that the size of the replay buffer is always the size of the leading dimension of the storage: creating a replay buffer with a storage of size 1M when storing multidimensional data does not mean storing 1M frames but 1M trajectories.
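The difference between counting trajectories and counting frames can be made concrete with a small plain-Python sketch. The storage here is just a list whose slots each hold one trajectory. The variable names and sizes are illustrative assumptions, not TorchRL objects.

```python
# Hypothetical storage: each slot along the leading dimension holds one trajectory.
capacity = 4
frames_per_trajectory = 10

storage = []
for traj_id in range(capacity):
    trajectory = [{"traj": traj_id, "step": t} for t in range(frames_per_trajectory)]
    storage.append(trajectory)

# The buffer size is the number of slots, not the number of frames.
num_trajectories = len(storage)
num_frames = sum(len(trajectory) for trajectory in storage)
print(num_trajectories, num_frames)  # 4 40
```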

When sampling trajectories, it may be desirable to sample sub-trajectories to diversify learning or make the sampling more efficient. To do this, we provide a custom Transform class named RandomCropTensorDict. Here is an example of how this class can be used:
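The cropping step itself can be pictured independently of TorchRL: pick a random start index and keep a contiguous window of fixed length. The helper random_crop below is hypothetical and only sketches this idea in plain Python. It is not the RandomCropTensorDict implementation.

```python
import random


def random_crop(trajectory, sub_seq_len, rng):
    """Return a contiguous window of `sub_seq_len` frames with a random start."""
    if sub_seq_len > len(trajectory):
        raise ValueError("sub_seq_len exceeds the trajectory length")
    start = rng.randint(0, len(trajectory) - sub_seq_len)
    return trajectory[start:start + sub_seq_len]


rng = random.Random(0)
trajectory = list(range(100))  # frame indices stand in for real frames
crop = random_crop(trajectory, 5, rng)
```

Applying the crop at sampling time, rather than when writing, means the full trajectory stays in the storage and a different window can be drawn each time.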


TorchRL provides wrappers around offline RL datasets. These data are presented as ReplayBuffer instances, which means that they can be customized at will with transforms, samplers and storages. By default, datasets are stored as memory-mapped tensors, allowing them to be promptly sampled with virtually no memory footprint. Here’s an example:
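Why memory mapping keeps the footprint small can be sketched with the standard library alone: the dataset lives in a file, and only the records that are actually sampled get read. The record layout, file name and sample helper below are assumptions made for this sketch. They do not describe how TorchRL lays out its datasets on disk.

```python
import mmap
import os
import random
import struct
import tempfile

# Hypothetical record layout: (int32 step, float32 reward), little-endian.
RECORD = struct.Struct("<if")

# Write a small dataset to disk once.
path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
num_records = 1000
with open(path, "wb") as f:
    for step in range(num_records):
        f.write(RECORD.pack(step, 0.5 * step))


def sample(mapped, batch_size, rng):
    """Read `batch_size` random records straight from the memory map."""
    indices = [rng.randrange(num_records) for _ in range(batch_size)]
    return [RECORD.unpack_from(mapped, i * RECORD.size) for i in indices]


rng = random.Random(0)
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        batch = sample(mapped, 4, rng)
```

The operating system pages in only the regions touched by unpack_from, so the process never needs to hold the whole file in memory.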


Installing dependencies is the responsibility of the user. For D4RL, a clone of the repository is needed as the latest wheels are not published on PyPI. For OpenML, scikit-learn and pandas are required.

D4RLExperienceReplay(name, batch_size[, ...])

An Experience replay class for D4RL.

OpenMLExperienceReplay(name, batch_size[, ...])

An experience replay for OpenML data.


The TensorSpec parent class and subclasses define the basic properties of observations and actions in TorchRL, such as shape, device, dtype and domain. It is important that your environment specs match the inputs the environment receives and the outputs it sends, as ParallelEnv will create buffers from these specs to communicate with the spawned processes. Use the torchrl.envs.utils.check_env_specs function for a sanity check.

TensorSpec(shape, space[, device, dtype, domain])

Parent class of the tensor meta-data containers for observation, actions and rewards.

BinaryDiscreteTensorSpec(n[, shape, device, ...])

A binary discrete tensor spec.

BoundedTensorSpec([low, high, shape, ...])

A bounded continuous tensor spec.

CompositeSpec(*args, **kwargs)

A composition of TensorSpecs.

DiscreteTensorSpec(n[, shape, device, ...])

A discrete tensor spec.

MultiDiscreteTensorSpec(nvec[, shape, ...])

A concatenation of discrete tensor specs.

MultiOneHotDiscreteTensorSpec(nvec[, shape, ...])

A concatenation of one-hot discrete tensor specs.

OneHotDiscreteTensorSpec(n[, shape, device, ...])

A unidimensional, one-hot discrete tensor spec.

UnboundedContinuousTensorSpec([shape, ...])

An unbounded continuous tensor spec.

UnboundedDiscreteTensorSpec([shape, device, ...])

An unbounded discrete tensor spec.

LazyStackedTensorSpec(*specs, dim)

A lazy representation of a stack of tensor specs.

LazyStackedCompositeSpec(*specs, dim)

A lazy representation of a stack of composite specs.
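The role a spec plays, describing which values are valid and generating values that conform, can be sketched in plain Python. The BoundedSpec class below is hypothetical and much simpler than the TorchRL spec classes: it has no dtype or device and only handles flat lists. The method names is_in and rand are chosen for this sketch.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedSpec:
    """Hypothetical 1-D spec: a length plus inclusive lower and upper bounds."""

    length: int
    low: float
    high: float

    def is_in(self, value):
        """Check that `value` has the right length and respects the bounds."""
        return len(value) == self.length and all(
            self.low <= v <= self.high for v in value
        )

    def rand(self, rng):
        """Draw a random value that satisfies the spec."""
        return [rng.uniform(self.low, self.high) for _ in range(self.length)]


action_spec = BoundedSpec(length=2, low=-1.0, high=1.0)
rng = random.Random(0)
sample = action_spec.rand(rng)

print(action_spec.is_in(sample))      # True
print(action_spec.is_in([0.0, 2.0]))  # False: second entry is out of bounds
print(action_spec.is_in([0.0]))       # False: wrong length
```

A mismatch such as the last two cases is the kind of error a spec check is meant to surface before buffers are allocated from the spec.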

Reinforcement Learning From Human Feedback (RLHF)

Data is of utmost importance in Reinforcement Learning from Human Feedback (RLHF). Given that these techniques are commonly employed in the realm of language, which is scarcely addressed in other subdomains of RL within the library, we offer specific utilities to facilitate interaction with external libraries like datasets. These utilities consist of tools for tokenizing data, formatting it in a manner suitable for TorchRL modules, and optimizing storage for efficient sampling.

PairwiseDataset(chosen_data, rejected_data, ...)

PromptData(input_ids, attention_mask, ...[, ...])

PromptTensorDictTokenizer(tokenizer, max_length)

Tokenization recipe for prompt datasets.

RewardData(input_ids, attention_mask[, ...])

RolloutFromModel(model, ref_model, reward_model)

A class for performing rollouts with causal language models.

TensorDictTokenizer(tokenizer, max_length[, ...])

Factory for a process function that applies a tokenizer over a text example.

TokenizedDatasetLoader(split, max_length, ...)

Loads a tokenized dataset, and caches a memory-mapped copy of it.


Iterates indefinitely over an iterator.

get_dataloader(batch_size, block_size, ...)

Creates a dataset and returns a dataloader from it.


MultiStep(gamma, n_steps)

Multistep reward transform.

consolidate_spec(spec[, ...])

Given a TensorSpec, removes exclusive keys by adding 0 shaped specs.

check_no_exclusive_keys(spec[, recurse])

Given a TensorSpec, returns true if there are no exclusive keys.


Returns true if a spec contains lazy stacked specs.
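As background for the gamma and n_steps arguments of MultiStep listed above, a multi-step reward is conventionally the discounted sum of the next few rewards. The function n_step_rewards below is a hypothetical plain-Python sketch of that definition. The exact windowing and end-of-episode handling used by MultiStep may differ.

```python
def n_step_rewards(rewards, gamma, n_steps):
    """For each step t, sum gamma**k * rewards[t + k] over k < n_steps.

    The window is truncated at the end of the episode.
    """
    out = []
    for t in range(len(rewards)):
        window = rewards[t:t + n_steps]
        out.append(sum(gamma ** k * r for k, r in enumerate(window)))
    return out


rewards = [1.0, 1.0, 1.0, 1.0]
result = n_step_rewards(rewards, gamma=0.5, n_steps=3)
print(result)  # [1.75, 1.75, 1.5, 1.0]
```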

