torchrl.data package¶
Replay Buffers¶
Replay buffers are a central part of off-policy RL algorithms. TorchRL provides an efficient implementation of a few widely used replay buffers (a short usage sketch follows the table):

ReplayBuffer | A generic, composable replay buffer class.
PrioritizedReplayBuffer | Prioritized replay buffer.
TensorDictReplayBuffer | TensorDict-specific wrapper around the ReplayBuffer class.
TensorDictPrioritizedReplayBuffer | TensorDict-specific wrapper around the PrioritizedReplayBuffer class.
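As a minimal sketch of the TensorDict-based interface (keys, shapes and sizes below are illustrative, not prescribed by the library):

    import torch
    from tensordict import TensorDict
    from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer

    # A buffer backed by a pre-allocated tensor storage of 1000 elements.
    rb = TensorDictReplayBuffer(storage=LazyTensorStorage(1000))

    # Write a batch of 64 transitions; keys and shapes are arbitrary.
    data = TensorDict(
        {"observation": torch.randn(64, 4), "reward": torch.randn(64, 1)},
        batch_size=[64],
    )
    rb.extend(data)

    # Read a batch of 32 transitions back.
    sample = rb.sample(32)
    print(sample)  # a TensorDict with batch_size [32]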
Composable Replay Buffers¶
We also give users the ability to compose a replay buffer using the following components (a composition sketch follows the table):

Sampler | A generic sampler base class for composable Replay Buffers.
PrioritizedSampler | Prioritized sampler for replay buffer.
RandomSampler | A uniformly random sampler for composable replay buffers.
SamplerWithoutReplacement | A data-consuming sampler that ensures that the same sample is not present in consecutive batches.
Storage | A Storage is the container of a replay buffer.
ListStorage | A storage stored in a list.
LazyTensorStorage | A pre-allocated tensor storage for tensors and tensordicts.
LazyMemmapStorage | A memory-mapped storage for tensors and tensordicts.
Writer | A ReplayBuffer base Writer class.
RoundRobinWriter | A RoundRobin Writer class for composable replay buffers.
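A rough sketch of how these components fit together (sizes are arbitrary; the default RoundRobinWriter is used implicitly):

    from torchrl.data.replay_buffers import ListStorage, RandomSampler, ReplayBuffer

    # Assemble a buffer from an explicit storage and sampler.
    rb = ReplayBuffer(
        storage=ListStorage(max_size=1000),
        sampler=RandomSampler(),
    )

    # A ListStorage can hold arbitrary Python objects.
    rb.extend(list(range(100)))

    # Draw 32 items uniformly at random.
    batch = rb.sample(32)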
Storage choice is very influential on replay buffer sampling latency, especially in distributed reinforcement learning settings with larger data volumes. LazyMemmapStorage is highly advised in distributed settings with shared storage, due to the lower serialisation cost of MemmapTensors as well as the ability to specify file storage locations for improved node failure recovery.
Rough benchmarking (https://github.com/pytorch/rl/tree/main/benchmarks/storage) found the following mean sampling-latency speed-ups relative to ListStorage:

Storage Type | Speed up
ListStorage | 1x
LazyTensorStorage | 1.83x
LazyMemmapStorage | 3.44x
Storing trajectories¶
It is not too difficult to store trajectories in the replay buffer. One element to pay attention to is that the size of the replay buffer is always the size of the leading dimension of the storage: in other words, creating a replay buffer with a storage of size 1M when storing multidimensional data does not mean storing 1M frames but 1M trajectories.
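The following sketch illustrates the point (keys and shapes are arbitrary): a storage of size 1000 filled with 4 trajectories of 50 steps each holds 4 elements, not 200 frames.

    import torch
    from tensordict import TensorDict
    from torchrl.data import LazyMemmapStorage, TensorDictReplayBuffer

    # The buffer counts elements along the leading dimension of the storage.
    rb = TensorDictReplayBuffer(storage=LazyMemmapStorage(1000))

    # A batch of 4 trajectories, each 50 steps long.
    trajectories = TensorDict(
        {"observation": torch.randn(4, 50, 3), "reward": torch.randn(4, 50, 1)},
        batch_size=[4, 50],
    )
    rb.extend(trajectories)

    print(len(rb))  # 4 (trajectories), not 200 (frames)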
When sampling trajectories, it may be desirable to sample sub-trajectories to diversify learning or make the sampling more efficient. To do this, we provide a custom Transform class named RandomCropTensorDict. Here is an example of how this class can be used:
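(A minimal sketch, assuming RandomCropTensorDict accepts a sub_seq_len argument giving the length of the cropped sub-trajectory and a sample_dim argument pointing at the time dimension, and that it can be passed to a replay buffer as a transform; check the API reference for the exact signature.)

    import torch
    from tensordict import TensorDict
    from torchrl.data import LazyMemmapStorage, TensorDictReplayBuffer
    from torchrl.envs.transforms import RandomCropTensorDict

    # Crop sampled trajectories to random 10-step windows along the last
    # (time) dimension; argument names are assumptions, see the note above.
    rb = TensorDictReplayBuffer(
        storage=LazyMemmapStorage(1000),
        transform=RandomCropTensorDict(sub_seq_len=10, sample_dim=-1),
    )

    # Store 4 trajectories of 50 steps each.
    trajectories = TensorDict(
        {"observation": torch.randn(4, 50, 3), "reward": torch.randn(4, 50, 1)},
        batch_size=[4, 50],
    )
    rb.extend(trajectories)

    sub_trajs = rb.sample(2)
    print(sub_trajs.shape)  # expected: torch.Size([2, 10])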
Datasets¶
TorchRL provides wrappers around offline RL datasets. These data are presented as ReplayBuffer instances, which means that they can be customized at will with transforms, samplers and storages. By default, datasets are stored as memory-mapped tensors, allowing them to be promptly sampled with virtually no memory footprint.
Here’s an example:
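(A sketch only: the task name "maze2d-umaze-v1" and the keyword arguments are illustrative and depend on the locally installed D4RL version.)

    from torchrl.data.datasets import D4RLExperienceReplay

    # Load the dataset and expose it as a replay buffer; the task name and
    # keyword arguments below are illustrative.
    data = D4RLExperienceReplay("maze2d-umaze-v1", split_trajs=True, batch_size=128)

    sample = data.sample()  # a TensorDict of 128 transitions
    print(sample)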
Note
Installing dependencies is the responsibility of the user. For D4RL, a clone of the repository is needed as the latest wheels are not published on PyPI. For OpenML, scikit-learn and pandas are required.
D4RLExperienceReplay | An Experience replay class for D4RL.
OpenMLExperienceReplay | An experience replay for OpenML data.
TensorSpec¶
The TensorSpec parent class and subclasses define the basic properties of observations and actions in TorchRL, such as shape, device, dtype and domain. It is important that your environment specs match the input and output it sends and receives, as ParallelEnv will create buffers from these specs to communicate with the spawned processes. Use the torchrl.envs.utils.check_env_specs method as a sanity check.
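As a rough illustration (class names as listed in the table below; shapes, bounds and dtypes are arbitrary), specs can be built, sampled from, and used to validate data:

    import torch
    from torchrl.data import BoundedTensorSpec, CompositeSpec, DiscreteTensorSpec

    # A toy observation/action spec pair.
    observation_spec = CompositeSpec(
        observation=BoundedTensorSpec(-1.0, 1.0, shape=(3,), dtype=torch.float32)
    )
    action_spec = DiscreteTensorSpec(4)  # 4 discrete actions

    obs = observation_spec.rand()       # draw a random, spec-compliant sample
    assert observation_spec.is_in(obs)  # check that a value lies in the domain
    assert action_spec.is_in(action_spec.rand())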
TensorSpec | Parent class of the tensor meta-data containers for observation, actions and rewards.
BinaryDiscreteTensorSpec | A binary discrete tensor spec.
BoundedTensorSpec | A bounded continuous tensor spec.
CompositeSpec | A composition of TensorSpecs.
DiscreteTensorSpec | A discrete tensor spec.
MultiDiscreteTensorSpec | A concatenation of discrete tensor specs.
MultiOneHotDiscreteTensorSpec | A concatenation of one-hot discrete tensor specs.
OneHotDiscreteTensorSpec | A unidimensional, one-hot discrete tensor spec.
UnboundedContinuousTensorSpec | An unbounded continuous tensor spec.
UnboundedDiscreteTensorSpec | An unbounded discrete tensor spec.
Utils¶
MultiStep | Multistep reward transform.