TensorDictReplayBuffer
- class torchrl.data.TensorDictReplayBuffer(*, priority_key: str = 'td_error', **kw)
TensorDict-specific wrapper around the ReplayBuffer class.
- Keyword Arguments:
storage (Storage, optional) – the storage to be used. If none is provided, a default ListStorage with max_size of 1_000 will be created.
sampler (Sampler, optional) – the sampler to be used. If none is provided, a default RandomSampler() will be used.
writer (Writer, optional) – the writer to be used. If none is provided, a default RoundRobinWriter will be used.
collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s)/outputs. Used when using batched loading from a map-style dataset. The default value is chosen based on the storage type.
pin_memory (bool) – whether pin_memory() should be called on the rb samples.
prefetch (int, optional) – number of next batches to be prefetched using multithreading. Defaults to None (no prefetching).
transform (Transform, optional) – transform to be executed when sample() is called. To chain transforms, use the Compose class. Transforms should be used with tensordict.TensorDict content. If used with other structures, the transforms should be encoded with a "data" leading key that will be used to construct a tensordict from the non-tensordict content.
batch_size (int, optional) – the batch size to be used when sample() is called.
Note: The batch size can be specified at construction time via the batch_size argument, or at sampling time. The former should be preferred whenever the batch size is consistent across the experiment. If the batch size is likely to change, it can be passed to the sample() method instead. Passing it at sampling time is incompatible with prefetching (which requires knowing the batch size in advance) and with samplers that have a drop_last argument.
priority_key (str, optional) – the key at which priority is assumed to be stored within TensorDicts added to this ReplayBuffer. This is to be used when the sampler is of type PrioritizedSampler. Defaults to "td_error".
Examples
>>> import torch
>>>
>>> from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer
>>> from tensordict import TensorDict
>>>
>>> torch.manual_seed(0)
>>>
>>> rb = TensorDictReplayBuffer(storage=LazyTensorStorage(10), batch_size=5)
>>> data = TensorDict({"a": torch.ones(10, 3), ("b", "c"): torch.zeros(10, 1, 1)}, [10])
>>> rb.extend(data)
>>> sample = rb.sample(3)
>>> # samples keep track of the index
>>> print(sample)
TensorDict(
    fields={
        a: Tensor(shape=torch.Size([3, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        b: TensorDict(
            fields={
                c: Tensor(shape=torch.Size([3, 1, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([3]),
            device=cpu,
            is_shared=False),
        index: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.int32, is_shared=False)},
    batch_size=torch.Size([3]),
    device=cpu,
    is_shared=False)
>>> # we can iterate over the buffer
>>> for i, data in enumerate(rb):
...     print(i, data)
...     if i == 2:
...         break
0 TensorDict(
    fields={
        a: Tensor(shape=torch.Size([5, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        b: TensorDict(
            fields={
                c: Tensor(shape=torch.Size([5, 1, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([5]),
            device=cpu,
            is_shared=False),
        index: Tensor(shape=torch.Size([5]), device=cpu, dtype=torch.int32, is_shared=False)},
    batch_size=torch.Size([5]),
    device=cpu,
    is_shared=False)
1 TensorDict(
    fields={
        a: Tensor(shape=torch.Size([5, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        b: TensorDict(
            fields={
                c: Tensor(shape=torch.Size([5, 1, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([5]),
            device=cpu,
            is_shared=False),
        index: Tensor(shape=torch.Size([5]), device=cpu, dtype=torch.int32, is_shared=False)},
    batch_size=torch.Size([5]),
    device=cpu,
    is_shared=False)
- add(data: TensorDictBase) → int
Add a single element to the replay buffer.
- Parameters:
data (Any) – data to be added to the replay buffer
- Returns:
index where the data lives in the replay buffer.
- append_transform(transform: Transform) → None
Appends a transform at the end of the transform chain.
Transforms are applied in order when sample is called.
- Parameters:
transform (Transform) – The transform to be appended
- empty()
Empties the replay buffer and resets the cursor to 0.
- extend(tensordicts: Union[List, TensorDictBase]) → Tensor
Extends the replay buffer with one or more elements contained in an iterable.
If present, the inverse transforms will be called.
- Parameters:
tensordicts (iterable) – collection of data to be added to the replay buffer.
- Returns:
Indices of the data added to the replay buffer.
- insert_transform(index: int, transform: Transform) → None
Inserts a transform at the given position.
Transforms are executed in order when sample is called.
- Parameters:
index (int) – Position to insert the transform.
transform (Transform) – The transform to be inserted
- sample(batch_size: Optional[int] = None, return_info: bool = False, include_info: Optional[bool] = None) → TensorDictBase
Samples a batch of data from the replay buffer.
Uses Sampler to sample indices, and retrieves them from Storage.
- Parameters:
batch_size (int, optional) – size of data to be collected. If none is provided, this method will sample a batch of the size indicated by the sampler.
return_info (bool) – whether to return info. If True, the result is a tuple (data, info). If False, the result is the data.
- Returns:
A tensordict containing a batch of data selected in the replay buffer. If return_info is set to True, a tuple containing this tensordict and the info is returned instead.