tensordict package¶
The TensorDict class simplifies the process of passing multiple tensors from module to module by packing them in a dictionary-like object that inherits features from regular PyTorch tensors.
TensorDictBase is an abstract parent class for TensorDicts, a torch.Tensor data container. |
|
|
A batched dictionary of tensors. |
|
A Lazy stack of TensorDicts. |
|
Persistent TensorDict implementation. |
|
Holds a TensorDictBase instance full of parameters. |
|
Returns the status of get default value. |
Constructors and handlers¶
The library offers a few methods to interact with other data structures such as numpy structured arrays, namedtuples or h5 files. The library also exposes dedicated functions to manipulate tensordicts such as save, load, stack or cat.
|
Concatenates tensordicts into a single tensordict along the given dimension. |
|
Reconstructs a tensordict from a consolidated file. |
|
Returns a TensorDict created from a dictionary or another |
|
Creates a PersistentTensorDict from a h5 file. |
|
Copies the params and buffers of a module in a tensordict. |
|
Retrieves the parameters of several modules for ensemble learning/mixture of experts applications through vmap. |
|
Converts a namedtuple to a TensorDict recursively. |
|
Converts a pytree to a TensorDict instance. |
|
Converts a structured numpy array to a TensorDict. |
|
Creates a tensordict from a list of keys and a single value. |
|
|
|
Creates a lazy stack of tensordicts. |
|
Loads a tensordict from disk. |
|
Loads a memory-mapped tensordict from disk. |
|
Attempts to make a dense stack of tensordicts, and falls back on lazy stack when required. |
|
Writes all tensors onto a corresponding memory-mapped Tensor in a new tensordict. |
|
Saves the tensordict to disk. |
|
Stacks tensordicts into a single tensordict along the given dimension. |
TensorDict as a context manager¶
TensorDict can be used as a context manager in situations where an action has to be done and then undone. This includes temporarily locking/unlocking a tensordict:
>>> data.lock_()  # data.set will now result in an exception
>>> with data.unlock_():
...     data.set("key", value)
>>> assert data.is_locked  # the lock is restored on exit
or executing functional calls with a TensorDict instance containing the parameters and buffers of a model:
>>> params = TensorDict.from_module(module).clone()
>>> params.zero_()
>>> with params.to_module(module):
...     y = module(x)
In the first example, we can modify the tensordict data because we have temporarily unlocked it. In the second example, we populate the module with the parameters and buffers contained in the params tensordict instance, and reset the original parameters after this call is completed.
Memory-mapped tensors¶
tensordict offers the MemoryMappedTensor primitive, which allows you to work with tensors stored on disk in a handy way. The main advantages of MemoryMappedTensor are its ease of construction (no need to handle the storage of a tensor), the possibility to work with big contiguous data that would not fit in memory, an efficient (de)serialization across processes and efficient indexing of stored tensors.
If all workers have access to the same storage (in both multiprocess and distributed settings), passing a MemoryMappedTensor amounts to passing a reference to a file on disk plus some extra metadata for reconstructing it. The same goes for indexed memory-mapped tensors, as long as the data pointer of their storage is the same as the original one.
Indexing memory-mapped tensors is much faster than loading several independent files from disk and does not require loading the full content of the array into memory. However, the physical storage of PyTorch tensors should not be any different:
>>> my_images = MemoryMappedTensor.empty((1_000_000, 3, 480, 480), dtype=torch.uint8)
>>> mini_batch = my_images[:10] # just reads the first 10 images of the dataset
|
A Memory-mapped Tensor. |
Utils¶
|
Expand a tensor on the right to match another tensor shape. |
|
Expand a tensor on the right to match a desired shape. |
|
Tests if each element of |
|
Removes indices duplicated in key along the specified dimension. |
|
|
|
Checks if a data object or a type is a tensor container from the tensordict lib. |
|
Returns a TensorDict created from the keyword arguments or an input dictionary. |
|
Merges tensordicts together. |
|
Pads all tensors in a tensordict along the batch dimensions with a constant value, returning a new tensordict. |
|
Pads a list of tensordicts in order for them to be stacked together in a contiguous format. |
|
Densely stack a list of |
|
Sets the behaviour of some methods to a lazy transform. |
|
Returns True if lazy representations will be used for selected methods. |
Parse a TensorDict repr to a TensorDict. |