EnvBase

class torchrl.envs.EnvBase(*args, **kwargs)

Abstract environment parent class.

Keyword Arguments:

device (torch.device) – The device of the environment. Deviceless environments are allowed (device=None). If not None, all specs will be cast to that device and all inputs and outputs are expected to live on that device. Defaults to None.
batch_size (torch.Size or equivalent, optional) – batch-size of the environment. Corresponds to the leading dimension of all the input and output tensordicts the environment reads and writes. Defaults to an empty batch-size.
run_type_checks (bool, optional) – If True, type-checks will occur at every reset and every step. Defaults to False.
allow_done_after_reset (bool, optional) – if True, an environment can be done after a call to reset() is made. Defaults to False.
spec_locked (bool, optional) –
if True, the specs are locked and can only be modified if set_spec_lock_() is called. Defaults to True.

Note

The locking is achieved by the EnvBase metaclass. It does not appear in the __init__ method and is included in the keyword arguments strictly for type-hinting purposes.

See also

Locking environment specs.
auto_reset (bool, optional) –
if True, the env is assumed to reset automatically when done. Defaults to False.

Note

The auto-resetting is achieved by the EnvBase metaclass. It does not appear in the __init__ method and is included in the keyword arguments strictly for type-hinting purposes.

See also

The auto-resetting environments API section in the API documentation.

Variables:

done_spec (Composite) – equivalent to full_done_spec as all done_specs contain at least a "done" and a "terminated" entry
action_spec (TensorSpec) – the spec of the action. Links to the spec of the leaf action if only one action tensor is to be expected. Otherwise links to full_action_spec.
observation_spec (Composite) – equivalent to full_observation_spec.
reward_spec (TensorSpec) – the spec of the reward. Links to the spec of the leaf reward if only one reward tensor is to be expected. Otherwise links to full_reward_spec.
state_spec (Composite) – equivalent to full_state_spec.
full_done_spec (Composite) – a composite spec such that full_done_spec.zero() returns a tensordict containing only the leaves encoding the done status of the environment.
full_action_spec (Composite) – a composite spec such that full_action_spec.zero() returns a tensordict containing only the leaves encoding the action of the environment.
full_observation_spec (Composite) – a composite spec such that full_observation_spec.zero() returns a tensordict containing only the leaves encoding the observation of the environment.
full_reward_spec (Composite) – a composite spec such that full_reward_spec.zero() returns a tensordict containing only the leaves encoding the reward of the environment.
full_state_spec (Composite) – a composite spec such that full_state_spec.zero() returns a tensordict containing only the leaves encoding the inputs (actions excluded) of the environment.
batch_size (torch.Size) – The batch-size of the environment.
device (torch.device) – the device where the input/outputs of the environment are to be expected. Can be None.
is_spec_locked (bool) – returns True if the specs are locked. See the spec_locked argument above.

step(TensorDictBase -> TensorDictBase): makes a step in the environment.

reset(TensorDictBase, optional -> TensorDictBase): resets the environment.

set_seed(int -> int): sets the seed of the environment.

rand_step(TensorDictBase, optional -> TensorDictBase): makes a random step, with the action sampled from the action spec.

rollout(Callable, ... -> TensorDictBase): executes a rollout in the environment with the given policy (or with random steps if no policy is provided).
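How reset, step, rand_step and rollout fit together can be pictured with a plain-Python sketch. ToyCounterEnv and the rollout helper below are hypothetical stand-ins that exchange dicts instead of TensorDict objects; they are not TorchRL's implementation. The loop stops at the first done, similar in spirit to rollout's default behaviour.

```python
import random
from typing import Callable, List, Optional


class ToyCounterEnv:
    """Hypothetical env: the episode ends once the count reaches `limit`."""

    def __init__(self, limit: int = 5):
        self.limit = limit
        self.count = 0

    def set_seed(self, seed: int) -> int:
        random.seed(seed)
        return seed

    def reset(self) -> dict:
        self.count = 0
        return {"count": self.count, "done": False}

    def step(self, data: dict) -> dict:
        self.count += data["action"]
        return {"count": self.count, "reward": 1.0, "done": self.count >= self.limit}

    def rand_step(self, data: dict) -> dict:
        # sample from a hypothetical action space {0, 1}
        return self.step({**data, "action": random.randint(0, 1)})


def rollout(env: ToyCounterEnv, max_steps: int,
            policy: Optional[Callable[[dict], dict]] = None) -> List[dict]:
    """Collect at most `max_steps` transitions, stopping early when done."""
    data = env.reset()
    trajectory = []
    for _ in range(max_steps):
        result = env.rand_step(data) if policy is None else env.step(policy(data))
        trajectory.append(result)
        if result["done"]:
            break
        data = result
    return trajectory


env = ToyCounterEnv(limit=3)
traj = rollout(env, max_steps=10, policy=lambda d: {**d, "action": 1})
print([t["count"] for t in traj])  # [1, 2, 3]
```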

Examples

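>>> import torch
>>> from torchrl.data import Composite, Unbounded
>>> from torchrl.envs import EnvBase
>>> class CounterEnv(EnvBase):
...     def __init__(self, batch_size=(), device=None, **kwargs):
...         super().__init__(batch_size=batch_size, device=device, **kwargs)
...         self.observation_spec = Composite(
...             count=Unbounded(batch_size, device=device, dtype=torch.int64), shape=batch_size)
...         self.action_spec = Unbounded(batch_size, device=device, dtype=torch.int8)
...         # done spec and reward spec are set automatically
...     def _reset(self, tensordict):
...         out = self.full_done_spec.zero()  # "done" and "terminated" set to False
...         out["count"] = self.full_observation_spec["count"].zero()
...         return out
...     def _step(self, tensordict):
...         out = self.full_done_spec.zero()  # this counter never terminates
...         out["count"] = tensordict["count"] + tensordict["action"].to(torch.int64)
...         out["reward"] = torch.ones(*self.batch_size, 1, device=self.device)
...         return out
...     def _set_seed(self, seed):
...         pass  # the counter has no randomness to seed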
>>> from torchrl.envs.libs.gym import GymEnv
>>> env = GymEnv("Pendulum-v1")
>>> env.batch_size  # how many envs are run at once
torch.Size([])
>>> env.input_spec
Composite(
    full_state_spec: None,
    full_action_spec: Composite(
        action: BoundedContinuous(
            shape=torch.Size([1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True)),
            device=cpu,
            dtype=torch.float32,
            domain=continuous), device=cpu, shape=torch.Size([])), device=cpu, shape=torch.Size([]))
>>> env.action_spec
BoundedContinuous(
    shape=torch.Size([1]),
    space=ContinuousBox(
        low=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True),
        high=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True)),
    device=cpu,
    dtype=torch.float32,
    domain=continuous)
>>> env.observation_spec
Composite(
    observation: BoundedContinuous(
        shape=torch.Size([3]),
        space=ContinuousBox(
            low=Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, contiguous=True),
            high=Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, contiguous=True)),
        device=cpu,
        dtype=torch.float32,
        domain=continuous), device=cpu, shape=torch.Size([]))
>>> env.reward_spec
UnboundedContinuous(
    shape=torch.Size([1]),
    space=None,
    device=cpu,
    dtype=torch.float32,
    domain=continuous)
>>> env.done_spec
Categorical(
    shape=torch.Size([1]),
    space=DiscreteBox(n=2),
    device=cpu,
    dtype=torch.bool,
    domain=discrete)
>>> # the output_spec contains all the expected outputs
>>> env.output_spec
Composite(
    full_reward_spec: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([1]),
            space=None,
            device=cpu,
            dtype=torch.float32,
            domain=continuous), device=cpu, shape=torch.Size([])),
    full_observation_spec: Composite(
        observation: BoundedContinuous(
            shape=torch.Size([3]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, contiguous=True)),
            device=cpu,
            dtype=torch.float32,
            domain=continuous), device=cpu, shape=torch.Size([])),
    full_done_spec: Composite(
        done: Categorical(
            shape=torch.Size([1]),
            space=DiscreteBox(n=2),
            device=cpu,
            dtype=torch.bool,
            domain=discrete), device=cpu, shape=torch.Size([])), device=cpu, shape=torch.Size([]))

Note

Learn more about dynamic specs and environments in the TorchRL documentation.

property action_key: NestedKey

The action key of an environment.

By default, this will be “action”.

If there is more than one action key in the environment, accessing this property raises an exception.

property action_keys: list[tensordict._nestedkey.NestedKey]

The action keys of an environment.

By default, there will only be one key named “action”.

Keys are sorted by depth in the data tree.
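Sorting nested keys "by depth" can be sketched as follows. A nested key is either a string (a root entry) or a tuple of strings (a path in the tree). Breaking ties between keys of equal depth by their dotted path is an assumption of this sketch; depth_then_name and sort_keys_by_depth are hypothetical helpers, not TorchRL functions.

```python
from typing import List, Tuple, Union

# A nested key is either a plain string or a tuple of strings.
NestedKey = Union[str, Tuple[str, ...]]


def depth_then_name(key: NestedKey) -> Tuple[int, str]:
    """Sort key: number of path components first, then the dotted path."""
    parts = (key,) if isinstance(key, str) else tuple(key)
    return (len(parts), ".".join(parts))


def sort_keys_by_depth(keys: List[NestedKey]) -> List[NestedKey]:
    return sorted(keys, key=depth_then_name)


keys = [("agents", "action"), "action", ("a", "b", "c"), ("agent", "x")]
print(sort_keys_by_depth(keys))
# ['action', ('agent', 'x'), ('agents', 'action'), ('a', 'b', 'c')]
```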

property action_spec: TensorSpec

The action spec.

The action_spec is always stored as a composite spec.

If the action spec is provided as a simple spec, this will be returned.

>>> env.action_spec = Unbounded(1)
>>> env.action_spec
UnboundedContinuous(
    shape=torch.Size([1]),
    space=ContinuousBox(
        low=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True),
        high=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True)),
    device=cpu,
    dtype=torch.float32,
    domain=continuous)

If the action spec is provided as a composite spec and contains only one leaf, this function will return just the leaf.

>>> env.action_spec = Composite({"nested": {"action": Unbounded(1)}})
>>> env.action_spec
UnboundedContinuous(
    shape=torch.Size([1]),
    space=ContinuousBox(
        low=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True),
        high=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True)),
    device=cpu,
    dtype=torch.float32,
    domain=continuous)

If the action spec is provided as a composite spec and has more than one leaf, this function will return the whole spec.

>>> env.action_spec = Composite({"nested": {"action": Unbounded(1), "another_action": Categorical(1)}})
>>> env.action_spec
Composite(
    nested: Composite(
        action: UnboundedContinuous(
            shape=torch.Size([1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True)),
            device=cpu,
            dtype=torch.float32,
            domain=continuous),
        another_action: Categorical(
            shape=torch.Size([]),
            space=DiscreteBox(n=1),
            device=cpu,
            dtype=torch.int64,
            domain=discrete), device=cpu, shape=torch.Size([])), device=cpu, shape=torch.Size([]))

To retrieve the full spec passed, use:

>>> env.input_spec["full_action_spec"]

This property is mutable.
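The "single leaf or whole spec" rule above can be sketched with nested dicts, where strings stand in for leaf specs such as Unbounded(1). spec_leaves and public_spec are hypothetical helpers written for this illustration only; the same rule is described for done_spec and reward_spec.

```python
from typing import Any, List, Tuple


def spec_leaves(spec: dict, prefix: Tuple[str, ...] = ()) -> List[Tuple[Tuple[str, ...], Any]]:
    """Flatten a nested dict into (path, leaf) pairs; non-dict values are leaves."""
    found = []
    for name, value in spec.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            found.extend(spec_leaves(value, path))
        else:
            found.append((path, value))
    return found


def public_spec(full_spec: dict) -> Any:
    """Return the only leaf if there is exactly one, otherwise the whole spec."""
    found = spec_leaves(full_spec)
    if len(found) == 1:
        return found[0][1]
    return full_spec


print(public_spec({"nested": {"action": "Unbounded(1)"}}))  # Unbounded(1)
two = {"nested": {"action": "Unbounded(1)", "another_action": "Categorical(1)"}}
print(public_spec(two) is two)  # True
```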

Examples

>>> from torchrl.envs.libs.gym import GymEnv
>>> env = GymEnv("Pendulum-v1")
>>> env.action_spec
BoundedContinuous(
    shape=torch.Size([1]),
    space=ContinuousBox(
        low=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True),
        high=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True)),
    device=cpu,
    dtype=torch.float32,
    domain=continuous)

property action_spec_unbatched: TensorSpec: Returns the action spec of the env as if it had no batch dimensions.

add_module(name: str, module: Optional[Module]) → None

Add a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters:

name (str) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.

add_truncated_keys() → EnvBase: Adds truncated keys to the environment.

all_actions(tensordict: TensorDictBase | None = None) → TensorDictBase

Generates all possible actions from the action spec.

This only works in environments with fully discrete actions.

Parameters:: tensordict (TensorDictBase, optional) – If given, reset() is called with this tensordict.
Returns:: a tensordict object with the “action” entry updated with a batch of all possible actions. The actions are stacked together in the leading dimension.
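For a fully discrete action space, enumerating every action amounts to a Cartesian product over the per-dimension choices. The sketch below assumes a hypothetical action space described by a list of per-dimension sizes (nvec); the outer list plays the role of the leading dimension along which all_actions stacks its results. It is an illustration, not TorchRL's code.

```python
from itertools import product
from typing import List


def all_discrete_actions(nvec: List[int]) -> List[List[int]]:
    """Enumerate every joint action of a fully discrete action space.

    nvec[i] is the number of choices for action dimension i. The outer list
    plays the role of the leading (stacking) dimension.
    """
    return [list(combo) for combo in product(*(range(n) for n in nvec))]


actions = all_discrete_actions([2, 3])
print(len(actions))  # 6
print(actions[:3])   # [[0, 0], [0, 1], [0, 2]]
```

The number of stacked actions equals the product of the entries of nvec, which is also the cardinality of such a space.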

append_transform(transform: Transform | Callable[[TensorDictBase], TensorDictBase]) → EnvBase

Returns a transformed environment where the callable/transform passed is applied.

Parameters:: transform (Transform or Callable[[TensorDictBase], TensorDictBase]) – the transform to apply to the environment.

Examples

>>> from torchrl.envs import GymEnv
>>> import torch
>>> env = GymEnv("CartPole-v1")
>>> loc = 0.5
>>> scale = 1.0
>>> transform = lambda data: data.set("observation", (data.get("observation") - loc)/scale)
>>> env = env.append_transform(transform=transform)
>>> print(env)
TransformedEnv(
    env=GymEnv(env=CartPole-v1, batch_size=torch.Size([]), device=cpu),
    transform=_CallableTransform(keys=[]))

apply(fn: Callable[[Module], None]) → T

Apply fn recursively to every submodule (as returned by .children()) as well as self.

Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters:: fn (Module -> None) – function to be applied to each submodule
Returns:: self
Return type:: Module

Example:

>>> import torch
>>> from torch import nn
>>> @torch.no_grad()
... def init_weights(m):
...     print(m)
...     if isinstance(m, nn.Linear):
...         m.weight.fill_(1.0)
...         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)

auto_specs_(policy: Callable[[TensorDictBase], TensorDictBase], *, tensordict: TensorDictBase | None = None, action_key: NestedKey | list[NestedKey] = 'action', done_key: NestedKey | list[NestedKey] | None = None, observation_key: NestedKey | list[NestedKey] = 'observation', reward_key: NestedKey | list[NestedKey] = 'reward')

Automatically sets the specifications (specs) of the environment based on a random rollout using a given policy.

This method performs a rollout using the provided policy to infer the input and output specifications of the environment. It updates the environment’s specs for actions, observations, rewards, and done signals based on the data collected during the rollout.

Parameters:

policy (Callable[[TensorDictBase], TensorDictBase]) – A callable policy that takes a TensorDictBase as input and returns a TensorDictBase as output. This policy is used to perform the rollout and determine the specs.

Keyword Arguments:

tensordict (TensorDictBase, optional) – An optional TensorDictBase instance to be used as the initial state for the rollout. If not provided, the environment’s reset method will be called to obtain the initial state.
action_key (NestedKey or List[NestedKey], optional) – The key(s) used to identify actions in the TensorDictBase. Defaults to “action”.
done_key (NestedKey or List[NestedKey], optional) – The key(s) used to identify done signals in the TensorDictBase. Defaults to None, which will attempt to use [“done”, “terminated”, “truncated”] as potential keys.
observation_key (NestedKey or List[NestedKey], optional) – The key(s) used to identify observations in the TensorDictBase. Defaults to “observation”.
reward_key (NestedKey or List[NestedKey], optional) – The key(s) used to identify rewards in the TensorDictBase. Defaults to “reward”.

Returns:

The environment instance with updated specs.

Return type:

EnvBase

Raises:

RuntimeError – If there are keys in the output specs that are not accounted for in the provided keys.
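The idea behind auto_specs_ (record some steps, then derive one spec per key from the data) can be sketched without TorchRL. Here a hypothetical "spec" is just the Python type name plus a length for list-valued entries, and infer_specs is an illustrative helper; TorchRL's method works on TensorDict data and real spec classes.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical "spec": the Python type name plus a length for list-valued entries.
Spec = Tuple[str, Optional[int]]


def infer_specs(steps: List[dict], keys: List[str]) -> Dict[str, Spec]:
    """Infer one spec per key from recorded steps (dicts of key -> value)."""
    specs: Dict[str, Spec] = {}
    for step in steps:
        unknown = set(step) - set(keys)
        if unknown:
            # loosely mirrors the RuntimeError documented for unaccounted keys
            raise RuntimeError(f"keys not accounted for: {sorted(unknown)}")
        for key, value in step.items():
            spec = (type(value).__name__, len(value) if isinstance(value, list) else None)
            # the first step that contains a key fixes its spec
            specs.setdefault(key, spec)
    return specs


steps = [
    {"observation": [0.1, 0.2, 0.3], "action": 1, "reward": -0.5, "done": False},
    {"observation": [0.0, 0.1, 0.2], "action": 0, "reward": -0.4, "done": True},
]
print(infer_specs(steps, ["observation", "action", "reward", "done"])["observation"])  # ('list', 3)
```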

property batch_dims: int: Number of batch dimensions of the env.

property batch_locked: bool

Whether the environment can be used with a batch size different from the one it was initialized with or not.

If True, the env needs to be used with a tensordict having the same batch size as the env. batch_locked is an immutable property.

property batch_size: Size

Number of envs batched in this environment instance organised in a torch.Size() object.

Environments may be similar or different, but it is assumed that they have little or no interaction between them (e.g., multi-task or batched execution in parallel).

bfloat16() → T

Casts all floating point parameters and buffers to bfloat16 datatype.

Note

This method modifies the module in-place.

Returns:: self
Return type:: Module

buffers(recurse: bool = True) → Iterator[Tensor]

Return an iterator over module buffers.

Parameters:: recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.
Yields:: torch.Tensor – module buffer

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> for buf in model.buffers():
...     print(type(buf), buf.size())
<class 'torch.Tensor'> torch.Size([20])
<class 'torch.Tensor'> torch.Size([20, 1, 5, 5])

cardinality(tensordict: TensorDictBase | None = None) → int

The cardinality of the action space.

By default, this is just a wrapper around env.action_spec.cardinality().

Overriding this method is useful when the action spec is variable:

The number of actions can be undefined, e.g., Categorical(n=-1);
The action cardinality may depend on the action mask;
The shape can be dynamic, as in Unbounded(shape=(-1,)).

In these cases, cardinality() should be overridden.

Parameters:: tensordict (TensorDictBase, optional) – a tensordict containing the data required to compute the cardinality.
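When the cardinality depends on an action mask, an override could count the unmasked entries. The sketch below is a hypothetical helper, not TorchRL's API; in an actual override of cardinality(), the mask would be read from the tensordict argument rather than passed as a list.

```python
from typing import Optional, Sequence


def masked_cardinality(n: int, action_mask: Optional[Sequence[bool]] = None) -> int:
    """Number of currently available actions for a categorical action space."""
    if action_mask is None:
        return n  # without a mask, every action is available
    if len(action_mask) != n:
        raise ValueError("the mask must have one entry per action")
    return sum(bool(m) for m in action_mask)


print(masked_cardinality(4))                               # 4
print(masked_cardinality(4, [True, False, True, False]))   # 2
```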

check_env_specs(return_contiguous: bool | None = None, check_dtype=True, seed: int | None = None, tensordict: TensorDictBase | None = None)

Tests an environment's specs against the results of a short rollout.

This test function should be used as a sanity check for an env wrapped with torchrl’s EnvBase subclasses: any discrepancy between the expected data and the data collected should raise an assertion error.

A broken environment spec will likely make it impossible to use parallel environments.

Parameters:

env (EnvBase) – the env for which the specs have to be checked against data.
return_contiguous (bool, optional) – if True, the random rollout will be called with return_contiguous=True. This will fail in some cases (e.g. heterogeneous shapes of inputs/outputs). Defaults to None (determined by the presence of dynamic specs).
check_dtype (bool, optional) – if False, dtype checks will be skipped. Defaults to True.
seed (int, optional) – for reproducibility, a seed can be set. The seed is set in PyTorch temporarily, and the RNG state is then reverted to what it was before. The seed is also set in the env, but since restoring an RNG state is not a feature of most environments, doing so is left to the user. Defaults to None.
tensordict (TensorDict, optional) – an optional tensordict instance to use for reset.

Caution: this function resets the env seed. It should be used “offline” to check that an env is adequately constructed, but it may affect the seeding of an experiment and as such should be kept out of training scripts.
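The "seed temporarily, then restore the RNG state" behaviour described for the seed argument is a general pattern. The sketch below uses Python's random module as a stand-in for PyTorch's global RNG (with PyTorch, the analogous calls are torch.get_rng_state() and torch.set_rng_state()); temporary_seed is a hypothetical helper, not part of TorchRL.

```python
import random
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def temporary_seed(seed: Optional[int]) -> Iterator[None]:
    """Seed the global RNG inside the block, then restore its previous state."""
    if seed is None:
        # nothing to do: leave the RNG untouched
        yield
        return
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)


with temporary_seed(0):
    first = random.random()
with temporary_seed(0):
    second = random.random()
print(first == second)  # True
```

Note that only the global RNG is restored; as the text above says, an environment's own seeding is not reverted this way.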

children() → Iterator[Module]

Return an iterator over immediate children modules.

Yields:: Module – a child module

compile(*args, **kwargs)

Compile this Module’s forward using torch.compile().

This Module’s __call__ method is compiled and all arguments are passed as-is to torch.compile().

See torch.compile() for details on the arguments for this function.

cpu() → T

Move all model parameters and buffers to the CPU.

Note

This method modifies the module in-place.

Returns:: self
Return type:: Module

cuda(device: Optional[Union[int, device]] = None) → T

Move all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on GPU while being optimized.

Note

This method modifies the module in-place.

Parameters:: device (int, optional) – if specified, all parameters will be copied to that device
Returns:: self
Return type:: Module

property done_key

The done key of an environment.

By default, this will be “done”.

If there is more than one done key in the environment, accessing this property raises an exception.

property done_keys: list[tensordict._nestedkey.NestedKey]

The done keys of an environment.

By default, there will only be one key named “done”.

Keys are sorted by depth in the data tree.

property done_keys_groups

A list of done keys, grouped as the reset keys.

This is a list of lists. The outer list has one entry per reset key; each inner list contains the done keys (e.g., done and truncated) that can be read to determine a reset when the reset key is absent.
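The grouping can be sketched by pairing each reset key with the done keys stored at the same level of the data tree. This sketch assumes that a reset key named "_reset" sits next to the done keys it controls; group_done_keys is a hypothetical helper, not TorchRL's implementation.

```python
from typing import List, Tuple, Union

NestedKey = Union[str, Tuple[str, ...]]


def _parent(key: NestedKey) -> Tuple[str, ...]:
    """Path of the sub-tree that holds `key` (empty tuple for the root)."""
    parts = (key,) if isinstance(key, str) else tuple(key)
    return parts[:-1]


def group_done_keys(reset_keys: List[NestedKey], done_keys: List[NestedKey]) -> List[List[NestedKey]]:
    """One inner list per reset key, holding the done keys stored next to it."""
    return [[k for k in done_keys if _parent(k) == _parent(r)] for r in reset_keys]


reset_keys = ["_reset", ("agents", "_reset")]
done_keys = ["done", "terminated", ("agents", "done"), ("agents", "truncated")]
print(group_done_keys(reset_keys, done_keys))
# [['done', 'terminated'], [('agents', 'done'), ('agents', 'truncated')]]
```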

property done_spec: TensorSpec

The done spec.

The done_spec is always stored as a composite spec.

If the done spec is provided as a simple spec, this will be returned.

>>> env.done_spec = Categorical(2, dtype=torch.bool)
>>> env.done_spec
Categorical(
    shape=torch.Size([]),
    space=DiscreteBox(n=2),
    device=cpu,
    dtype=torch.bool,
    domain=discrete)

If the done spec is provided as a composite spec and contains only one leaf, this function will return just the leaf.

>>> env.done_spec = Composite({"nested": {"done": Categorical(2, dtype=torch.bool)}})
>>> env.done_spec
Categorical(
    shape=torch.Size([]),
    space=DiscreteBox(n=2),
    device=cpu,
    dtype=torch.bool,
    domain=discrete)

If the done spec is provided as a composite spec and has more than one leaf, this function will return the whole spec.

>>> env.done_spec = Composite({"nested": {"done": Categorical(2, dtype=torch.bool), "another_done": Categorical(2, dtype=torch.bool)}})
>>> env.done_spec
Composite(
    nested: Composite(
        done: Categorical(
            shape=torch.Size([]),
            space=DiscreteBox(n=2),
            device=cpu,
            dtype=torch.bool,
            domain=discrete),
        another_done: Categorical(
            shape=torch.Size([]),
            space=DiscreteBox(n=2),
            device=cpu,
            dtype=torch.bool,
            domain=discrete), device=cpu, shape=torch.Size([])), device=cpu, shape=torch.Size([]))

To always retrieve the full spec passed, use:

>>> env.output_spec["full_done_spec"]

This property is mutable.

Examples

>>> from torchrl.envs.libs.gym import GymEnv
>>> env = GymEnv("Pendulum-v1")
>>> env.done_spec
Categorical(
    shape=torch.Size([1]),
    space=DiscreteBox(n=2),
    device=cpu,
    dtype=torch.bool,
    domain=discrete)

property done_spec_unbatched: TensorSpec: Returns the done spec of the env as if it had no batch dimensions.

double() → T

Casts all floating point parameters and buffers to double datatype.

Note

This method modifies the module in-place.

Returns:: self
Return type:: Module

empty_cache()

Erases all the cached values.

For regular envs, the key lists (reward, done, etc.) are cached, but in some cases they may change during the execution of the code (e.g., when a transform is added).
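Why a cache needs explicit invalidation can be shown with a small, hypothetical key cache: after the underlying spec changes, the cached list is stale until it is emptied. KeyCache and its name-based detection of done entries are assumptions made for this sketch, not TorchRL's internals.

```python
from typing import Dict, List


class KeyCache:
    """Hypothetical holder that lazily computes and caches the done-key list."""

    DONE_NAMES = ("done", "terminated", "truncated")

    def __init__(self, spec: Dict[str, object]):
        self.spec = spec
        self._cache: Dict[str, List[str]] = {}
        self.computations = 0  # counts how often the key list is rebuilt

    @property
    def done_keys(self) -> List[str]:
        if "done_keys" not in self._cache:
            self.computations += 1
            self._cache["done_keys"] = sorted(k for k in self.spec if k in self.DONE_NAMES)
        return self._cache["done_keys"]

    def empty_cache(self) -> None:
        self._cache.clear()


cache = KeyCache({"done": None, "terminated": None})
print(cache.done_keys)          # ['done', 'terminated']
cache.spec["truncated"] = None  # e.g. a transform adds a new entry
print(cache.done_keys)          # ['done', 'terminated'] (stale)
cache.empty_cache()
print(cache.done_keys)          # ['done', 'terminated', 'truncated']
```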

eval() → T

Set the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e. whether they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

See Locally disabling gradient computation for a comparison between .eval() and several similar mechanisms that may be confused with it.

Returns:: self
Return type:: Module

extra_repr() → str

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

fake_tensordict() → TensorDictBase: Returns a fake tensordict with key-value pairs that match in shape, device and dtype what can be expected during an environment rollout.

float() → T

Casts all floating point parameters and buffers to float datatype.

Note

This method modifies the module in-place.

Returns:: self
Return type:: Module

forward(*args, **kwargs)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

property full_action_spec: Composite

The full action spec.

full_action_spec is a Composite instance that contains all the action entries.

Examples

>>> from torchrl.envs import BraxEnv
>>> for envname in BraxEnv.available_envs:
...     break
>>> env = BraxEnv(envname)
>>> env.full_action_spec
Composite(
    action: BoundedContinuous(
        shape=torch.Size([8]),
        space=ContinuousBox(
            low=Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.float32, contiguous=True),
            high=Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.float32, contiguous=True)),
        device=cpu,
        dtype=torch.float32,
        domain=continuous), device=cpu, shape=torch.Size([]))

property full_action_spec_unbatched: Composite: Returns the action spec of the env as if it had no batch dimensions.

property full_done_spec: Composite

The full done spec.

full_done_spec is a Composite instance that contains all the done entries. It can be used to generate fake data with a structure that mimics the one obtained at runtime.

Examples

>>> import gymnasium
>>> from torchrl.envs import GymWrapper
>>> env = GymWrapper(gymnasium.make("Pendulum-v1"))
>>> env.full_done_spec
Composite(
    done: Categorical(
        shape=torch.Size([1]),
        space=DiscreteBox(n=2),
        device=cpu,
        dtype=torch.bool,
        domain=discrete),
    truncated: Categorical(
        shape=torch.Size([1]),
        space=DiscreteBox(n=2),
        device=cpu,
        dtype=torch.bool,
        domain=discrete), device=cpu, shape=torch.Size([]))

property full_done_spec_unbatched: Composite: Returns the done spec of the env as if it had no batch dimensions.

property full_observation_spec_unbatched: Composite: Returns the observation spec of the env as if it had no batch dimensions.

property full_reward_spec: Composite

The full reward spec.

full_reward_spec is a Composite instance that contains all the reward entries.

Examples

>>> import gymnasium
>>> from torchrl.envs import GymWrapper, TransformedEnv, RenameTransform
>>> base_env = GymWrapper(gymnasium.make("Pendulum-v1"))
>>> env = TransformedEnv(base_env, RenameTransform("reward", ("nested", "reward")))
>>> env.full_reward_spec
Composite(
    nested: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True)),
            device=cpu,
            dtype=torch.float32,
            domain=continuous), device=None, shape=torch.Size([])), device=cpu, shape=torch.Size([]))

property full_reward_spec_unbatched: Composite: Returns the reward spec of the env as if it had no batch dimensions.

property full_state_spec: Composite

The full state spec.

full_state_spec is a Composite instance that contains all the state entries (i.e., the input data that is not an action).

Examples

>>> from torchrl.envs import BraxEnv
>>> for envname in BraxEnv.available_envs:
...     break
>>> env = BraxEnv(envname)
>>> env.full_state_spec
Composite(
    state: Composite(
        pipeline_state: Composite(
            q: UnboundedContinuous(
                shape=torch.Size([15]),
                space=None,
                device=cpu,
                dtype=torch.float32,
                domain=continuous),
    [...], device=cpu, shape=torch.Size([])), device=cpu, shape=torch.Size([])), device=cpu, shape=torch.Size([]))

property full_state_spec_unbatched: Composite: Returns the state spec of the env as if it had no batch dimensions.

get_buffer(target: str) → Tensor

Return the buffer given by target if it exists, otherwise throw an error.

See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.

Parameters:: target – The fully-qualified string name of the buffer to look for. (See get_submodule for how to specify a fully-qualified string.)
Returns:: The buffer referenced by target
Return type:: torch.Tensor
Raises:: AttributeError – If the target string references an invalid path or resolves to something that is not a buffer

get_extra_state() → Any

Return any extra state to include in the module’s state_dict.

Implement this and a corresponding set_extra_state() for your module if you need to store extra state. This function is called when building the module’s state_dict().

Note that extra state should be picklable to ensure working serialization of the state_dict. We only provide backwards compatibility guarantees for serializing Tensors; other objects may break backwards compatibility if their serialized pickled form changes.

Returns:: Any extra state to store in the module’s state_dict
Return type:: object

get_parameter(target: str) → Parameter

Return the parameter given by target if it exists, otherwise throw an error.

See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.

Parameters:: target – The fully-qualified string name of the Parameter to look for. (See get_submodule for how to specify a fully-qualified string.)
Returns:: The Parameter referenced by target
Return type:: torch.nn.Parameter
Raises:: AttributeError – If the target string references an invalid path or resolves to something that is not an nn.Parameter

get_submodule(target: str) → Module

Return the submodule given by target if it exists, otherwise throw an error.

For example, let’s say you have an nn.Module A that looks like this:

A(
    (net_b): Module(
        (net_c): Module(
            (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
        )
        (linear): Linear(in_features=100, out_features=200, bias=True)
    )
)

(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules, net_c and linear. net_c then has a submodule conv.)

To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").

The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.

Parameters:: target – The fully-qualified string name of the submodule to look for. (See above example for how to specify a fully-qualified string.)
Returns:: The submodule referenced by target
Return type:: torch.nn.Module
Raises:: AttributeError – If the target string references an invalid path or resolves to something that is not an nn.Module
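The lookup walks one attribute per dotted component, which is why its cost is bounded by the nesting depth of target rather than by the total number of modules. The pure-Python sketch below uses a hypothetical Node class in place of nn.Module; PyTorch's actual implementation differs in its details.

```python
class Node:
    """Hypothetical stand-in for nn.Module: children are plain attributes."""


def get_by_path(root: Node, target: str) -> Node:
    """Resolve a dotted path such as "net_b.net_c.conv" one attribute at a time."""
    obj = root
    for name in target.split("."):
        if not hasattr(obj, name):
            raise AttributeError(f"{type(obj).__name__} has no attribute `{name}`")
        obj = getattr(obj, name)
        if not isinstance(obj, Node):
            raise AttributeError(f"`{name}` is not a Node")
    return obj


# Rebuild the tree from the example above.
a = Node()
a.net_b = Node()
a.net_b.net_c = Node()
a.net_b.net_c.conv = Node()
a.net_b.linear = Node()
print(get_by_path(a, "net_b.net_c.conv") is a.net_b.net_c.conv)  # True
```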

half() → T

Casts all floating point parameters and buffers to half datatype.

Note

This method modifies the module in-place.

Returns:: self
Return type:: Module

property input_spec: TensorSpec

Input spec.

The composite spec containing all specs for data input to the environments.

It contains:

“full_action_spec”: the spec of the input actions
“full_state_spec”: the spec of all other environment inputs

This attribute is locked and should be read-only. Instead, to set the specs contained in it, use the respective properties.

Examples

>>> from torchrl.envs.libs.gym import GymEnv
>>> env = GymEnv("Pendulum-v1")
>>> env.input_spec
Composite(
    full_state_spec: None,
    full_action_spec: Composite(
        action: BoundedContinuous(
            shape=torch.Size([1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True)),
            device=cpu,
            dtype=torch.float32,
            domain=continuous), device=cpu, shape=torch.Size([])), device=cpu, shape=torch.Size([]))

property input_spec_unbatched: Composite: Returns the input spec of the env as if it had no batch dimensions.

ipu(device: Optional[Union[int, device]] = None) → T

Move all model parameters and buffers to the IPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on IPU while being optimized.

Note

This method modifies the module in-place.

Parameters:: device (int, optional) – if specified, all parameters will be copied to that device
Returns:: self
Return type:: Module

property is_spec_locked

Gets whether the environment’s specs are locked.

This property can be modified directly.

Returns:: True if the specs are locked, False otherwise.
Return type:: bool

See also

Locking environment specs.

load_state_dict(state_dict: Mapping[str, Any], strict: bool = True, assign: bool = False)

Copy parameters and buffers from state_dict into this module and its descendants.

If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Warning

If assign is True the optimizer must be created after the call to load_state_dict unless get_swap_module_params_on_conversion() is True.

Parameters:

state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
assign (bool, optional) – When set to False, the properties of the tensors in the current module are preserved, whereas setting it to True preserves the properties of the Tensors in the state dict. The only exception is the requires_grad field of Parameter, for which the value from the module is preserved. Default: False

Returns:

missing_keys is a list of str containing any keys that are expected by this module but missing from the provided state_dict.
unexpected_keys is a list of str containing the keys that are not expected by this module but present in the provided state_dict.

Return type:

NamedTuple with missing_keys and unexpected_keys fields

Note

If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
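The sketch below uses plain torch.nn layers to show how the returned NamedTuple reports a key mismatch when strict=False. The same mismatch raises a RuntimeError under the default strict=True. The expected keys become "0.weight" and "0.bias" because the target layer is nested inside nn.Sequential.

```python
import torch.nn as nn

source = nn.Linear(3, 2)                  # keys: "weight", "bias"
target = nn.Sequential(nn.Linear(3, 2))   # keys: "0.weight", "0.bias"

# With strict=False, mismatches are reported instead of raised.
result = target.load_state_dict(source.state_dict(), strict=False)

# With the default strict=True, the same mismatch raises a RuntimeError.
try:
    target.load_state_dict(source.state_dict())
    strict_failed = False
except RuntimeError:
    strict_failed = True
```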

maybe_reset(tensordict: TensorDictBase) → TensorDictBase[source]

Checks the done keys of the input tensordict and, if needed, resets the environment where it is done.

Parameters:: tensordict (TensorDictBase) – a tensordict coming from the output of step_mdp().
Returns:: A tensordict that is identical to the input where the environment was not reset and contains the new reset data where the environment was reset.
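The following framework-free toy model makes the selective behaviour concrete, using plain Python lists in place of tensordicts. The function name maybe_reset_sketch and the reset_fn callable are hypothetical. This is not TorchRL's implementation, which operates on batched tensordicts and their done keys.

```python
from typing import Callable, List


def maybe_reset_sketch(
    observations: List[float],
    done: List[bool],
    reset_fn: Callable[[int], float],
) -> List[float]:
    """Toy model: replace only the entries whose done flag is set.

    observations[i] is the latest observation of sub-environment i,
    done[i] says whether that sub-environment finished, and reset_fn(i)
    returns a fresh initial observation for sub-environment i.
    """
    return [
        reset_fn(i) if is_done else obs
        for i, (obs, is_done) in enumerate(zip(observations, done))
    ]


# Three sub-environments; only the middle one is done.
new_obs = maybe_reset_sketch([0.5, 0.9, 0.1], [False, True, False], reset_fn=lambda i: 0.0)
```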

modules() → Iterator[Module]

Return an iterator over all modules in the network.

Yields:: Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)

mtia(device: Optional[Union[int, device]] = None) → T

Move all model parameters and buffers to the MTIA.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on MTIA while being optimized.

Note

This method modifies the module in-place.

Parameters:: device (int, optional) – if specified, all parameters will be copied to that device
Returns:: self
Return type:: Module

named_buffers(prefix: str = '', recurse: bool = True, remove_duplicate: bool = True) → Iterator[Tuple[str, Tensor]]

Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters:

prefix (str) – prefix to prepend to all buffer names.
recurse (bool, optional) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Defaults to True.
remove_duplicate (bool, optional) – whether to remove the duplicated buffers in the result. Defaults to True.

Yields:

(str, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())
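Here is a runnable variant of the example above. It assumes a BatchNorm1d layer nested in an nn.Sequential; BatchNorm1d registers running_mean, running_var and num_batches_tracked as buffers. The sketch shows the effect of prefix and recurse.

```python
import torch.nn as nn

bn = nn.BatchNorm1d(4)
model = nn.Sequential(nn.Linear(4, 4), bn)

# Recursive listing: buffer names are prefixed with the child name ("1.").
all_buffers = dict(model.named_buffers())

# recurse=False only looks at buffers registered directly on `model` (none here).
direct_buffers = list(model.named_buffers(recurse=False))

# prefix is prepended to every name.
prefixed = [name for name, _ in bn.named_buffers(prefix="bn")]
```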

named_children() → Iterator[Tuple[str, Module]]

Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields:: (str, Module) – Tuple containing a name and child module

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)

named_modules(memo: Optional[Set[Module]] = None, prefix: str = '', remove_duplicate: bool = True)

Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Parameters:

memo – a memo to store the set of modules already added to the result
prefix – a prefix that will be added to the name of the module
remove_duplicate – whether to remove the duplicated module instances in the result or not

Yields:

(str, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))

named_parameters(prefix: str = '', recurse: bool = True, remove_duplicate: bool = True) → Iterator[Tuple[str, Parameter]]

Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters:

prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
remove_duplicate (bool, optional) – whether to remove the duplicated parameters in the result. Defaults to True.

Yields:

(str, Parameter) – Tuple containing the name and parameter

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())
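You can see the effect of remove_duplicate with the shared-layer setup from the named_modules() example. By default the shared parameters are listed once. With remove_duplicate=False they are listed once per occurrence. This is a minimal sketch with plain torch.nn layers.

```python
import torch.nn as nn

shared = nn.Linear(2, 2)
net = nn.Sequential(shared, shared)  # the same layer appears twice

# Default: the shared parameters are reported once, under their first name.
deduped = [name for name, _ in net.named_parameters()]

# remove_duplicate=False reports them once per occurrence.
with_duplicates = [name for name, _ in net.named_parameters(remove_duplicate=False)]
```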

property observation_keys: list[tensordict._nestedkey.NestedKey]

The observation keys of an environment.

By default, there will only be one key named “observation”.

Keys are sorted by depth in the data tree.

property observation_spec: Composite

Observation spec.

Must be a torchrl.data.Composite instance. The keys listed in the spec are directly accessible after reset and step.

In TorchRL, all info, states, results of transforms, and other outputs of the environment are stored in the observation_spec, even though they are not, strictly speaking, “observations”.

Therefore, "observation_spec" should be thought of as a generic data container for environment outputs that are not done or reward data.

Examples

>>> from torchrl.envs.libs.gym import GymEnv
>>> env = GymEnv("Pendulum-v1")
>>> env.observation_spec
Composite(
    observation: BoundedContinuous(
        shape=torch.Size([3]),
        space=ContinuousBox(
            low=Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, contiguous=True),
            high=Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, contiguous=True)),
        device=cpu,
        dtype=torch.float32,
        domain=continuous), device=cpu, shape=torch.Size([]))

property observation_spec_unbatched: Composite: Returns the observation spec of the env as if it had no batch dimensions.

property output_spec: TensorSpec

Output spec.

The composite spec containing all specs for data output from the environments.

It contains:

“full_reward_spec”: the spec of reward
“full_done_spec”: the spec of done
“full_observation_spec”: the spec of all other environment outputs

This attribute is locked and should be read-only. Instead, to set the specs contained in it, use the respective properties.

Examples

>>> from torchrl.envs.libs.gym import GymEnv
>>> env = GymEnv("Pendulum-v1")
>>> env.output_spec
Composite(
    full_reward_spec: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([1]),
            space=None,
            device=cpu,
            dtype=torch.float32,
            domain=continuous), device=cpu, shape=torch.Size([])),
    full_observation_spec: Composite(
        observation: BoundedContinuous(
            shape=torch.Size([3]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, contiguous=True)),
            device=cpu,
            dtype=torch.float32,
            domain=continuous), device=cpu, shape=torch.Size([])),
    full_done_spec: Composite(
        done: Categorical(
            shape=torch.Size([1]),
            space=DiscreteBox(n=2),
            device=cpu,
            dtype=torch.bool,
            domain=discrete), device=cpu, shape=torch.Size([])), device=cpu, shape=torch.Size([]))

property output_spec_unbatched: Composite: Returns the output spec of the env as if it had no batch dimensions.

parameters(recurse: bool = True) → Iterator[Parameter]

Return an iterator over module parameters.

This is typically passed to an optimizer.

Parameters:: recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.
Yields:: Parameter – module parameter

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
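Passing parameters() to an optimizer typically looks like the sketch below. The choice of torch.optim.SGD and the learning rate are arbitrary and only for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 2)  # weight: 2x3, bias: 2

# parameters() is a generator; the optimizer consumes it once.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

n_tensors = len(optimizer.param_groups[0]["params"])
n_scalars = sum(p.numel() for p in model.parameters())
```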

rand_action(tensordict: TensorDictBase | None = None)[source]

Performs a random action given the action_spec attribute.

Parameters:: tensordict (TensorDictBase, optional) – tensordict where the resulting action should be written.
Returns:: a tensordict object with the “action” entry updated with a random sample from the action-spec.

rand_step(tensordict: TensorDictBase | None = None) → TensorDictBase[source]

Performs a random step in the environment given the action_spec attribute.

Parameters:: tensordict (TensorDictBase, optional) – tensordict where the resulting info should be written.
Returns:: a tensordict object with the new observation after a random step in the environment. The action will be stored with the “action” key.

register_backward_hook(hook: Callable[[Module, Union[Tuple[Tensor, ...], Tensor], Union[Tuple[Tensor, ...], Tensor]], Union[None, Tuple[Tensor, ...], Tensor]]) → RemovableHandle

Register a backward hook on the module.

This function is deprecated in favor of register_full_backward_hook() and the behavior of this function will change in future versions.

Returns:: a handle that can be used to remove the added hook by calling handle.remove()
Return type:: torch.utils.hooks.RemovableHandle

register_buffer(name: str, tensor: Optional[Tensor], persistent: bool = True) → None

Add a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using given names.

Parameters:

name (str) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor or None) – buffer to be registered. If None, then operations that run on buffers, such as cuda, are ignored. If None, the buffer is not included in the module’s state_dict.
persistent (bool) – whether the buffer is part of this module’s state_dict.

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> self.register_buffer('running_mean', torch.zeros(num_features))
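You can observe the difference between persistent and non-persistent buffers through state_dict(). The RunningStats module below is hypothetical and exists only to register one buffer of each kind.

```python
import torch
import torch.nn as nn


class RunningStats(nn.Module):
    """Hypothetical module used only to contrast buffer persistence."""

    def __init__(self, num_features: int):
        super().__init__()
        # Persistent (default): saved in state_dict and moved by .to()/.cuda().
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Non-persistent: still moved with the module, but not saved.
        self.register_buffer("scratch", torch.zeros(num_features), persistent=False)


stats = RunningStats(3)
saved_keys = set(stats.state_dict())
buffer_names = {name for name, _ in stats.named_buffers()}
```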

register_forward_hook(hook: Union[Callable[[T, Tuple[Any, ...], Any], Optional[Any]], Callable[[T, Tuple[Any, ...], Dict[str, Any], Any], Optional[Any]]], *, prepend: bool = False, with_kwargs: bool = False, always_call: bool = False) → RemovableHandle

Register a forward hook on the module.

The hook will be called every time after forward() has computed an output.

If with_kwargs is False or not specified, the input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the output. It can modify the input in-place, but this will have no effect on forward, since the hook is called after forward() is called. The hook should have the following signature:

hook(module, args, output) -> None or modified output

If with_kwargs is True, the forward hook will be passed the kwargs given to the forward function and be expected to return the output possibly modified. The hook should have the following signature:

hook(module, args, kwargs, output) -> None or modified output

Parameters:

hook (Callable) – The user defined hook to be registered.
prepend (bool) – If True, the provided hook will be fired before all existing forward hooks on this torch.nn.modules.Module. Otherwise, the provided hook will be fired after all existing forward hooks on this torch.nn.modules.Module. Note that global forward hooks registered with register_module_forward_hook() will fire before all hooks registered by this method. Default: False
with_kwargs (bool) – If True, the hook will be passed the kwargs given to the forward function. Default: False
always_call (bool) – If True the hook will be run regardless of whether an exception is raised while calling the Module. Default: False

Returns:

a handle that can be used to remove the added hook by calling handle.remove()

Return type:

torch.utils.hooks.RemovableHandle
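The following sketch on a plain nn.Linear illustrates two points from the description. First, a hook that returns a value replaces the output. Second, prepend=True moves a hook to the front of the queue. The hook names are hypothetical.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)
calls = []


def log_hook(module, args, output):
    calls.append("log")  # returns None: the output is left unchanged


def zero_hook(module, args, output):
    calls.append("zero")
    return torch.zeros_like(output)  # returning a value replaces the output


h_log = layer.register_forward_hook(log_hook)
h_zero = layer.register_forward_hook(zero_hook, prepend=True)  # fires before log_hook

out = layer(torch.ones(1, 2))

# Removing the handles detaches the hooks again; this call records nothing.
h_log.remove()
h_zero.remove()
layer(torch.ones(1, 2))
```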

register_forward_pre_hook(hook: Union[Callable[[T, Tuple[Any, ...]], Optional[Any]], Callable[[T, Tuple[Any, ...], Dict[str, Any]], Optional[Tuple[Any, Dict[str, Any]]]]], *, prepend: bool = False, with_kwargs: bool = False) → RemovableHandle

Register a forward pre-hook on the module.

The hook will be called every time before forward() is invoked.

If with_kwargs is false or not specified, the input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the forward. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned (unless that value is already a tuple). The hook should have the following signature:

hook(module, args) -> None or modified input

If with_kwargs is true, the forward pre-hook will be passed the kwargs given to the forward function. And if the hook modifies the input, both the args and kwargs should be returned. The hook should have the following signature:

hook(module, args, kwargs) -> None or a tuple of modified input and kwargs

Parameters:

hook (Callable) – The user defined hook to be registered.
prepend (bool) – If true, the provided hook will be fired before all existing forward_pre hooks on this torch.nn.modules.Module. Otherwise, the provided hook will be fired after all existing forward_pre hooks on this torch.nn.modules.Module. Note that global forward_pre hooks registered with register_module_forward_pre_hook() will fire before all hooks registered by this method. Default: False
with_kwargs (bool) – If true, the hook will be passed the kwargs given to the forward function. Default: False

Returns:

a handle that can be used to remove the added hook by calling handle.remove()

Return type:

torch.utils.hooks.RemovableHandle
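Here is a minimal sketch of both signatures on an nn.Identity module. The first hook returns a single value, which PyTorch wraps into a tuple. The second hook is registered with with_kwargs=True and therefore returns an (args, kwargs) pair. Hooks registered without prepend run in registration order, so the input is doubled before the offset is added. The hook names are hypothetical.

```python
import torch
import torch.nn as nn

identity = nn.Identity()


def double_input(module, args):
    (x,) = args
    return x * 2  # a single value: PyTorch wraps it into a one-element tuple


def add_offset(module, args, kwargs):
    # with_kwargs=True: return both the (possibly modified) args and kwargs.
    (x,) = args
    return (x + 1,), kwargs


h1 = identity.register_forward_pre_hook(double_input)
h2 = identity.register_forward_pre_hook(add_offset, with_kwargs=True)
out = identity(torch.tensor([1.0, 2.0]))
h1.remove()
h2.remove()
```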

register_full_backward_hook(hook: Callable[[Module, Union[Tuple[Tensor, ...], Tensor], Union[Tuple[Tensor, ...], Tensor]], Union[None, Tuple[Tensor, ...], Tensor]], prepend: bool = False) → RemovableHandle

Register a backward hook on the module.

The hook will be called every time the gradients with respect to a module are computed, i.e. the hook will execute if and only if the gradients with respect to module outputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> tuple(Tensor) or None

The grad_input and grad_output are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations. grad_input will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in grad_input and grad_output will be None for all non-Tensor arguments.

For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly the caller will receive a view of each Tensor returned by the Module’s forward function.

Warning

Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.

Parameters:

hook (Callable) – The user-defined hook to be registered.
prepend (bool) – If true, the provided hook will be fired before all existing backward hooks on this torch.nn.modules.Module. Otherwise, the provided hook will be fired after all existing backward hooks on this torch.nn.modules.Module. Note that global backward hooks registered with register_module_full_backward_hook() will fire before all hooks registered by this method.

Returns:

a handle that can be used to remove the added hook by calling handle.remove()

Return type:

torch.utils.hooks.RemovableHandle
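The sketch below shows a replacement gradient taking effect. It fixes the weight of a bias-free nn.Linear(1, 1) to 2.0, so the unmodified gradient with respect to the input would be 2.0. The hypothetical scale_grad_input hook multiplies that gradient by 10.

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1, bias=False)
with torch.no_grad():
    layer.weight.fill_(2.0)  # d(output)/d(input) == 2


def scale_grad_input(module, grad_input, grad_output):
    # grad_input holds one entry per positional input; return replacements.
    return tuple(g * 10 if g is not None else None for g in grad_input)


handle = layer.register_full_backward_hook(scale_grad_input)

x = torch.ones(1, 1, requires_grad=True)
layer(x).sum().backward()
handle.remove()
```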

register_full_backward_pre_hook(hook: Callable[[Module, Union[Tuple[Tensor, ...], Tensor]], Union[None, Tuple[Tensor, ...], Tensor]], prepend: bool = False) → RemovableHandle

Register a backward pre-hook on the module.

The hook will be called every time the gradients for the module are computed. The hook should have the following signature:

hook(module, grad_output) -> tuple[Tensor] or None

The grad_output is a tuple. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the output that will be used in place of grad_output in subsequent computations. Entries in grad_output will be None for all non-Tensor arguments.

For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly the caller will receive a view of each Tensor returned by the Module’s forward function.

Warning

Modifying inputs inplace is not allowed when using backward hooks and will raise an error.

Parameters:

hook (Callable) – The user-defined hook to be registered.
prepend (bool) – If true, the provided hook will be fired before all existing backward_pre hooks on this torch.nn.modules.Module. Otherwise, the provided hook will be fired after all existing backward_pre hooks on this torch.nn.modules.Module. Note that global backward_pre hooks registered with register_module_full_backward_pre_hook() will fire before all hooks registered by this method.

Returns:

a handle that can be used to remove the added hook by calling handle.remove()

Return type:

torch.utils.hooks.RemovableHandle
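The pre-hook acts on grad_output before the module's own backward runs. The sketch below uses the same fixed-weight setup (weight 2.0) and a hypothetical hook that triples grad_output. The input gradient therefore becomes 3 * 2 = 6.

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1, bias=False)
with torch.no_grad():
    layer.weight.fill_(2.0)


def scale_grad_output(module, grad_output):
    # The returned tuple replaces grad_output for the rest of the backward pass.
    return tuple(g * 3 for g in grad_output)


handle = layer.register_full_backward_pre_hook(scale_grad_output)

x = torch.ones(1, 1, requires_grad=True)
layer(x).sum().backward()
handle.remove()
```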

classmethod register_gym(id: str, *, entry_point: Callable | None = None, transform: Transform | None = None, info_keys: list[NestedKey] | None = None, backend: str = None, to_numpy: bool = False, reward_threshold: float | None = None, nondeterministic: bool = False, max_episode_steps: int | None = None, order_enforce: bool = True, autoreset: bool = False, disable_env_checker: bool = False, apply_api_compatibility: bool = False, **kwargs)[source]

Registers an environment in gym(nasium).

This method is designed with the following scopes in mind:

Incorporate a TorchRL-first environment in a framework that uses Gym;
Incorporate another environment (e.g., DeepMind Control, Brax, Jumanji, …) in a framework that uses Gym.

Parameters:

id (str) – the name of the environment. Should follow the gym naming convention.

Keyword Arguments:

entry_point (callable, optional) –

the entry point to build the environment. If none is passed, the parent class will be used as entry point. Typically, this is used to register an environment that does not necessarily inherit from the base being used:

>>> from torchrl.envs import DMControlEnv, EnvBase
>>> DMControlEnv.register_gym("DMC-cheetah-v0", env_name="cheetah", task_name="run")
>>> # equivalently
>>> EnvBase.register_gym("DMC-cheetah-v0", entry_point=DMControlEnv, env_name="cheetah", task_name="run")

transform (torchrl.envs.Transform) – a transform (or list of transforms within a torchrl.envs.Compose instance) to be used with the env. This arg can be passed during a call to make() (see example below).
info_keys (List[NestedKey], optional) –
if provided, these keys will be used to build the info dictionary and will be excluded from the observation keys. This arg can be passed during a call to make() (see example below).

Warning

Using info_keys may leave a spec empty, because its content has been moved to the info dictionary. Gym does not cope well with empty Dict entries in the specs, so this empty content should be removed with RemoveEmptySpecs.
backend (str, optional) – the backend. Can be either “gym” or “gymnasium” or any other backend compatible with set_gym_backend.
to_numpy (bool, optional) – if True, the result of calls to step and reset will be mapped to numpy arrays. Defaults to False (results are tensors). This arg can be passed during a call to make() (see example below).
reward_threshold (float, optional) – [Gym kwarg] The reward threshold considered to have learnt an environment.
nondeterministic (bool, optional) – [Gym kwarg] Whether the environment is nondeterministic (even with knowledge of the initial seed and all actions). Defaults to False.
max_episode_steps (int, optional) – [Gym kwarg] The maximum number of episode steps before truncation. Used by the TimeLimit wrapper.
order_enforce (bool, optional) – [Gym >= 0.14] Whether the order enforcer wrapper should be applied to ensure users run functions in the correct order. Defaults to True.
autoreset (bool, optional) – [Gym >= 0.14] Whether the autoreset wrapper should be added such that reset does not need to be called. Defaults to False.
disable_env_checker (bool, optional) – [Gym >= 0.14] Whether the environment checker should be disabled for the environment. Defaults to False.
apply_api_compatibility (bool, optional) – [Gym >= 0.26] Whether to apply the StepAPICompatibility wrapper. Defaults to False.
**kwargs – arbitrary keyword arguments which are passed to the environment constructor.

Note

TorchRL environments do not have the concept of an "info" dictionary, as TensorDict offers all the storage requirements deemed necessary in most training settings. Still, you can use the info_keys argument to have fine-grained control over what is considered an observation and what should be seen as info.

Examples

>>> # Register the "cheetah" env from DMControl with the "run" task
>>> from torchrl.envs import DMControlEnv
>>> import torch
>>> DMControlEnv.register_gym("DMC-cheetah-v0", to_numpy=False, backend="gym", env_name="cheetah", task_name="run")
>>> import gym
>>> envgym = gym.make("DMC-cheetah-v0")
>>> envgym.seed(0)
>>> torch.manual_seed(0)
>>> envgym.reset()
({'position': tensor([-0.0855,  0.0215, -0.0881, -0.0412, -0.1101,  0.0080,  0.0254,  0.0424],
       dtype=torch.float64), 'velocity': tensor([ 1.9609e-02, -1.9776e-04, -1.6347e-03,  3.3842e-02,  2.5338e-02,
         3.3064e-02,  1.0381e-04,  7.6656e-05,  1.0204e-02],
       dtype=torch.float64)}, {})
>>> envgym.step(envgym.action_space.sample())
({'position': tensor([-0.0833,  0.0275, -0.0612, -0.0770, -0.1256,  0.0082,  0.0186,  0.0476],
       dtype=torch.float64), 'velocity': tensor([ 0.2221,  0.2256,  0.5930,  2.6937, -3.5865, -1.5479,  0.0187, -0.6825,
         0.5224], dtype=torch.float64)}, tensor([0.0018], dtype=torch.float64), tensor([False]), tensor([False]), {})
>>> # same environment with observation stacked
>>> from torchrl.envs import CatTensors
>>> envgym = gym.make("DMC-cheetah-v0", transform=CatTensors(in_keys=["position", "velocity"], out_key="observation"))
>>> envgym.reset()
({'observation': tensor([-0.1005,  0.0335, -0.0268,  0.0133, -0.0627,  0.0074, -0.0488, -0.0353,
        -0.0075, -0.0069,  0.0098, -0.0058,  0.0033, -0.0157, -0.0004, -0.0381,
        -0.0452], dtype=torch.float64)}, {})
>>> # same environment with numpy observations
>>> envgym = gym.make("DMC-cheetah-v0", transform=CatTensors(in_keys=["position", "velocity"], out_key="observation"), to_numpy=True)
>>> envgym.reset()
({'observation': array([-0.11355747,  0.04257728,  0.00408397,  0.04155852, -0.0389733 ,
       -0.01409826, -0.0978704 , -0.08808327,  0.03970837,  0.00535434,
       -0.02353762,  0.05116226,  0.02788907,  0.06848346,  0.05154399,
        0.0371798 ,  0.05128025])}, {})
>>> # If gymnasium is installed, we can register the environment there too.
>>> DMControlEnv.register_gym("DMC-cheetah-v0", to_numpy=False, backend="gymnasium", env_name="cheetah", task_name="run")
>>> import gymnasium
>>> envgym = gymnasium.make("DMC-cheetah-v0")
>>> envgym.seed(0)
>>> torch.manual_seed(0)
>>> envgym.reset()
({'position': tensor([-0.0855,  0.0215, -0.0881, -0.0412, -0.1101,  0.0080,  0.0254,  0.0424],
       dtype=torch.float64), 'velocity': tensor([ 1.9609e-02, -1.9776e-04, -1.6347e-03,  3.3842e-02,  2.5338e-02,
         3.3064e-02,  1.0381e-04,  7.6656e-05,  1.0204e-02],
       dtype=torch.float64)}, {})

Note

This feature also works for stateless environments (e.g., BraxEnv).

>>> import gymnasium
>>> import torch
>>> from tensordict import TensorDict
>>> from torchrl.envs import BraxEnv, SelectTransform
>>>
>>> # get actions for didactic purposes
>>> env = BraxEnv("ant", batch_size=[2])
>>> env.set_seed(0)
>>> torch.manual_seed(0)
>>> td = env.rollout(10)
>>>
>>> actions = td.get("action")
>>>
>>> # register env
>>> env.register_gym("Brax-Ant-v0", env_name="ant", batch_size=[2], info_keys=["state"])
>>> gym_env = gymnasium.make("Brax-Ant-v0")
>>> gym_env.seed(0)
>>> torch.manual_seed(0)
>>>
>>> gym_env.reset()
>>> observations = []
>>> for i in range(10):
...     obs, reward, terminated, truncated, info = gym_env.step(actions[:, i])
...     observations.append(obs)

register_load_state_dict_post_hook(hook)

Register a post-hook to be run after module’s load_state_dict() is called.

It should have the following signature:

hook(module, incompatible_keys) -> None

The module argument is the current module that this hook is registered on, and the incompatible_keys argument is a NamedTuple consisting of attributes missing_keys and unexpected_keys. missing_keys is a list of str containing the missing keys and unexpected_keys is a list of str containing the unexpected keys.

The given incompatible_keys can be modified inplace if needed.

Note that the checks performed when calling load_state_dict() with strict=True are affected by modifications the hook makes to missing_keys or unexpected_keys, as expected. Additions to either set of keys will result in an error being thrown when strict=True, and clearing out both missing and unexpected keys will avoid an error.

Returns:: a handle that can be used to remove the added hook by calling handle.remove()
Return type:: torch.utils.hooks.RemovableHandle
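The sketch below illustrates the in-place mutation described above. It adds a hypothetical legacy_key entry to a checkpoint and registers a post-hook that clears unexpected_keys. After that, a strict load no longer raises.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)
state = layer.state_dict()
state["legacy_key"] = torch.zeros(1)  # hypothetical leftover key from an old checkpoint


def ignore_unexpected(module, incompatible_keys):
    # Mutating the list in place changes what the strict=True check sees afterwards.
    incompatible_keys.unexpected_keys.clear()


handle = layer.register_load_state_dict_post_hook(ignore_unexpected)
result = layer.load_state_dict(state)  # strict=True, but no error is raised
handle.remove()
```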

register_load_state_dict_pre_hook(hook)

Register a pre-hook to be run before module’s load_state_dict() is called.

It should have the following signature:

hook(module, state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs) -> None

Parameters:: hook (Callable) – Callable hook that will be invoked before loading the state dict.

register_module(name: str, module: Optional[Module]) → None: Alias for add_module().

register_parameter(name: str, param: Optional[Parameter]) → None

Add a parameter to the module.

The parameter can be accessed as an attribute using given name.

Parameters:

name (str) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter or None) – parameter to be added to the module. If None, then operations that run on parameters, such as cuda, are ignored. If None, the parameter is not included in the module’s state_dict.
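Here is a short sketch of the None case, using a hypothetical Scaler module. The placeholder is reachable as an attribute, but state_dict() and named_parameters() skip it.

```python
import torch
import torch.nn as nn


class Scaler(nn.Module):
    """Hypothetical module with one real and one placeholder parameter."""

    def __init__(self):
        super().__init__()
        self.register_parameter("gain", nn.Parameter(torch.ones(1)))
        # A None placeholder: accessible as an attribute, absent from state_dict.
        self.register_parameter("offset", None)


m = Scaler()
keys = set(m.state_dict())
param_names = [name for name, _ in m.named_parameters()]
```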

register_state_dict_post_hook(hook)

Register a post-hook for the state_dict() method.

It should have the following signature:

hook(module, state_dict, prefix, local_metadata) -> None

The registered hooks can modify the state_dict inplace.

register_state_dict_pre_hook(hook)

Register a pre-hook for the state_dict() method.

It should have the following signature:

hook(module, prefix, keep_vars) -> None

The registered hooks can be used to perform pre-processing before the state_dict call is made.

requires_grad_(requires_grad: bool = True) → T

Change if autograd should record operations on parameters in this module.

This method sets the parameters’ requires_grad attributes in-place.

This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).

See Locally disabling gradient computation for a comparison between .requires_grad_() and several similar mechanisms that may be confused with it.

Parameters:: requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.
Returns:: self
Return type:: Module
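Here is a typical freezing pattern, sketched on a two-layer nn.Sequential. Only the second layer remains trainable. Because requires_grad_ returns the module, the call can be chained.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Freeze the first layer; requires_grad_ returns the module it was called on.
frozen = model[0].requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```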

reset(tensordict: TensorDictBase | None = None, **kwargs) → TensorDictBase[source]

Resets the environment.

As for step and _step, only the private method _reset should be overridden by EnvBase subclasses.

Parameters:

tensordict (TensorDictBase, optional) – tensordict to be used to contain the resulting new observation. In some cases, this input can also be used to pass arguments to the reset function.
kwargs (optional) – other arguments to be passed to the native reset function.

Returns:

a tensordict (or the input tensordict, if any), modified in place with the resulting observations.

Note

reset should not be overridden by EnvBase subclasses. The method to override is _reset().
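The split between the public reset and the private _reset follows the template-method pattern. The framework-free sketch below illustrates the idea only. ToyEnvBase, CountingEnv and the dict-based "tensordict" are hypothetical stand-ins, and the real EnvBase.reset performs additional bookkeeping that is not modelled here.

```python
from typing import Optional


class ToyEnvBase:
    """Hypothetical base class: reset() is fixed, subclasses override _reset()."""

    def reset(self, tensordict: Optional[dict] = None, **kwargs) -> dict:
        # Shared logic lives in the public method (here: merging into the input).
        out = self._reset(tensordict, **kwargs)
        if tensordict is None:
            return out
        tensordict.update(out)  # the input container is modified in place
        return tensordict

    def _reset(self, tensordict: Optional[dict], **kwargs) -> dict:
        raise NotImplementedError


class CountingEnv(ToyEnvBase):
    """Hypothetical subclass: only the private hook is implemented."""

    def _reset(self, tensordict, start: int = 0, **kwargs):
        return {"observation": start, "done": False}


env = CountingEnv()
fresh = env.reset(start=3)
container = {"extra": "kept"}
returned = env.reset(container)
```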

property reset_keys: list[tensordict._nestedkey.NestedKey]

Returns a list of reset keys.

Reset keys are keys that indicate partial reset, in batched, multitask or multiagent settings. They are structured as (*prefix, "_reset") where prefix is a (possibly empty) tuple of strings pointing to a tensordict location where a done state can be found.

Keys are sorted by depth in the data tree.
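The relationship between done locations and reset keys can be sketched in plain Python. The helper reset_keys_from_done_keys is hypothetical and only mirrors the rule stated above. Each prefix that holds a done entry gets a (*prefix, "_reset") key, and the result is sorted by depth. In this sketch, a top-level key is written as the plain string "_reset".

```python
from typing import List, Tuple, Union

NestedKey = Union[str, Tuple[str, ...]]


def _as_tuple(key: NestedKey) -> Tuple[str, ...]:
    return (key,) if isinstance(key, str) else tuple(key)


def reset_keys_from_done_keys(done_keys: List[NestedKey]) -> List[NestedKey]:
    """Hypothetical helper: one (*prefix, "_reset") key per done location."""
    prefixes = []
    for key in done_keys:
        prefix = _as_tuple(key)[:-1]  # drop the leaf name, e.g. "done"
        if prefix not in prefixes:
            prefixes.append(prefix)
    keys = [prefix + ("_reset",) for prefix in prefixes]
    keys.sort(key=len)  # sort by depth in the data tree (stable sort)
    # Unwrap single-element tuples so top-level keys are plain strings.
    return [k[0] if len(k) == 1 else k for k in keys]


done_keys = [("agents", "done"), "done", ("agents", "terminated"), "terminated"]
reset_keys = reset_keys_from_done_keys(done_keys)
```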

property reward_key

The reward key of an environment.

By default, this will be “reward”.

If there is more than one reward key in the environment, accessing this property will raise an exception.

property reward_keys: list[tensordict._nestedkey.NestedKey]

The reward keys of an environment.

By default, there will only be one key named “reward”.

Keys are sorted by depth in the data tree.

property reward_spec: TensorSpec

The reward spec.

The reward_spec is always stored as a composite spec.

If the reward spec is provided as a simple spec, that simple spec is returned.

>>> env.reward_spec = Unbounded(1)
>>> env.reward_spec
UnboundedContinuous(
    shape=torch.Size([1]),
    space=ContinuousBox(
        low=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True),
        high=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True)),
    device=cpu,
    dtype=torch.float32,
    domain=continuous)

If the reward spec is provided as a composite spec and contains only one leaf, this function will return just the leaf.

>>> env.reward_spec = Composite({"nested": {"reward": Unbounded(1)}})
>>> env.reward_spec
UnboundedContinuous(
    shape=torch.Size([1]),
    space=ContinuousBox(
        low=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True),
        high=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True)),
    device=cpu,
    dtype=torch.float32,
    domain=continuous)

If the reward spec is provided as a composite spec with more than one leaf, this property returns the whole spec.

>>> env.reward_spec = Composite({"nested": {"reward": Unbounded(1), "another_reward": Categorical(1)}})
>>> env.reward_spec
Composite(
    nested: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([]), device=cpu, dtype=torch.float32, contiguous=True)),
            device=cpu,
            dtype=torch.float32,
            domain=continuous),
        another_reward: Categorical(
            shape=torch.Size([]),
            space=DiscreteBox(n=1),
            device=cpu,
            dtype=torch.int64,
            domain=discrete), device=cpu, shape=torch.Size([])), device=cpu, shape=torch.Size([]))

To retrieve the full spec that was passed, use:

>>> env.output_spec["full_reward_spec"]

This property is mutable.

Examples

>>> from torchrl.envs.libs.gym import GymEnv
>>> env = GymEnv("Pendulum-v1")
>>> env.reward_spec
UnboundedContinuous(
    shape=torch.Size([1]),
    space=None,
    device=cpu,
    dtype=torch.float32,
    domain=continuous)
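The leaf-or-composite rule described above can be sketched with nested dicts standing in for Composite specs and strings standing in for leaf specs. `iter_leaves` and `resolve_reward_spec` are hypothetical helpers written for illustration; they are not TorchRL functions and do not reproduce its implementation.

```python
def iter_leaves(spec, prefix=()):
    """Yield (nested_key, leaf) pairs of a nested dict, depth-first."""
    for name, value in spec.items():
        key = prefix + (name,)
        if isinstance(value, dict):
            yield from iter_leaves(value, key)
        else:
            yield key, value


def resolve_reward_spec(full_reward_spec):
    """Return the only leaf if there is exactly one, else the whole composite."""
    leaves = list(iter_leaves(full_reward_spec))
    if len(leaves) == 1:
        return leaves[0][1]
    return full_reward_spec


# Strings stand in for leaf specs, dicts for Composite specs.
print(resolve_reward_spec({"nested": {"reward": "Unbounded(1)"}}))
# Unbounded(1)
print(resolve_reward_spec({"nested": {"reward": "Unbounded(1)", "another_reward": "Categorical(1)"}}))
# {'nested': {'reward': 'Unbounded(1)', 'another_reward': 'Categorical(1)'}}
```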

property reward_spec_unbatched: TensorSpec

Returns the reward spec of the env as if it had no batch dimensions.

rollout(max_steps: int, policy: Callable[[TensorDictBase], TensorDictBase] | None = None, callback: Callable[[TensorDictBase, ...], Any] | None = None, *, auto_reset: bool = True, auto_cast_to_device: bool = False, break_when_any_done: bool | None = None, break_when_all_done: bool | None = None, return_contiguous: bool | None = False, tensordict: TensorDictBase | None = None, set_truncated: bool = False, out=None, trust_policy: bool = False) → TensorDictBase[source]

Executes a rollout in the environment.

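By default, the function returns as soon as any of the contained environments reaches any of the done states. This behavior is controlled by the break_when_any_done and break_when_all_done keyword arguments.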

Parameters:

max_steps (int) – maximum number of steps to be executed. The actual number of steps can be smaller if the environment reaches a done state before max_steps have been executed.
policy (callable, optional) – callable to be called to compute the desired action. If no policy is provided, actions will be sampled using env.rand_step(). The policy can be any callable that reads either a tensordict or the entire sequence of observation entries, sorted as env.observation_spec.keys(). Defaults to None.
callback (Callable[[TensorDict], Any], optional) – function to be called at each iteration with the given TensorDict. Defaults to None. The output of callback is not collected: it is the user's responsibility to save any result within the callback if data needs to be carried over beyond the call to rollout.

Keyword Arguments:

auto_reset (bool, optional) – if True, the contained environments will be reset before starting the rollout. If False, the rollout will continue from a previous state, which requires the last tensordict of the previous rollout to be passed through the tensordict argument. Default is True.
auto_cast_to_device (bool, optional) – if True, the tensordict is automatically cast to the policy device before the policy is used. Default is False.
break_when_any_done (bool, optional) – if True, break when any of the contained environments reaches any of the done states. If False, done environments are reset automatically. Default is True.
break_when_all_done (bool, optional) – if True, break if all of the contained environments reach any of the done states. If False, break if at least one environment reaches any of the done states. Default is False.
return_contiguous (bool) – if False, a LazyStackedTensorDict will be returned. Default is True if the env does not have dynamic specs, otherwise False.
tensordict (TensorDict, optional) – if auto_reset is False, an initial tensordict must be provided. Rollout will check whether this tensordict has done flags and reset the environment in those dimensions (if needed). This normally does not occur if tensordict is the output of a reset, but can occur if tensordict is the last step of a previous rollout. A tensordict can also be provided when auto_reset=True if metadata needs to be passed to the reset method, such as a batch-size or a device for stateless environments.
set_truncated (bool, optional) – if True, "truncated" and "done" keys will be set to True after completion of the rollout. If no "truncated" is found within the done_spec, an exception is raised. Truncated keys can be set through env.add_truncated_keys. Defaults to False.
trust_policy (bool, optional) – if True, a non-TensorDictModule policy is trusted to be compatible with the collector. This defaults to True for CudaGraphModules and False otherwise.

Returns:

TensorDict object containing the resulting trajectory.

The data returned will be marked with a “time” dimension name for the last dimension of the tensordict (at the env.ndim index).
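The stopping rules documented above can be illustrated with a toy loop in plain Python. `toy_rollout` and its `step_fn` argument are hypothetical stand-ins, not TorchRL code: each call to `step_fn` returns one done flag per contained environment. As an assumption of this sketch, an environment that has been done once counts as finished for the all-done rule, and setting both break flags is rejected. The callback's return value is discarded, as described for rollout.

```python
from typing import Callable, List, Optional


def toy_rollout(
    step_fn: Callable[[int], List[bool]],
    max_steps: int,
    callback: Optional[Callable[[int, List[bool]], object]] = None,
    break_when_any_done: bool = True,
    break_when_all_done: bool = False,
) -> int:
    """Hypothetical loop mimicking rollout()'s stopping rules.

    step_fn(t) returns one done flag per contained environment at step t.
    Returns the number of steps actually executed.
    """
    if break_when_any_done and break_when_all_done:
        raise ValueError("this sketch accepts at most one break condition")
    finished: Optional[List[bool]] = None  # envs that have been done at least once
    steps = 0
    for t in range(max_steps):
        done = step_fn(t)
        steps += 1
        if callback is not None:
            callback(t, done)  # return value discarded
        if finished is None:
            finished = list(done)
        else:
            finished = [f or d for f, d in zip(finished, done)]
        if break_when_any_done and any(done):
            break
        if break_when_all_done and all(finished):
            break
        # No break yet: keep stepping.
    return steps


# Two environments: env 0 is done at step 2, env 1 at step 4.
def step_fn(t):
    return [t == 2, t == 4]


log = []
print(toy_rollout(step_fn, max_steps=10, callback=lambda t, done: log.append(done)))
# 3  (stops at the first done)
print(toy_rollout(step_fn, max_steps=10, break_when_any_done=False, break_when_all_done=True))
# 5  (stops once both envs have been done)
print(toy_rollout(step_fn, max_steps=10, break_when_any_done=False))
# 10 (runs for max_steps)
```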

rollout is also a convenient way to inspect the data structure of the environment.

Examples

>>> # Using rollout without a policy
>>> from torchrl.envs.libs.gym import GymEnv
>>> from torchrl.envs.transforms import TransformedEnv, StepCounter
>>> env = TransformedEnv(GymEnv("Pendulum-v1"), StepCounter(max_steps=20))
>>> rollout = env.rollout(max_steps=1000)
>>> print(rollout)
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([20, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        done: Tensor(shape=torch.Size([20, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([20, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: Tensor(shape=torch.Size([20, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([20, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                step_count: Tensor(shape=torch.Size([20, 1]), device=cpu, dtype=torch.int64, is_shared=False),
                truncated: Tensor(shape=torch.Size([20, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([20]),
            device=cpu,
            is_shared=False),
        observation: Tensor(shape=torch.Size([20, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        step_count: Tensor(shape=torch.Size([20, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        truncated: Tensor(shape=torch.Size([20, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([20]),
    device=cpu,
    is_shared=False)
>>> print(rollout.names)
['time']
>>> # with envs that contain more dimensions
>>> from torchrl.envs import SerialEnv
>>> env = SerialEnv(3, lambda: TransformedEnv(GymEnv("Pendulum-v1"), StepCounter(max_steps=20)))
>>> rollout = env.rollout(max_steps=1000)
>>> print(rollout)
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        done: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: Tensor(shape=torch.Size([3, 20, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                step_count: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.int64, is_shared=False),
                truncated: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([3, 20]),
            device=cpu,
            is_shared=False),
        observation: Tensor(shape=torch.Size([3, 20, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        step_count: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        truncated: Tensor(shape=torch.Size([3, 20, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([3, 20]),
    device=cpu,
    is_shared=False)
>>> print(rollout.names)
[None, 'time']

Using a policy (a regular Module or a TensorDictModule) is also easy:

Examples

>>> from torch import nn
>>> env = GymEnv("CartPole-v1", categorical_action_encoding=True)
>>> class ArgMaxModule(nn.Module):
...     def forward(self, values):
...         return values.argmax(-1)
>>> n_obs = env.observation_spec["observation"].shape[-1]
>>> n_act = env.action_spec.n
>>> # A deterministic policy
>>> policy = nn.Sequential(
...     nn.Linear(n_obs, n_act),
...     ArgMaxModule())
>>> env.rollout(max_steps=10, policy=policy)
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.int64, is_shared=False),
        done: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: Tensor(shape=torch.Size([10, 4]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                truncated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([10]),
            device=cpu,
            is_shared=False),
        observation: Tensor(shape=torch.Size([10, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        terminated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        truncated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([10]),
    device=cpu,
    is_shared=False)
>>> # Under the hood, rollout will wrap the policy in a TensorDictModule
>>> # To speed things up we can do that ourselves
>>> from tensordict.nn import TensorDictModule
>>> policy = TensorDictModule(policy, in_keys=list(env.observation_spec.keys()), out_keys=["action"])
>>> env.rollout(max_steps=10, policy=policy)
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.int64, is_shared=False),
        done: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: Tensor(shape=torch.Size([10, 4]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                truncated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([10]),
            device=cpu,
            is_shared=False),
        observation: Tensor(shape=torch.Size([10, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        terminated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        truncated: Tensor(shape=torch.Size([10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([10]),
    device=cpu,
    is_shared=False)

In some instances, a contiguous tensordict cannot be obtained because the per-step tensordicts cannot be stacked. This can happen when the data returned at each step has a different shape, or when different environments are executed together. In that case, return_contiguous=False causes the returned tensordict to be a lazy stack of tensordicts:

Example of a non-contiguous rollout (with the three-environment SerialEnv defined above):

>>> rollout = env.rollout(4, return_contiguous=False)
>>> print(rollout)
LazyStackedTensorDict(
    fields={
        action: Tensor(shape=torch.Size([3, 4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        done: Tensor(shape=torch.Size([3, 4, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: LazyStackedTensorDict(
            fields={
                done: Tensor(shape=torch.Size([3, 4, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                observation: Tensor(shape=torch.Size([3, 4, 3]), device=cpu, dtype=torch.float32, is_shared=False),
                reward: Tensor(shape=torch.Size([3, 4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                step_count: Tensor(shape=torch.Size([3, 4, 1]), device=cpu, dtype=torch.int64, is_shared=False),
                truncated: Tensor(shape=torch.Size([3, 4, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([3, 4]),
            device=cpu,
            is_shared=False),
        observation: Tensor(shape=torch.Size([3, 4, 3]), device=cpu, dtype=torch.float32, is_shared=False),
        step_count: Tensor(shape=torch.Size([3, 4, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        truncated: Tensor(shape=torch.Size([3, 4, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([3, 4]),
    device=cpu,
    is_shared=False)
>>> print(rollout.names)
[None, 'time']

Rollouts can be used in a loop to emulate data collection. To do so, pass the last tensordict of the previous rollout, after calling step_mdp() on it, as the tensordict argument of the next call, with auto_reset=False.

Examples of data collection rollouts:

>>> from torchrl.envs import GymEnv, step_mdp
>>> env = GymEnv("CartPole-v1")
>>> epochs = 10
>>> input_td = env.reset()
>>> for i in range(epochs):
...     rollout_td = env.rollout(
...         max_steps=100,
...         break_when_any_done=False,
...         auto_reset=False,
...         tensordict=input_td,
...     )
...     input_td = step_mdp(
...         rollout_td[..., -1],
...     )
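What step_mdp() contributes to this loop can be pictured with a deliberately simplified stand-in on plain dicts. `toy_step_mdp` is hypothetical: it only promotes the entries found under "next" to the root and drops the reward, whereas the real step_mdp() has further options controlling which entries are kept or excluded.

```python
def toy_step_mdp(td, exclude=("reward",)):
    """Simplified stand-in for step_mdp() on plain dicts.

    Builds the next root from the entries stored under "next", dropping the
    excluded ones. Root-level entries such as "action" are not carried over.
    """
    return {key: value for key, value in td["next"].items() if key not in exclude}


# The last step of a (toy) rollout, shaped like one row of the outputs above.
last_step = {
    "observation": [0.0, 0.1],
    "action": 1,
    "next": {"observation": [0.2, 0.3], "reward": 1.0, "done": False},
}
next_input = toy_step_mdp(last_step)
print(next_input)
# {'observation': [0.2, 0.3], 'done': False}
```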

set_extra_state(state: Any) → None

Set extra state contained in the loaded state_dict.

This function is called from load_state_dict() to handle any extra state found within the state_dict. Implement this function and a corresponding get_extra_state() for your module if you need to store extra state within its state_dict.

Parameters:

state (dict) – Extra state from the state_dict.

set_seed(seed: int | None = None, static_seed: bool = False) → int | None[source]

Sets the seed of the environment and returns the next seed to be used (which is the input seed if a single environment is present).

Parameters:

seed (int) – seed to be set. The seed is set only locally in the environment. To handle the global seed, see manual_seed().
static_seed (bool, optional) – if True, the seed is not incremented. Defaults to False.

Returns:

the seed that should be used for another environment created concomitantly with this one.

Return type:

integer representing the “next seed”
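The chaining pattern implied by the return value can be sketched in plain Python. `ToyEnv` and `toy_next_seed` are hypothetical; in particular, the hash-based derivation of the next seed is an assumption of this sketch and is not TorchRL's rule. The point is only that each call returns the seed to hand to the next environment, and that static_seed=True returns the input seed unchanged.

```python
import hashlib
from typing import Optional


def toy_next_seed(seed: int) -> int:
    """Hypothetical deterministic derivation of the next seed (not TorchRL's)."""
    digest = hashlib.sha256(str(seed).encode()).digest()
    return int.from_bytes(digest[:4], "big")


class ToyEnv:
    """Minimal stand-in for an environment exposing set_seed()."""

    def __init__(self) -> None:
        self.seed: Optional[int] = None

    def set_seed(self, seed: Optional[int] = None, static_seed: bool = False) -> Optional[int]:
        self.seed = seed  # a real env would seed its local RNG here
        if seed is None or static_seed:
            return seed  # not incremented
        return toy_next_seed(seed)


# Seed several environments created together from a single starting seed.
envs = [ToyEnv() for _ in range(3)]
seed = 0
for env in envs:
    seed = env.set_seed(seed)  # returned value feeds the next environment
print([env.seed for env in envs])
```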

set_spec_lock_(mode: bool = True) → EnvBase[source]

Locks or unlocks the environment’s specs.

Parameters:

mode (bool) – Whether to lock (True) or unlock (False) the specs. Defaults to True.

Returns:

The environment instance itself.

Return type:

EnvBase
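The lock-then-edit pattern, and why returning the instance is convenient, can be sketched with a toy class. `ToySpecEnv` and its `set_spec` method are hypothetical and only mimic the behavior described here; they do not reproduce how the EnvBase metaclass implements locking, and no particular exception type raised by TorchRL is implied.

```python
class ToySpecEnv:
    """Hypothetical stand-in: a lock flag plus a chainable in-place setter."""

    def __init__(self) -> None:
        self.specs = {"reward": "Unbounded(1)"}
        self.spec_locked = True  # mirrors the spec_locked=True default

    def set_spec_lock_(self, mode: bool = True) -> "ToySpecEnv":
        self.spec_locked = mode
        return self  # returning the instance allows call chaining

    def set_spec(self, name: str, spec: str) -> None:
        """Hypothetical spec setter that refuses edits while locked."""
        if self.spec_locked:
            raise RuntimeError("specs are locked")
        self.specs[name] = spec


env = ToySpecEnv()
try:
    env.set_spec("reward", "Unbounded(2)")
except RuntimeError as err:
    print("refused:", err)
# Unlock, modify and lock again.
env.set_spec_lock_(False).set_spec("reward", "Unbounded(2)")
env.set_spec_lock_(True)
print(env.specs, env.spec_locked)
```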
