EndOfLifeTransform
- class torchrl.envs.transforms.EndOfLifeTransform(eol_key: Union[str, Tuple[str, ...]] = 'end-of-life', lives_key: Union[str, Tuple[str, ...]] = 'lives', done_key: Union[str, Tuple[str, ...]] = 'done', eol_attribute='unwrapped.ale.lives')
Registers the end-of-life signal from a Gym env that exposes a lives method.
Proposed by DeepMind for DQN and related algorithms, it helps value estimation.
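The idea can be sketched in plain Python as a per-step flag that is raised when the lives counter drops between two consecutive steps, and that also fires when the episode itself is done. This is a hypothetical illustration that assumes the signal is computed this way; `end_of_life_signal` and `rollout_eol` are made-up names, not part of TorchRL.

```python
from typing import List


def end_of_life_signal(prev_lives: int, lives: int, done: bool) -> bool:
    # Assumption: a life counts as lost when the counter drops between
    # consecutive steps; the flag is also raised when the episode is done.
    return lives < prev_lives or done


def rollout_eol(lives_seq: List[int], done_seq: List[bool]) -> List[bool]:
    # lives_seq[0] is observed at reset; lives_seq[t + 1] and done_seq[t]
    # are observed after step t, so len(lives_seq) == len(done_seq) + 1.
    return [
        end_of_life_signal(prev, cur, done)
        for prev, cur, done in zip(lives_seq[:-1], lives_seq[1:], done_seq)
    ]


# Breakout-like trajectory: lives lost at steps 1, 3 and 4; game over at step 4.
lives = [3, 3, 2, 2, 1, 0]
done = [False, False, False, False, True]
print(rollout_eol(lives, done))  # [False, True, False, True, True]
```

Combining the flag with "done" keeps the real end of the game terminal when a loss reads the end-of-life key in place of "done".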
- Parameters:
eol_key (NestedKey, optional) – the key where the end-of-life signal should be written. Defaults to "end-of-life".
lives_key (NestedKey, optional) – the key where the number of remaining lives should be written. Defaults to "lives".
done_key (NestedKey, optional) – a “done” key in the parent env done_spec, where the done value can be retrieved. This key must be unique and its shape must match the shape of the end-of-life entry. Defaults to "done".
eol_attribute (str, optional) – the location of the “lives” in the gym env. Defaults to "unwrapped.ale.lives". Supported attribute types are integer/array-like objects or callables that return these values.
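The eol_attribute string is a dotted path into the env. A minimal sketch of how such a path might be resolved, assuming plain getattr lookups and a call when the target is callable (as ALE's lives method is); `resolve_lives` and the `Fake*` classes are hypothetical stand-ins, not TorchRL or Gym objects.

```python
from functools import reduce
from typing import Any


def resolve_lives(env: Any, attribute: str = "unwrapped.ale.lives") -> Any:
    # Walk the dotted path one attribute at a time, starting from env.
    value = reduce(getattr, attribute.split("."), env)
    # Callables (e.g. a lives() method) are called; plain values are returned.
    return value() if callable(value) else value


# Hypothetical stand-ins for a Gym env wrapping the Arcade Learning Environment.
class FakeALE:
    def __init__(self, lives: int) -> None:
        self._lives = lives

    def lives(self) -> int:
        return self._lives


class FakeUnwrapped:
    def __init__(self, lives: int) -> None:
        self.ale = FakeALE(lives)


class FakeEnv:
    def __init__(self, lives: int) -> None:
        self.unwrapped = FakeUnwrapped(lives)
        self.remaining = lives  # a plain integer attribute, for comparison


env = FakeEnv(lives=5)
print(resolve_lives(env))               # callable attribute is called: 5
print(resolve_lives(env, "remaining"))  # plain integer returned as-is: 5
```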
Note
This transform should be used with Gym envs that expose an env.unwrapped.ale.lives attribute (or the attribute given by eol_attribute).
Examples
>>> from torchrl.envs.libs.gym import GymEnv
>>> from torchrl.envs.transforms.transforms import EndOfLifeTransform, TransformedEnv
>>> env = GymEnv("ALE/Breakout-v5")
>>> env.rollout(100)
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([100, 4]), device=cpu, dtype=torch.int64, is_shared=False),
        done: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                pixels: Tensor(shape=torch.Size([100, 210, 160, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
                reward: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                truncated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([100]),
            device=cpu,
            is_shared=False),
        pixels: Tensor(shape=torch.Size([100, 210, 160, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
        terminated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        truncated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([100]),
    device=cpu,
    is_shared=False)
>>> eol_transform = EndOfLifeTransform()
>>> env = TransformedEnv(env, eol_transform)
>>> env.rollout(100)
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([100, 4]), device=cpu, dtype=torch.int64, is_shared=False),
        done: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        eol: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        lives: Tensor(shape=torch.Size([100]), device=cpu, dtype=torch.int64, is_shared=False),
        next: TensorDict(
            fields={
                done: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                end-of-life: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                lives: Tensor(shape=torch.Size([100]), device=cpu, dtype=torch.int64, is_shared=False),
                pixels: Tensor(shape=torch.Size([100, 210, 160, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
                reward: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                terminated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                truncated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([100]),
            device=cpu,
            is_shared=False),
        pixels: Tensor(shape=torch.Size([100, 210, 160, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
        terminated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        truncated: Tensor(shape=torch.Size([100, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([100]),
    device=cpu,
    is_shared=False)
The typical usage of this transform is to replace the “done” state with “end-of-life” within the loss module. The end-of-life signal is not registered within the done_spec because it should not instruct the env to reset.
Examples
>>> import torch
>>> from torchrl.objectives import DQNLoss
>>> module = torch.nn.Identity()  # used as a placeholder
>>> loss = DQNLoss(module, action_space="categorical")
>>> loss.set_keys(done="end-of-life", terminated="end-of-life")
>>> # equivalently
>>> eol_transform.register_keys(loss)
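To see why swapping "done" for "end-of-life" matters, consider a one-step TD target, where a terminal flag stops bootstrapping from the next state's value. The sketch below is a plain-Python illustration under that assumption, not DQNLoss's actual computation; `td_targets` is a hypothetical helper.

```python
from typing import List


def td_targets(
    rewards: List[float],
    next_values: List[float],
    terminal: List[bool],
    gamma: float = 0.99,
) -> List[float]:
    # One-step TD target: r + gamma * (1 - terminal) * V(s').
    return [
        r + gamma * (0.0 if term else 1.0) * v
        for r, v, term in zip(rewards, next_values, terminal)
    ]


rewards = [0.0, 1.0, 0.0]
next_values = [2.0, 2.0, 2.0]
done = [False, False, False]        # the game continues after a life is lost
end_of_life = [False, True, False]  # a life was lost on step 1

# Using "done": the target bootstraps through the lost life.
print(td_targets(rewards, next_values, done, gamma=0.5))         # [1.0, 2.0, 1.0]
# Using "end-of-life": bootstrapping is cut where the life was lost.
print(td_targets(rewards, next_values, end_of_life, gamma=0.5))  # [1.0, 1.0, 1.0]
```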
- forward(tensordict: TensorDictBase) → TensorDictBase
Reads the input tensordict and applies the transform to the selected keys.
- register_keys(loss_or_advantage: LossModule)
Registers the end-of-life key at appropriate places within the loss.
- Parameters:
loss_or_advantage (torchrl.objectives.LossModule or torchrl.objectives.value.ValueEstimatorBase) – the module in which the end-of-life key should be registered.
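Going by the DQNLoss example in the class documentation, where register_keys is shown as equivalent to loss.set_keys(done="end-of-life", terminated="end-of-life"), the method can be pictured as redirecting the loss's "done" and "terminated" keys to the end-of-life key. A sketch with a hypothetical loss stand-in; `HypotheticalLoss` and `register_eol_keys` are illustrative names, and the real method may handle value estimators and other key names differently.

```python
from typing import Dict


class HypotheticalLoss:
    """Stand-in for a loss module exposing a set_keys method."""

    def __init__(self) -> None:
        self.keys: Dict[str, str] = {"done": "done", "terminated": "terminated"}

    def set_keys(self, **kwargs: str) -> None:
        # Only known key names can be redirected.
        for name, key in kwargs.items():
            if name not in self.keys:
                raise KeyError(f"unknown key name: {name}")
            self.keys[name] = key


def register_eol_keys(loss: HypotheticalLoss, eol_key: str = "end-of-life") -> None:
    # Point both terminal-related entries of the loss at the end-of-life key.
    loss.set_keys(done=eol_key, terminated=eol_key)


loss = HypotheticalLoss()
register_eol_keys(loss)
print(loss.keys)  # {'done': 'end-of-life', 'terminated': 'end-of-life'}
```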
- transform_observation_spec(observation_spec)
Transforms the observation spec such that the resulting spec matches the transform mapping.
- Parameters:
observation_spec (TensorSpec) – spec before the transform
- Returns:
expected spec after the transform
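Judging by the rollout printed in the class examples, the transformed env gains an end-of-life entry shaped like "done" and a lives entry with the env's batch shape and an int64 dtype. A dict-based sketch of that mapping, assuming this is what the spec transform adds; `SpecSketch` and `add_eol_entries` are hypothetical and merely stand in for TorchRL's TensorSpec classes.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical minimal spec: just a shape and a dtype name."""

    shape: Tuple[int, ...]
    dtype: str


def add_eol_entries(
    observation_spec: Dict[str, SpecSketch],
    done_spec: SpecSketch,
    batch_shape: Tuple[int, ...],
    eol_key: str = "end-of-life",
    lives_key: str = "lives",
) -> Dict[str, SpecSketch]:
    # Work on a shallow copy so the input spec is left untouched.
    out = dict(observation_spec)
    # The end-of-life entry mirrors the "done" entry (same shape, boolean).
    out[eol_key] = replace(done_spec)
    # The lives counter is an integer with the env's batch shape.
    out[lives_key] = SpecSketch(shape=batch_shape, dtype="int64")
    return out


obs = {"pixels": SpecSketch((210, 160, 3), "uint8")}
done = SpecSketch((1,), "bool")
new_spec = add_eol_entries(obs, done, batch_shape=())
for key, spec in new_spec.items():
    print(key, spec.shape, spec.dtype)
```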