RewardSum
- class torchrl.envs.transforms.RewardSum(in_keys: Sequence[NestedKey] | None = None, out_keys: Sequence[NestedKey] | None = None, reset_keys: Sequence[NestedKey] | None = None, *, reward_spec: bool = False)[source]
Tracks episode cumulative rewards.
This transform accepts a list of tensordict reward keys (i.e. `in_keys`) and tracks their cumulative value along the time dimension for each episode.
When called, the transform writes a new tensordict entry for each `in_key`, named `episode_{in_key}`, where the cumulative values are written.
- Parameters:
in_keys (list of NestedKeys, optional) – Input reward keys. All `in_keys` should be part of the environment reward_spec. If no `in_keys` are specified, this transform assumes "reward" to be the input key. However, multiple rewards (e.g. "reward1" and "reward2") can also be specified.
out_keys (list of NestedKeys, optional) – The output sum keys, one per input key.
reset_keys (list of NestedKeys, optional) – the list of reset_keys to be used if the parent environment cannot be found. If provided, this value will prevail over the environment reset_keys.
- Keyword Arguments:
reward_spec (bool, optional) – if True, the new reward entry will be registered in the reward specs. Defaults to False (registered in observation_specs).
Examples
>>> import torch
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> from torchrl.envs.libs.gym import GymEnv
>>> env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum())
>>> env.set_seed(0)
>>> torch.manual_seed(0)
>>> td = env.reset()
>>> print(td["episode_reward"])
tensor([0.])
>>> td = env.rollout(3)
>>> print(td["next", "episode_reward"])
tensor([[1.],
        [2.],
        [3.]])
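The in_keys and out_keys can also be set explicitly, e.g. to rename the cumulative entry. Below is a minimal sketch (not part of the original example), assuming the environment exposes a single "reward" entry; the output key name "cumulative_reward" is chosen here purely for illustration:
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> from torchrl.envs.libs.gym import GymEnv
>>> env = TransformedEnv(
...     GymEnv("CartPole-v1"),
...     RewardSum(in_keys=["reward"], out_keys=["cumulative_reward"]),
... )
>>> td = env.rollout(3)
>>> td["next", "cumulative_reward"]  # running sum per step, e.g. tensor([[1.], [2.], [3.]]) for CartPole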
- forward(tensordict: TensorDictBase) → TensorDictBase [source]
Reads the input tensordict, and for the selected keys, applies the transform.
By default, this method:
- calls _apply_transform() directly.
- does not call _step() or _call().
This method is not called within env.step at any point. However, it is called within sample().
Note
forward also works with regular keyword arguments, using dispatch to cast the argument names to the keys.
Examples
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t)  # works within envs
>>> t(TensorDict(a=0))  # works offline too
- transform_input_spec(input_spec: TensorSpec) → TensorSpec [source]
Transforms the input spec such that the resulting spec matches the transform mapping.
- Parameters:
input_spec (TensorSpec) – spec before the transform
- Returns:
expected spec after the transform
- transform_observation_spec(observation_spec: TensorSpec) → TensorSpec [source]
Transforms the observation spec, adding the new keys generated by RewardSum.
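For instance, with the default reward_spec=False, the new entry is registered among the observation specs. A minimal check, sketched under the assumption that the default "episode_reward" output key is used:
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> from torchrl.envs.libs.gym import GymEnv
>>> env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum())
>>> "episode_reward" in env.observation_spec.keys()
True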
- transform_reward_spec(reward_spec: TensorSpec) → TensorSpec [source]
Transforms the reward spec such that the resulting spec matches the transform mapping.
- Parameters:
reward_spec (TensorSpec) – spec before the transform
- Returns:
expected spec after the transform
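Conversely, when the transform is constructed with reward_spec=True, the cumulative entry is expected to be registered in the reward specs rather than the observation specs. A minimal sketch, assuming the transformed environment exposes its reward entries through full_reward_spec:
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> from torchrl.envs.libs.gym import GymEnv
>>> env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum(reward_spec=True))
>>> "episode_reward" in env.full_reward_spec.keys()
True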