RewardSum

class torchrl.envs.transforms.RewardSum(in_keys: Sequence[NestedKey] | None = None, out_keys: Sequence[NestedKey] | None = None, reset_keys: Sequence[NestedKey] | None = None, *, reward_spec: bool = False)[source]

Tracks episode cumulative rewards.

This transform accepts a list of tensordict reward keys (i.e. `in_keys`) and tracks their cumulative value along the time dimension for each episode.

When called, the transform writes, for each in_key, a new tensordict entry named `episode_{in_key}` that holds the cumulative value.

Parameters:
  • in_keys (list of NestedKeys, optional) – Input reward keys. All `in_keys` should be part of the environment reward_spec. If no in_keys are specified, this transform assumes "reward" to be the input key. However, multiple rewards (e.g. "reward1" and "reward2") can also be specified.

  • out_keys (list of NestedKeys, optional) – The output sum keys; there should be one per input key.

  • reset_keys (list of NestedKeys, optional) – The list of reset keys to be used if the parent environment cannot be found. If provided, this value takes precedence over the environment's reset_keys.

Keyword Arguments:

reward_spec (bool, optional) – if True, the new episode-reward entry will be registered in the reward specs. Defaults to False (the entry is registered in the observation specs).
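
For illustration, a minimal sketch of constructing the transform with two reward entries, explicit output keys and explicit reset keys (the key names below are assumptions chosen for illustration, not keys of a specific environment):

>>> from torchrl.envs.transforms import RewardSum
>>> t = RewardSum(
...     in_keys=["reward1", "reward2"],  # should be part of the env reward_spec
...     out_keys=["episode_reward1", "episode_reward2"],  # one output per input
...     reset_keys=["_reset", "_reset"],  # assumption: one reset key per in_key
... )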

Examples

>>> import torch
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> from torchrl.envs.libs.gym import GymEnv
>>> env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum())
>>> env.set_seed(0)
>>> torch.manual_seed(0)
>>> td = env.reset()
>>> print(td["episode_reward"])
tensor([0.])
>>> td = env.rollout(3)
>>> print(td["next", "episode_reward"])
tensor([[1.],
        [2.],
        [3.]])
forward(tensordict: TensorDictBase) → TensorDictBase[source]

Reads the input tensordict, and for the selected keys, applies the transform.

By default, this method:

  • calls _apply_transform() directly.

  • does not call _step() or _call().

This method is not called within env.step at any point. However, it is called within sample(), as sketched after the examples below.

Note

forward also works with regular keyword arguments, using dispatch to map the argument names to the keys.

Examples

>>> from tensordict import TensorDict, TensorDictBase
>>> from torchrl.envs.transforms import Transform
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t) # works within envs
>>> t(TensorDict(a=0))  # Works offline too.
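
As a hedged sketch of the sample()-time path mentioned above, a transform of this kind can also be attached to a replay buffer, in which case its forward() is applied to the sampled batches (the storage size below is an arbitrary assumption):

>>> from torchrl.data import LazyTensorStorage, ReplayBuffer
>>> rb = ReplayBuffer(storage=LazyTensorStorage(100), transform=TransformThatMeasuresBytes())
>>> # rb.sample(...) would now run the transform's forward() on each sampled batch
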
transform_input_spec(input_spec: TensorSpec) → TensorSpec[source]

Transforms the input spec such that the resulting spec matches the transform mapping.

Parameters:

input_spec (TensorSpec) – spec before the transform

Returns:

expected spec after the transform

transform_observation_spec(observation_spec: TensorSpec) → TensorSpec[source]

Transforms the observation spec, adding the new keys generated by RewardSum.
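
For instance, a sketch of inspecting the resulting spec with the default reward_spec=False, in which case the cumulative entry is expected among the observations (exact spec contents depend on the wrapped environment):

>>> from torchrl.envs.libs.gym import GymEnv
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum())
>>> "episode_reward" in env.observation_spec.keys()
True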

transform_reward_spec(reward_spec: TensorSpec) → TensorSpec[source]

Transforms the reward spec such that the resulting spec matches the transform mapping.

Parameters:

reward_spec (TensorSpec) – spec before the transform

Returns:

expected spec after the transform
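
Conversely, a hedged sketch of the reward_spec=True case, where the new entry is expected in the reward specs instead (reusing the imports from the sketch above; the use of full_reward_spec to inspect the composite reward spec is an assumption):

>>> env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum(reward_spec=True))
>>> "episode_reward" in env.full_reward_spec.keys()
True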
