
reward2go

torchrl.objectives.value.functional.reward2go(reward, done, gamma, *, time_dim: int = -2)[source]

Compute the discounted cumulative sum of rewards given multiple trajectories and the episode ends.

Parameters:
  • reward (torch.Tensor) – A tensor containing the rewards received at each time step over multiple trajectories.

  • done (torch.Tensor) – A boolean flag marking the end of an episode. Differs from truncated, where the episode was interrupted but did not end.

  • gamma (float) – The discount factor to use for computing the discounted cumulative sum of rewards.

  • time_dim (int) – dimension where the time is unrolled. Defaults to -2.

Returns:

A tensor of shape [B, T] containing the discounted cumulative sum of rewards (reward-to-go) at each time step.

Return type:

torch.Tensor

Examples

>>> reward = torch.ones(1, 10)
>>> done = torch.zeros(1, 10, dtype=torch.bool)
>>> done[:, [3, 7]] = True
>>> reward2go(reward, done, 0.99, time_dim=-1)
tensor([[3.9404],
        [2.9701],
        [1.9900],
        [1.0000],
        [3.9404],
        [2.9701],
        [1.9900],
        [1.0000],
        [1.9900],
        [1.0000]])
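
For intuition, reward-to-go can be written as the backward recursion R_t = r_t + gamma * R_{t+1}, with the running sum reset to zero at every done flag. The loop below is a minimal reference sketch, not TorchRL's actual (vectorized) implementation; reward2go_reference is a hypothetical name, and it assumes time is laid out along the last dimension, returning values in the same layout as the input rather than the reshaped output shown above.

>>> import torch
>>> def reward2go_reference(reward, done, gamma):
...     # Backward pass over the last (time) dimension: accumulate
...     # R_t = r_t + gamma * R_{t+1}, zeroing the running sum
...     # wherever done_t is True so episodes do not leak into
...     # one another.
...     r2g = torch.zeros_like(reward)
...     running = torch.zeros_like(reward[..., 0])
...     for t in range(reward.shape[-1] - 1, -1, -1):
...         not_done = (~done[..., t]).to(reward.dtype)
...         running = reward[..., t] + gamma * running * not_done
...         r2g[..., t] = running
...     return r2g
...
>>> reward2go_reference(reward, done, 0.99)
tensor([[3.9404, 2.9701, 1.9900, 1.0000, 3.9404, 2.9701, 1.9900, 1.0000,
         1.9900, 1.0000]])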
