EGreedyModule

class torchrl.modules.EGreedyModule(*args, **kwargs)

Epsilon-Greedy exploration module.

This module randomly replaces the action(s) in a tensordict following an epsilon-greedy exploration strategy. At each call, one random draw per action is compared against the current epsilon value. When the draw falls below epsilon, the corresponding action is replaced by a random sample drawn from the provided action spec. All other actions are left unchanged.
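
The replacement rule described above can be sketched in plain Python. This is only an illustration of the idea, not the torchrl implementation: the helper `egreedy_replace` and the `sample_random` callback are hypothetical names, and plain lists stand in for tensors and for the action spec.

```python
import random

def egreedy_replace(actions, eps, sample_random, rng):
    """Return a copy of `actions` where each entry is replaced by a
    random sample with probability `eps` (one draw per action)."""
    out = []
    for action in actions:
        if rng.random() < eps:
            # Draw succeeded: explore with a random action.
            out.append(sample_random(rng))
        else:
            # Draw failed: keep the policy's action unchanged.
            out.append(action)
    return out

rng = random.Random(0)
# Ten "greedy" actions of dimension 4, as in a batch of size 10.
greedy = [[0.0, 0.0, 0.0, 0.0] for _ in range(10)]
# Hypothetical stand-in for sampling from a spec bounded in [-1, 1].
sample = lambda r: [r.uniform(-1.0, 1.0) for _ in range(4)]

never = egreedy_replace(greedy, 0.0, sample, rng)   # eps = 0: nothing replaced
always = egreedy_replace(greedy, 1.0, sample, rng)  # eps = 1: everything replaced
mixed = egreedy_replace(greedy, 0.2, sample, rng)   # roughly 20% replaced on average
```

With `eps=0.2`, only a few rows are expected to differ from the greedy output on any given call.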

Parameters:

spec (TensorSpec) – the spec used for sampling actions.
eps_init (scalar, optional) – initial epsilon value. Defaults to 1.0.
eps_end (scalar, optional) – final epsilon value. Defaults to 0.1.
annealing_num_steps (int, optional) – number of steps it will take for epsilon to reach the eps_end value. Defaults to 1000.

Keyword Arguments:

action_key (NestedKey, optional) – the key where the action can be found in the input tensordict. Default is "action".
action_mask_key (NestedKey, optional) – the key where the action mask can be found in the input tensordict. Default is None (corresponding to no mask).
device (torch.device, optional) – the device of the exploration module.
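
To illustrate what an action mask changes, here is a minimal sketch in plain Python. It assumes a discrete action space and a boolean mask where True marks an allowed action. Both are assumptions made for the example, and `sample_masked_action` is a hypothetical helper rather than part of torchrl.

```python
import random

def sample_masked_action(mask, rng):
    """Pick a random action index among the entries of `mask` that are True."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("action mask forbids every action")
    return rng.choice(allowed)

rng = random.Random(0)
mask = [True, False, True, False]  # only actions 0 and 2 are allowed
draws = [sample_masked_action(mask, rng) for _ in range(100)]
```

Without a mask (the default), random actions would be drawn from the whole spec instead.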

Note

It is crucial to call step() in the training loop to update the exploration factor. Because this omission is hard to detect, no warning or exception is raised if the call is missing!

Examples

>>> import torch
>>> from tensordict import TensorDict
>>> from tensordict.nn import TensorDictSequential
>>> from torchrl.modules import EGreedyModule, Actor
>>> from torchrl.data import Bounded
>>> torch.manual_seed(0)
>>> spec = Bounded(-1, 1, torch.Size([4]))
>>> module = torch.nn.Linear(4, 4, bias=False)
>>> policy = Actor(spec=spec, module=module)
>>> explorative_policy = TensorDictSequential(policy, EGreedyModule(spec=spec, eps_init=0.2))
>>> td = TensorDict({"observation": torch.zeros(10, 4)}, batch_size=[10])
>>> print(explorative_policy(td).get("action"))
tensor([[ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.9055, -0.9277, -0.6295, -0.2532],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000]], grad_fn=<AddBackward0>)

forward(tensordict: TensorDictBase) → TensorDictBase

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() silently ignores them.

step(frames: int = 1) → None

A step of epsilon decay.

After self.annealing_num_steps calls to this method, further calls are no-ops.

Parameters:

frames (int, optional) – number of frames since the last step. Defaults to 1.
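
The decay can be pictured with a small stand-alone sketch. It assumes a linear schedule in which each frame lowers epsilon by (eps_init - eps_end) / annealing_num_steps until eps_end is reached. The class `LinearEpsilon` is a hypothetical stand-in written for this example and does not reproduce the torchrl source.

```python
class LinearEpsilon:
    """Toy epsilon schedule (assumed linear), clamped at eps_end."""

    def __init__(self, eps_init=1.0, eps_end=0.1, annealing_num_steps=1000):
        self.eps_init = eps_init
        self.eps_end = eps_end
        self.annealing_num_steps = annealing_num_steps
        self.eps = eps_init

    def step(self, frames=1):
        # Fixed decrement per frame; epsilon never drops below eps_end.
        decrement = (self.eps_init - self.eps_end) / self.annealing_num_steps
        for _ in range(frames):
            self.eps = max(self.eps_end, self.eps - decrement)

schedule = LinearEpsilon(eps_init=1.0, eps_end=0.1, annealing_num_steps=1000)
for _ in range(10):  # toy training loop
    # ... collect a batch of 50 frames and update the policy here ...
    schedule.step(frames=50)
halfway = schedule.eps  # about 0.55 after 500 of the 1000 annealing steps

schedule.step(frames=10_000)  # well past the annealing horizon
final = schedule.eps          # clamped at eps_end
```

If the loop never calls `step()`, `eps` simply stays at `eps_init`, so exploration never decays.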
