DecisionTransformerInferenceWrapper
- class torchrl.modules.tensordict_module.DecisionTransformerInferenceWrapper(*args, **kwargs)
Inference Action Wrapper for the Decision Transformer.
A wrapper specifically designed for the Decision Transformer, which masks the input tensordict sequences to the inference context. The output is a TensorDict with the same keys as the input, but containing only the last action of the predicted action sequence and the last return-to-go.
This module returns a modified copy of the tensordict, i.e. it does not modify the input tensordict in-place.
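The "last step only, returned as a copy" behaviour can be pictured with plain tensors. This is a minimal sketch, not the torchrl implementation: it assumes a dict of tensors shaped [batch, context, feature], uses random tensors as stand-ins for the policy's predicted sequences, and relies on a hypothetical helper `keep_last_step` that builds a new mapping so the caller's data is left untouched.

```python
import torch


def keep_last_step(data: dict, keys=("action", "return_to_go")) -> dict:
    """Hypothetical helper: return a shallow copy of `data` in which the
    entries named in `keys` keep only their last time step (dim -2)."""
    out = dict(data)  # new dict, so the caller's mapping is not modified
    for key in keys:
        # [..., context, feature] -> [..., feature]; clone to avoid sharing storage
        out[key] = data[key][..., -1, :].clone()
    return out


batch = {
    "observation": torch.randn(1, 20, 4),
    "action": torch.randn(1, 20, 2),  # stand-in for a predicted action sequence
    "return_to_go": torch.randn(1, 20, 1),
}
result = keep_last_step(batch)
print(result["action"].shape)        # torch.Size([1, 2])
print(result["return_to_go"].shape)  # torch.Size([1, 1])
print(batch["action"].shape)         # torch.Size([1, 20, 2]), input unchanged
```

The resulting shapes line up with the action and return_to_go entries printed in the example further down this page.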
Note
If the action, observation or return-to-go key is not standard, the method set_tensor_keys() should be used, e.g.
>>> dt_inference_wrapper.set_tensor_keys(action="foo", observation="bar", return_to_go="baz")
The in_keys are the observation, action and return-to-go keys. The out_keys match the in_keys, with the addition of any other out-key from the policy (e.g., parameters of the distribution or hidden values).
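As a rough illustration of that key bookkeeping, the sketch below computes an ordered union of the wrapper's in_keys and the policy's out-keys. The key lists mirror the example further down this page; the helper `merge_out_keys` is hypothetical and not part of torchrl, and the exact ordering torchrl uses is not implied.

```python
def merge_out_keys(in_keys, policy_out_keys):
    """Hypothetical helper: in_keys first, then any policy out-key not already present."""
    merged = list(in_keys)
    for key in policy_out_keys:
        if key not in merged:
            merged.append(key)
    return merged


in_keys = ["observation", "action", "return_to_go"]
policy_out_keys = ["param", "action"]  # e.g. distribution parameters plus the sampled action
print(merge_out_keys(in_keys, policy_out_keys))
# ['observation', 'action', 'return_to_go', 'param']
```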
- Parameters:
policy (TensorDictModule) – The policy module that takes in observations and produces an action value
- Keyword Arguments:
inference_context (int) – The number of previous actions that will not be masked in the context. For example, for an observation input of shape [batch_size, context, obs_dim] with context=20 and inference_context=5, the first 15 entries of the context will be masked. Defaults to 5.
spec (Optional[TensorSpec]) – The spec of the input TensorDict. If None, it will be inferred from the policy module.
device (torch.device, optional) – If provided, the device where the buffers / specs will be placed.
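To make the inference_context arithmetic concrete: with context=20 and inference_context=5, the first 20 - 5 = 15 steps are masked. The sketch below zeroes those leading steps with plain torch slicing. Zero-filling and the helper name `mask_leading_steps` are assumptions for illustration; the exact masking the wrapper applies (for instance, how the action sequence is aligned) may differ.

```python
import torch


def mask_leading_steps(x: torch.Tensor, inference_context: int) -> torch.Tensor:
    """Hypothetical helper: return a copy of x ([..., context, feature]) whose
    first context - inference_context time steps are set to zero."""
    masked = x.clone()  # keep the caller's tensor intact
    n_masked = x.shape[-2] - inference_context
    if n_masked > 0:
        masked[..., :n_masked, :] = 0
    return masked


obs = torch.ones(1, 20, 4)  # [batch_size, context, obs_dim]
masked = mask_leading_steps(obs, inference_context=5)
print(int((masked[0, :, 0] == 0).sum()))  # 15 masked steps
print(int((masked[0, :, 0] == 1).sum()))  # 5 visible steps
```

Computing `n_masked` explicitly, rather than slicing with `[..., :-inference_context, :]`, avoids the Python quirk where `:-0` selects nothing, and it leaves the input unchanged when inference_context is at least the context length.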
Examples
>>> import torch
>>> from tensordict import TensorDict
>>> from tensordict.nn import TensorDictModule
>>> from torchrl.modules import (
...     ProbabilisticActor,
...     TanhDelta,
...     DTActor,
...     DecisionTransformerInferenceWrapper,
... )
>>> dtactor = DTActor(state_dim=4, action_dim=2,
...             transformer_config=DTActor.default_config()
... )
>>> actor_module = TensorDictModule(
...         dtactor,
...         in_keys=["observation", "action", "return_to_go"],
...         out_keys=["param"])
>>> dist_class = TanhDelta
>>> dist_kwargs = {
...     "low": -1.0,
...     "high": 1.0,
... }
>>> actor = ProbabilisticActor(
...     in_keys=["param"],
...     out_keys=["action"],
...     module=actor_module,
...     distribution_class=dist_class,
...     distribution_kwargs=dist_kwargs)
>>> inference_actor = DecisionTransformerInferenceWrapper(actor)
>>> sequence_length = 20
>>> td = TensorDict({"observation": torch.randn(1, sequence_length, 4),
...                 "action": torch.randn(1, sequence_length, 2),
...                 "return_to_go": torch.randn(1, sequence_length, 1)}, [1,])
>>> result = inference_actor(td)
>>> print(result)
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([1, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        observation: Tensor(shape=torch.Size([1, 20, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        param: Tensor(shape=torch.Size([1, 20, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        return_to_go: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([1]),
    device=None,
    is_shared=False)
- forward(tensordict: TensorDictBase = None) → TensorDictBase
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
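The difference between calling the instance and calling forward() directly can be observed with a standard torch.nn.Module forward hook. This sketch uses a toy module rather than the wrapper itself; `Doubler` and the `calls` list are purely illustrative.

```python
import torch
from torch import nn


class Doubler(nn.Module):
    """Toy module used only to demonstrate hook behaviour."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 2 * x


calls = []
module = Doubler()
# Forward hooks are called with (module, inputs, output) after forward() runs.
module.register_forward_hook(lambda mod, inputs, output: calls.append("hook"))

module(torch.ones(1))          # __call__ runs forward() and then the hook
module.forward(torch.ones(1))  # bypasses __call__, so the hook does not run
print(calls)  # ['hook']
```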
- mask_context(tensordict: TensorDictBase) → TensorDictBase
Mask the context of the input sequences.
- set_tensor_keys(**kwargs)
Sets the tensordict keys used by the module (the input keys and the output action key).
- Keyword Arguments:
observation (NestedKey, optional) – The observation key.
action (NestedKey, optional) – The action key (input to the network).
return_to_go (NestedKey, optional) – The return_to_go key.
out_action (NestedKey, optional) – The action key (output of the network).
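A minimal sketch of the keyword-based override pattern that set_tensor_keys() exposes, assuming the keys live in a plain dictionary of defaults. `DEFAULT_KEYS` and `resolve_keys` are hypothetical, and the default values (including `out_action` defaulting to "action") are assumptions rather than torchrl's internals; the sketch only shows known names being overridden and unknown names being rejected.

```python
DEFAULT_KEYS = {  # assumed defaults, for illustration only
    "observation": "observation",
    "action": "action",
    "return_to_go": "return_to_go",
    "out_action": "action",
}


def resolve_keys(**kwargs):
    """Hypothetical helper: override default keys, rejecting unknown names."""
    unknown = set(kwargs) - set(DEFAULT_KEYS)
    if unknown:
        raise ValueError(f"Unknown key names: {sorted(unknown)}")
    return {**DEFAULT_KEYS, **kwargs}


keys = resolve_keys(action="foo", observation="bar", return_to_go="baz")
print(keys["action"], keys["observation"], keys["return_to_go"], keys["out_action"])
# foo bar baz action
```

Because values are stored as-is, a nested key such as ("next", "observation") could be passed in the same way.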