VmasWrapper
- torchrl.envs.VmasWrapper(*args, **kwargs)[source]
Vmas environment wrapper.
GitHub: https://github.com/proroklab/VectorizedMultiAgentSimulator
Paper: https://arxiv.org/abs/2207.03530
- Parameters:
env (vmas.simulator.environment.environment.Environment) – the vmas environment to wrap.
- Keyword Arguments:
num_envs (int) – Number of vectorized simulation environments. VMAS performs vectorized simulations using PyTorch. This argument indicates the number of vectorized environments that should be simulated in a batch. It will also determine the batch size of the environment.
device (torch.device, optional) – Device for simulation. Defaults to the default device. All the tensors created by VMAS will be placed on this device.
continuous_actions (bool, optional) – Whether to use continuous actions. Defaults to True. If False, actions will be discrete. The number of actions and their size will depend on the chosen scenario. See the VMAS repository for more info.
max_steps (int, optional) – Horizon of the task. Defaults to None (infinite horizon). Each VMAS scenario can be terminating or not. If max_steps is specified, the scenario is also terminated (and the "terminated" flag is set) whenever this horizon is reached. Unlike gym’s TimeLimit transform or torchrl’s StepCounter, this argument will not set the "truncated" entry in the tensordict.
categorical_actions (bool, optional) – if the environment actions are discrete, whether to transform them to categorical or one-hot. Defaults to True.
group_map (MarlGroupMapType or Dict[str, List[str]], optional) – how to group agents in tensordicts for input/output. By default, if the agent names follow the "<name>_<int>" convention, they will be grouped by "<name>". If they do not follow this convention, they will all be put in one group named "agents". Otherwise, a group map can be specified or selected from some premade options. See MarlGroupMapType for more info; a short construction sketch follows this list.
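The sketch below shows one way these keyword arguments could be combined; it is illustrative only, not part of the reference doctest. The scenario name, num_envs, and device are placeholders, and the group map choice is an assumption.
>>> import vmas
>>> from torchrl.envs import VmasWrapper
>>> from torchrl.envs.utils import MarlGroupMapType
>>> env = VmasWrapper(
...     vmas.make_env(
...         scenario="flocking",       # placeholder scenario
...         num_envs=8,                # batch of vectorized simulations
...         continuous_actions=False,  # request discrete actions
...         max_steps=None,            # infinite horizon (see the warning below)
...         device="cpu",
...     ),
...     categorical_actions=True,                     # categorical rather than one-hot
...     group_map=MarlGroupMapType.ALL_IN_ONE_GROUP,  # put all agents in one "agents" group
... )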
- Variables:
group_map (Dict[str, List[str]]) – how to group agents in tensordicts for input/output. See MarlGroupMapType for more info.
agent_names (list of str) – names of the agents in the environment
agent_names_to_indices_map (Dict[str, int]) – dictionary mapping agent names to their index in the environment
unbatched_action_spec (TensorSpec) – version of the spec without the vectorized dimension
unbatched_observation_spec (TensorSpec) – version of the spec without the vectorized dimension
unbatched_reward_spec (TensorSpec) – version of the spec without the vectorized dimension
het_specs (bool) – whether the environment has any lazy spec
het_specs_map (Dict[str, bool]) – dictionary mapping each group to a flag representing whether the group has lazy specs
available_envs (List[str]) – the list of the scenarios available to build.
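As a quick illustration (assuming the env built in the sketch above), these attributes can be inspected directly after construction:
>>> print(env.group_map)              # e.g. {"agents": ["agent_0", ...]}
>>> print(env.agent_names)            # agent names reported by the scenario
>>> print(env.unbatched_action_spec)  # action spec without the num_envs dimension
>>> print(env.het_specs)              # True if any group carries lazy (heterogeneous) specs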
Warning
VMAS returns a single done flag which does not distinguish between reaching max_steps and termination. If you deem the truncation signal necessary, set max_steps to None and use a StepCounter transform.
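A minimal sketch of the pattern described in the warning, using torchrl's TransformedEnv and StepCounter; the scenario name, num_envs, and the horizon of 200 are placeholders.
>>> import vmas
>>> from torchrl.envs import VmasWrapper, TransformedEnv
>>> from torchrl.envs.transforms import StepCounter
>>> env = TransformedEnv(
...     VmasWrapper(vmas.make_env(scenario="flocking", num_envs=8, max_steps=None)),
...     StepCounter(max_steps=200),  # writes the "truncated" entry when the horizon is hit
... )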
Examples
>>> env = VmasWrapper(
...     vmas.make_env(
...         scenario="flocking",
...         num_envs=32,
...         continuous_actions=True,
...         max_steps=200,
...         device="cpu",
...         seed=None,
...         # Scenario kwargs
...         n_agents=5,
...     )
... )
>>> print(env.rollout(10))
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([32, 10, 5, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                info: TensorDict(
                    fields={
                        agent_collision_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                        agent_distance_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([32, 10, 5]),
                    device=cpu,
                    is_shared=False),
                observation: Tensor(shape=torch.Size([32, 10, 5, 18]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([32, 10, 5]),
            device=cpu,
            is_shared=False),
        done: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                agents: TensorDict(
                    fields={
                        info: TensorDict(
                            fields={
                                agent_collision_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                                agent_distance_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                            batch_size=torch.Size([32, 10, 5]),
                            device=cpu,
                            is_shared=False),
                        observation: Tensor(shape=torch.Size([32, 10, 5, 18]), device=cpu, dtype=torch.float32, is_shared=False),
                        reward: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([32, 10, 5]),
                    device=cpu,
                    is_shared=False),
                done: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                terminated: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([32, 10]),
            device=cpu,
            is_shared=False),
        terminated: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([32, 10]),
    device=cpu,
    is_shared=False)