VmasEnv
- torchrl.envs.VmasEnv(*args, **kwargs)
VMAS environment wrapper.
GitHub: https://github.com/proroklab/VectorizedMultiAgentSimulator
Paper: https://arxiv.org/abs/2207.03530
- Parameters:
scenario (str or vmas.simulator.scenario.BaseScenario) – the VMAS scenario to build. Must be one of `available_envs`. For a description and rendering of the available scenarios, see the README.
- Keyword Arguments:
num_envs (int) – number of vectorized simulation environments. VMAS performs vectorized simulations using PyTorch. This argument indicates the number of vectorized environments that should be simulated in a batch; it also determines the batch size of the environment (see the sketch after this list).
device (torch.device, optional) – device for simulation. Defaults to the default device. All the tensors created by VMAS will be placed on this device.
continuous_actions (bool, optional) – whether to use continuous actions. Defaults to `True`. If `False`, actions will be discrete. The number of actions and their size will depend on the chosen scenario. See the VMAS repository for more info.
max_steps (int, optional) – horizon of the task. Defaults to `None` (infinite horizon). Each VMAS scenario can be terminating or not. If `max_steps` is specified, the scenario is also terminated (and the `"terminated"` flag is set) whenever this horizon is reached. Unlike gym’s `TimeLimit` transform or torchrl’s `StepCounter`, this argument will not set the `"truncated"` entry in the tensordict.
categorical_actions (bool, optional) – if the environment actions are discrete, whether to transform them to categorical or one-hot. Defaults to `True`.
group_map (MarlGroupMapType or Dict[str, List[str]], optional) – how to group agents in tensordicts for input/output. By default, if the agent names follow the `"<name>_<int>"` convention, they will be grouped by `"<name>"`; if they do not follow this convention, they will all be put in one group named `"agents"`. Otherwise, a group map can be specified or selected from some premade options. See `MarlGroupMapType` for more info.
**kwargs (Dict, optional) – additional arguments that can be passed to the VMAS scenario constructor (e.g., number of agents, reward sparsity). The available arguments will vary based on the chosen scenario. To see the available arguments for a specific scenario, see the constructor in its file in the scenario folder.
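As a hedged sketch of how `num_envs` and the scenario kwargs interact (the "balance" scenario and `n_agents=4` are illustrative choices, not defaults):

>>> from torchrl.envs import VmasEnv
>>> env = VmasEnv(
...     scenario="balance",      # any entry of available_envs works here
...     num_envs=8,              # eight simulations stepped as one batch
...     continuous_actions=True,
...     device="cpu",
...     n_agents=4,              # scenario kwarg forwarded to the VMAS constructor
... )
>>> env.batch_size               # num_envs sets the vectorized batch size
torch.Size([8])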
- Variables:
group_map (Dict[str, List[str]]) – how to group agents in tensordicts for input/output (illustrated in the sketch after this list). See `MarlGroupMapType` for more info.
agent_names (list of str) – names of the agents in the environment.
agent_names_to_indices_map (Dict[str, int]) – dictionary mapping agent names to their index in the environment.
unbatched_action_spec (TensorSpec) – version of the spec without the vectorized dimension.
unbatched_observation_spec (TensorSpec) – version of the spec without the vectorized dimension.
unbatched_reward_spec (TensorSpec) – version of the spec without the vectorized dimension.
het_specs (bool) – whether the environment has any lazy spec.
het_specs_map (Dict[str, bool]) – dictionary mapping each group to a flag representing whether the group has lazy specs.
available_envs (List[str]) – the list of the scenarios available to build.
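To make these attributes concrete, a minimal sketch using the same "flocking" scenario as the example below; the comments describe what each attribute holds rather than exact values, which depend on the scenario:

>>> env = VmasEnv(scenario="flocking", num_envs=8, n_agents=5)
>>> env.group_map              # e.g. {"agents": [...]}, matching the "agents" key in the rollout below
>>> env.agent_names            # the five agent names, in index order
>>> env.unbatched_action_spec  # action spec without the leading [8] vectorized dim
>>> env.het_specs              # whether any group stacks heterogeneous (lazy) specs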
Warning

VMAS returns a single `done` flag which does not distinguish between the env reaching `max_steps` and termination. If you deem the `truncation` signal necessary, set `max_steps` to `None` and use a `StepCounter` transform.
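Following this warning, a minimal sketch of the suggested pattern, assuming `TransformedEnv` and `StepCounter` from torchrl and an illustrative 200-step horizon; `StepCounter` writes the `"truncated"` entry that VMAS itself does not provide:

>>> from torchrl.envs import TransformedEnv, VmasEnv
>>> from torchrl.envs.transforms import StepCounter
>>> # Keep max_steps=None so VMAS never folds truncation into "done" ...
>>> base_env = VmasEnv(scenario="flocking", num_envs=32, max_steps=None, n_agents=5)
>>> # ... and let StepCounter set "truncated" once 200 steps have elapsed.
>>> env = TransformedEnv(base_env, StepCounter(max_steps=200))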
Examples

>>> env = VmasEnv(
...     scenario="flocking",
...     num_envs=32,
...     continuous_actions=True,
...     max_steps=200,
...     device="cpu",
...     seed=None,
...     # Scenario kwargs
...     n_agents=5,
... )
>>> print(env.rollout(10))
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([32, 10, 5, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                info: TensorDict(
                    fields={
                        agent_collision_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                        agent_distance_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([32, 10, 5]),
                    device=cpu,
                    is_shared=False),
                observation: Tensor(shape=torch.Size([32, 10, 5, 18]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([32, 10, 5]),
            device=cpu,
            is_shared=False),
        done: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                agents: TensorDict(
                    fields={
                        info: TensorDict(
                            fields={
                                agent_collision_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                                agent_distance_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                            batch_size=torch.Size([32, 10, 5]),
                            device=cpu,
                            is_shared=False),
                        observation: Tensor(shape=torch.Size([32, 10, 5, 18]), device=cpu, dtype=torch.float32, is_shared=False),
                        reward: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([32, 10, 5]),
                    device=cpu,
                    is_shared=False),
                done: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                terminated: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([32, 10]),
            device=cpu,
            is_shared=False),
        terminated: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([32, 10]),
    device=cpu,
    is_shared=False)