
MeltingpotEnv

torchrl.envs.MeltingpotEnv(*args, **kwargs)

Meltingpot environment wrapper.

GitHub: https://github.com/google-deepmind/meltingpot

Paper: https://arxiv.org/abs/2211.13746

Melting Pot assesses generalization to novel social situations involving both familiar and unfamiliar individuals, and is designed to test a broad range of social interactions, such as cooperation, competition, deception, reciprocation, trust, and stubbornness. Melting Pot offers researchers a set of over 50 multi-agent reinforcement learning substrates (multi-agent games) on which to train agents, and over 256 unique test scenarios on which to evaluate these trained agents.

Parameters:

substrate (str or ml_collections.config_dict.ConfigDict) – the Melting Pot substrate to build. Can be a string from available_envs or a ConfigDict describing the substrate.

Keyword Arguments:
  • max_steps (int, optional) – Horizon of the task. Defaults to None (infinite horizon). Each Melting Pot substrate can be terminating or not. If max_steps is specified, the scenario is also terminated (and the "terminated" flag is set) whenever this horizon is reached. Unlike gym’s TimeLimit wrapper or torchrl’s StepCounter transform, this argument will not set the "truncated" entry in the tensordict.

  • categorical_actions (bool, optional) – if the environment actions are discrete, whether to transform them to categorical or one-hot. Defaults to True.

  • group_map (MarlGroupMapType or Dict[str, List[str]], optional) – how to group agents in tensordicts for input/output. By default, they will be all put in one group named "agents". Otherwise, a group map can be specified or selected from some premade options. See MarlGroupMapType for more info.
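
A minimal construction sketch using these keyword arguments (the substrate name and values below are illustrative, assuming the meltingpot package is installed):

>>> from torchrl.envs.libs.meltingpot import MeltingpotEnv
>>> from torchrl.envs.utils import MarlGroupMapType
>>> env = MeltingpotEnv(
...     "commons_harvest__open",  # a substrate name from MeltingpotEnv.available_envs
...     max_steps=100,  # sets "terminated" (not "truncated") once 100 steps are reached
...     categorical_actions=True,  # integer action indices rather than one-hot vectors
...     group_map=MarlGroupMapType.ONE_GROUP_PER_AGENT,  # one tensordict group per agent
... )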

Variables:
  • group_map (Dict[str, List[str]]) – how to group agents in tensordicts for input/output. See MarlGroupMapType for more info.

  • agent_names (list of str) – names of the agents in the environment

  • agent_names_to_indices_map (Dict[str, int]) – dictionary mapping agent names to their index in the environment

  • available_envs (List[str]) – the list of substrates available to build.
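
These attributes can be inspected directly; a short sketch (the agent names and group contents depend on the substrate):

>>> from torchrl.envs.libs.meltingpot import MeltingpotEnv
>>> env = MeltingpotEnv("commons_harvest__open")
>>> env.group_map  # by default, a single group named "agents" holding all agents
>>> env.agent_names  # per-substrate agent names
>>> env.agent_names_to_indices_map  # maps each agent name to its index
>>> MeltingpotEnv.available_envs  # substrates that can be built by name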

Warning

Melting Pot returns a single done flag, which does not distinguish between reaching max_steps (truncation) and genuine termination. If you need the truncation signal, set max_steps to None and use a StepCounter transform instead, as sketched below.
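
A sketch of that pattern: leave max_steps=None on the environment and let StepCounter write the "truncated" entry (the horizon below is illustrative):

>>> from torchrl.envs import TransformedEnv
>>> from torchrl.envs.libs.meltingpot import MeltingpotEnv
>>> from torchrl.envs.transforms import StepCounter
>>> env = MeltingpotEnv("commons_harvest__open", max_steps=None)
>>> env = TransformedEnv(env, StepCounter(max_steps=100))  # sets "truncated" at step 100
>>> rollout = env.rollout(max_steps=200)  # stops early once the counter truncates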

Examples

>>> from torchrl.envs.libs.meltingpot import MeltingpotEnv
>>> env_torchrl = MeltingpotEnv("commons_harvest__open")
>>> print(env_torchrl.rollout(max_steps=5))
TensorDict(
    fields={
        RGB: Tensor(shape=torch.Size([5, 144, 192, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
        agents: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([5, 7]), device=cpu, dtype=torch.int64, is_shared=False),
                observation: TensorDict(
                    fields={
                        COLLECTIVE_REWARD: Tensor(shape=torch.Size([5, 7, 1]), device=cpu, dtype=torch.float64, is_shared=False),
                        READY_TO_SHOOT: Tensor(shape=torch.Size([5, 7, 1]), device=cpu, dtype=torch.float64, is_shared=False),
                        RGB: Tensor(shape=torch.Size([5, 7, 88, 88, 3]), device=cpu, dtype=torch.uint8, is_shared=False)},
                    batch_size=torch.Size([5, 7]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([5, 7]),
            device=cpu,
            is_shared=False),
        done: Tensor(shape=torch.Size([5, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                RGB: Tensor(shape=torch.Size([5, 144, 192, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
                agents: TensorDict(
                    fields={
                        observation: TensorDict(
                            fields={
                                COLLECTIVE_REWARD: Tensor(shape=torch.Size([5, 7, 1]), device=cpu, dtype=torch.float64, is_shared=False),
                                READY_TO_SHOOT: Tensor(shape=torch.Size([5, 7, 1]), device=cpu, dtype=torch.float64, is_shared=False),
                                RGB: Tensor(shape=torch.Size([5, 7, 88, 88, 3]), device=cpu, dtype=torch.uint8, is_shared=False)},
                            batch_size=torch.Size([5, 7]),
                            device=cpu,
                            is_shared=False),
                        reward: Tensor(shape=torch.Size([5, 7, 1]), device=cpu, dtype=torch.float64, is_shared=False)},
                    batch_size=torch.Size([5, 7]),
                    device=cpu,
                    is_shared=False),
                done: Tensor(shape=torch.Size([5, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                terminated: Tensor(shape=torch.Size([5, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([5]),
            device=cpu,
            is_shared=False),
        terminated: Tensor(shape=torch.Size([5, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([5]),
    device=cpu,
    is_shared=False)
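
Continuing the example, the grouped per-agent entries can be read back with nested keys; the shapes below match the printed rollout:

>>> rollout = env_torchrl.rollout(max_steps=5)
>>> rollout["agents", "observation", "RGB"].shape  # per-agent observations
torch.Size([5, 7, 88, 88, 3])
>>> rollout["next", "agents", "reward"].shape  # per-agent rewards
torch.Size([5, 7, 1])
>>> rollout["next", "done"].shape  # done is shared across the group
torch.Size([5, 1])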
