OpenSpielEnv
- torchrl.envs.OpenSpielEnv(*args, **kwargs)
Google DeepMind OpenSpiel environment wrapper built with the game string.
GitHub: https://github.com/google-deepmind/open_spiel
Documentation: https://openspiel.readthedocs.io/en/latest/index.html
- Parameters:
  - game_string (str) – the name of the game to wrap. Must be part of available_envs.
- Keyword Arguments:
  - device (torch.device, optional) – if provided, the device on which the data is to be cast. Defaults to None.
  - batch_size (torch.Size, optional) – the batch size of the environment. Defaults to torch.Size([]).
  - allow_done_after_reset (bool, optional) – if True, it is tolerated for envs to be done just after reset() is called. Defaults to False.
  - group_map (MarlGroupMapType or Dict[str, List[str]], optional) – how to group agents in tensordicts for input/output. See MarlGroupMapType for more info. Defaults to ALL_IN_ONE_GROUP.
  - categorical_actions (bool, optional) – if True, categorical specs will be converted to the TorchRL equivalent (torchrl.data.Categorical), otherwise a one-hot encoding will be used (torchrl.data.OneHot). Defaults to False. Both this option and group_map are illustrated in the sketch after this list.
  - return_state (bool, optional) – if True, "state" is included in the output of reset() and step(). The state can be given to reset() to reset to that state, rather than resetting to the initial state. Defaults to False.
- Variables:
available_envs – environments available to build
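
For illustration, here is a minimal sketch of how categorical_actions and group_map might be passed at construction. It assumes the "chess" game is available in your OpenSpiel install and that MarlGroupMapType is importable from torchrl.envs.utils:

>>> from torchrl.envs import OpenSpielEnv
>>> from torchrl.envs.utils import MarlGroupMapType  # assumed import path
>>> # Default behaviour: one-hot action encoding (torchrl.data.OneHot leaf specs)
>>> env_onehot = OpenSpielEnv("chess")
>>> # categorical_actions=True: integer-indexed actions (torchrl.data.Categorical leaf specs)
>>> env_cat = OpenSpielEnv("chess", categorical_actions=True)
>>> # group_map controls how agents are grouped in the tensordict,
>>> # e.g. one group per agent instead of the default ALL_IN_ONE_GROUP
>>> env_per_agent = OpenSpielEnv("chess", group_map=MarlGroupMapType.ONE_GROUP_PER_AGENT)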
Examples
>>> from torchrl.envs import OpenSpielEnv
>>> from tensordict import TensorDict
>>> env = OpenSpielEnv("chess", return_state=True)
>>> td = env.reset()
>>> td = env.step(env.full_action_spec.rand())
>>> print(td)
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([2, 4672]), device=cpu, dtype=torch.int64, is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        next: TensorDict(
            fields={
                agents: TensorDict(
                    fields={
                        observation: Tensor(shape=torch.Size([2, 20, 8, 8]), device=cpu, dtype=torch.float32, is_shared=False),
                        reward: Tensor(shape=torch.Size([2, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([2]),
                    device=None,
                    is_shared=False),
                current_player: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int32, is_shared=False),
                done: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False),
                state: NonTensorData(data=FEN: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 674
                , batch_size=torch.Size([]), device=None),
                terminated: Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)
>>> print(env.available_envs)
['2048', 'add_noise', 'amazons', 'backgammon', ...]
reset() can restore a specific state, rather than the initial state, as long as return_state=True.

>>> from torchrl.envs import OpenSpielEnv
>>> from tensordict import TensorDict
>>> env = OpenSpielEnv("chess", return_state=True)
>>> td = env.reset()
>>> td = env.step(env.full_action_spec.rand())
>>> td_restore = td["next"]
>>> td = env.step(env.full_action_spec.rand())
>>> # The current state is not equal to `td_restore`
>>> (td["next"] == td_restore).all()
False
>>> td = env.reset(td_restore)
>>> # After resetting, the current state is now equal to `td_restore`
>>> (td == td_restore).all()
True