Parameter Servers
This module provides a prototype implementation of a fault-tolerant parameter server built on the reconfigurable ProcessGroups.
- class torchft.parameter_server.ParameterServer(port: int, store_port: int = 0)
Bases: ABC
This implements a threaded parameter server using the torchft reconfigurable ProcessGroups.
- address() → str
Returns the HTTP address to create a new session on this server.
Format: http://host:port/new_session
- Returns:
an HTTP address
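As a rough illustration of the address format above, the following sketch checks that a string has the http://host:port/new_session shape using only the standard library. The helper name is_new_session_address and the example host and port are hypothetical and are not part of torchft.

```python
from urllib.parse import urlparse


def is_new_session_address(address: str) -> bool:
    """Hypothetical helper (not part of torchft): check the documented address shape."""
    parsed = urlparse(address)
    try:
        port = parsed.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return (
        parsed.scheme == "http"
        and bool(parsed.hostname)
        and port is not None
        and parsed.path == "/new_session"
    )


# Example values are made up for illustration.
ok = is_new_session_address("http://localhost:29500/new_session")
missing_path = is_new_session_address("http://localhost:29500/")
```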
- abstract forward(session_id: str, pg: ProcessGroup) → None
This method is called once per session, in a dedicated thread. To support multiple operations on a single session, put a loop in your forward implementation.
If an error occurs, the process group will be freed and the client will have to create a new session.
The server rank is 0 and the client rank is 1.
Must be implemented by subclasses.
- Parameters:
session_id – a unique uuid for this session
pg – the ProcessGroup that’s configured for the client.
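The once-per-session contract described above can be sketched without torchft. In this minimal sketch, StubProcessGroup, SessionHandler, DoublingHandler, and run_session are hypothetical stand-ins, not torchft classes, and the recv and send methods are invented for illustration; a real ProcessGroup exchanges tensors through collective operations instead. The sketch only shows why forward needs its own loop to serve more than one operation per session.

```python
import queue
import threading
from abc import ABC, abstractmethod


class StubProcessGroup:
    """Hypothetical stand-in for a configured ProcessGroup (not the torchft type)."""

    def __init__(self, requests):
        self._requests = queue.Queue()
        for request in requests:
            self._requests.put(request)
        self.replies = []

    def recv(self):
        # Raises queue.Empty once the simulated client has nothing left to send.
        return self._requests.get_nowait()

    def send(self, value):
        self.replies.append(value)


class SessionHandler(ABC):
    """Mirrors only the shape of the documented forward(session_id, pg) hook."""

    @abstractmethod
    def forward(self, session_id: str, pg: StubProcessGroup) -> None: ...


class DoublingHandler(SessionHandler):
    def forward(self, session_id: str, pg: StubProcessGroup) -> None:
        # forward runs once per session, so loop to handle many operations.
        while True:
            try:
                request = pg.recv()
            except queue.Empty:
                return  # no more work: the session ends
            pg.send(request * 2)


def run_session(handler: SessionHandler, session_id: str, pg: StubProcessGroup):
    # One dedicated thread per session, as described for forward above.
    thread = threading.Thread(target=handler.forward, args=(session_id, pg))
    thread.start()
    thread.join()
    return pg.replies


replies = run_session(DoublingHandler(), "session-0", StubProcessGroup([1, 2, 3]))
```

In this sketch, returning from forward ends the session's thread, so a handler without the loop would answer only the first request.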
- abstract classmethod new_process_group() → ProcessGroup
Creates a new unconfigured ProcessGroup, which the ParameterServer configures when setting up server and client connections.
Must be implemented by subclasses.
- Returns:
a new ProcessGroup
- classmethod new_session(address: str) → ProcessGroup
Creates a new session on the parameter server and returns a ProcessGroup configured for that server.
The client is rank 1 and the server is rank 0.