SLURMEnvironment
- class lightning.pytorch.plugins.environments.SLURMEnvironment(auto_requeue=True, requeue_signal=None)
- Bases: ClusterEnvironment
- Cluster environment for training on a cluster managed by SLURM.
- You can configure the main_address and main_port properties via the environment variables MASTER_ADDR and MASTER_PORT, respectively.
- Parameters:
- auto_requeue (bool) – Whether automatic job resubmission is enabled. How and under which conditions a job gets rescheduled is determined by the owner of this plugin.
- requeue_signal (Optional[Signals]) – The signal that SLURM will send to indicate that the job should be requeued. Defaults to SIGUSR1 on Unix.
 
 - static detect()
- Returns True if the current process was launched on a SLURM cluster.
- It is possible to use the SLURM scheduler to request resources and then launch processes manually using a different environment. To do so, set the job name in SLURM to ‘bash’ or ‘interactive’ (srun --job-name=interactive). SLURMEnvironment is then not detected, and another environment can be detected automatically.
- Return type: bool
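The detection rule described above can be sketched in plain Python. This is a minimal illustration, not Lightning's implementation: the helper `looks_like_slurm_launch` is hypothetical, and treating the `SLURM_NTASKS` and `SLURM_JOB_NAME` environment variables as the detection signals is an assumption.

```python
import os
from typing import Mapping, Optional

# Job names that, per the description above, opt out of SLURM detection.
INTERACTIVE_JOB_NAMES = ("bash", "interactive")


def looks_like_slurm_launch(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: guess whether this process was launched by SLURM.

    Assumption: SLURM_NTASKS marks a SLURM-launched process and
    SLURM_JOB_NAME carries the job name.
    """
    env = os.environ if env is None else env
    if "SLURM_NTASKS" not in env:
        return False
    # A job named 'bash' or 'interactive' is treated as a manual launch.
    return env.get("SLURM_JOB_NAME") not in INTERACTIVE_JOB_NAMES
```

Passing the environment as a mapping keeps the sketch easy to try without a real SLURM allocation, e.g. `looks_like_slurm_launch({"SLURM_NTASKS": "4", "SLURM_JOB_NAME": "interactive"})`.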
 
 - global_rank()
- The rank (index) of the currently running process across all nodes and devices.
- Return type: int
 
 - local_rank()
- The rank (index) of the currently running process inside the current node.
- Return type: int
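To make the difference between the two ranks concrete, here is a minimal sketch that reads both from SLURM's per-task environment variables. The helper `slurm_ranks` is hypothetical, and the mapping of the global rank to `SLURM_PROCID` and the local rank to `SLURM_LOCALID` is an assumption about how the values are obtained, not something this page states.

```python
import os
from typing import Mapping, Optional, Tuple


def slurm_ranks(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int]:
    """Hypothetical helper: return (global_rank, local_rank) for this task.

    Assumption: SLURM_PROCID is the task index across the whole job and
    SLURM_LOCALID is the task index within the current node.
    """
    env = os.environ if env is None else env
    global_rank = int(env["SLURM_PROCID"])
    local_rank = int(env["SLURM_LOCALID"])
    return global_rank, local_rank
```

For example, with two nodes and four tasks per node, the second task on the second node would see `SLURM_PROCID=5` and `SLURM_LOCALID=1` under these assumptions.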
 
 - static resolve_root_node_address(nodes)
- SLURM supports several formats for node selection. This function selects the first host name from:
- a space-separated list of host names, e.g., ‘host0 host1 host3’ yields ‘host0’ as the root
- a comma-separated list of host names, e.g., ‘host0,host1,host3’ yields ‘host0’ as the root
- the range notation with brackets, e.g., ‘host[5-9]’ yields ‘host5’ as the root
- Return type: str
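The three cases listed above can be reproduced with two regular-expression substitutions. This is a sketch of one way to get the documented results, not necessarily how the library does it; the function name `first_host` is hypothetical.

```python
import re


def first_host(nodes: str) -> str:
    """Hypothetical helper: pick the first host name from a SLURM node string."""
    # Collapse a bracketed range or list to its first element,
    # e.g. "host[5-9]" becomes "host5".
    nodes = re.sub(r"\[(.*?)[,-].*\]", r"\1", nodes)
    # Keep only the text before the first space or comma.
    return re.sub(r"[ ,].*", "", nodes)


for example in ("host0 host1 host3", "host0,host1,host3", "host[5-9]"):
    print(example, "->", first_host(example))
```

The bracket substitution runs first so that a comma inside a bracketed list is not mistaken for a separator between host names.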
 
 - validate_settings(num_devices, num_nodes)
- Validates the settings configured in the script against the environment and raises an exception if there is an inconsistency.
- Return type: None
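A consistency check of this kind can be sketched as follows. Everything here is illustrative: the function `check_settings` is hypothetical, the comparison against `SLURM_NTASKS_PER_NODE` and `SLURM_NNODES` is an assumption about which environment values are relevant, and `ValueError` is chosen only as a plausible exception type.

```python
import os
from typing import Mapping, Optional


def check_settings(
    num_devices: int,
    num_nodes: int,
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Hypothetical helper: compare script settings with the SLURM allocation."""
    env = os.environ if env is None else env

    # Assumption: one task per device, so tasks per node should equal num_devices.
    tasks_per_node = env.get("SLURM_NTASKS_PER_NODE")
    if tasks_per_node is not None and int(tasks_per_node) != num_devices:
        raise ValueError(
            f"Script requests {num_devices} devices per node, "
            f"but SLURM allocated {tasks_per_node} tasks per node."
        )

    # Assumption: SLURM_NNODES holds the number of allocated nodes.
    nnodes = env.get("SLURM_NNODES")
    if nnodes is not None and int(nnodes) != num_nodes:
        raise ValueError(
            f"Script requests {num_nodes} nodes, but SLURM allocated {nnodes}."
        )
```

In this sketch a variable that is not set is skipped rather than treated as a mismatch, so the check only fails when SLURM reports a value that contradicts the script.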