itwinai.slurm
configuration
- class itwinai.slurm.configuration.SlurmScriptConfiguration(*, job_name: str | None = None, account: str, partition: str, time: str = '00:30:00', std_out: Path | None = None, err_out: Path | None = None, num_nodes: int = 1, num_tasks: int | None = None, num_tasks_per_node: int = 1, gpus_per_node: int = 4, cpus_per_task: int = 16, memory: str = '16G', exclusive: bool = False, pre_exec_command: str | None = None, exec_command: str | None = None, save_script: bool = False, submit_job: bool = False, save_dir: Path | None = PosixPath('slurm-scripts'), pre_exec_file: str | None = None, exec_file: str | None = None)
Bases: BaseModel
Configuration object for the SLURM script. It contains all the settings for the SLURM script, such as which hardware you are requesting and how long the job may run. Because it accepts an arbitrary pre_exec_command and exec_command, it should work for any SLURM script.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict. Because extra is set to 'forbid', unknown fields raise a validation error.
- job_name: str | None
Optional job name for the SLURM job. Defaults to None (auto-generated later).
- account: str
Billing account to charge the job to. Required.
- partition: str
Partition/queue the job should run on. Required.
- time: str
Wall-clock time limit for the job (HH:MM:SS). Defaults to 00:30:00.
- std_out: Path | None
Path to standard output file. Defaults to None (filled later).
- err_out: Path | None
Path to standard error file. Defaults to None (filled later).
- num_nodes: int
Number of nodes requested. Defaults to 1.
- num_tasks: int | None
Total number of tasks across all nodes. Defaults to None (computed dynamically).
- num_tasks_per_node: int
Number of tasks per node. Defaults to 1.
- gpus_per_node: int
GPUs per node requested. Defaults to 4.
- cpus_per_task: int
CPUs per task requested. Defaults to 16.
- memory: str
Memory per node requested. Defaults to “16G”.
- exclusive: bool
Whether to request exclusive node access. Defaults to False.
- pre_exec_command: str | None
Pre-execution command content (shell). Defaults to None (set by builder). Typically used to set up the environment before the main command runs, e.g. “ml Python” or “source .venv/bin/activate”. Except for advanced use cases, users should not set this field; the SLURM script builder generates it from the configuration.
- exec_command: str | None
Main execution command content (shell), typically an “srun” command. Defaults to None (set by builder). Except for advanced use cases, users should not set this field; the SLURM script builder generates it from the configuration.
- save_script: bool
Whether to save the generated SLURM script. Defaults to False.
- submit_job: bool
Whether to submit the generated SLURM script. Defaults to False.
- save_dir: Path | None
Directory where the script should be saved. Defaults to “slurm-scripts”.
- pre_exec_file: str | None
Path/URL to a pre-exec file to load content from. Ignored if not provided. Defaults to None.
- exec_file: str | None
Path/URL to an exec file to load content from. Ignored if not provided. Defaults to None.
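To make the field list concrete, the sketch below shows one way the resource-related settings could map onto standard #SBATCH directives. SlurmSettings and render_sbatch_header are hypothetical stand-ins written for this illustration using only the standard library. They are not part of itwinai, and the header that itwinai actually generates may differ.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SlurmSettings:
    """Hypothetical stand-in mirroring a subset of SlurmScriptConfiguration."""

    account: str
    partition: str
    job_name: Optional[str] = None
    time: str = "00:30:00"
    std_out: Optional[Path] = None
    err_out: Optional[Path] = None
    num_nodes: int = 1
    num_tasks_per_node: int = 1
    gpus_per_node: int = 4
    cpus_per_task: int = 16
    memory: str = "16G"
    exclusive: bool = False


def render_sbatch_header(cfg: SlurmSettings) -> str:
    """Render the settings as #SBATCH lines using standard sbatch options."""
    options = [
        ("job-name", cfg.job_name),
        ("account", cfg.account),
        ("partition", cfg.partition),
        ("time", cfg.time),
        ("output", cfg.std_out),
        ("error", cfg.err_out),
        ("nodes", cfg.num_nodes),
        ("ntasks-per-node", cfg.num_tasks_per_node),
        ("gpus-per-node", cfg.gpus_per_node),
        ("cpus-per-task", cfg.cpus_per_task),
        ("mem", cfg.memory),
    ]
    lines = ["#!/bin/bash"]
    # Unset optional fields (None) are skipped rather than written out.
    lines += [f"#SBATCH --{name}={value}" for name, value in options if value is not None]
    if cfg.exclusive:
        lines.append("#SBATCH --exclusive")
    return "\n".join(lines)


# "my-account" and "gpu" are placeholder values for the two required fields.
header = render_sbatch_header(SlurmSettings(account="my-account", partition="gpu"))
print(header)
```

Only account and partition have to be supplied here, matching the two required fields above; every other directive falls back to the documented default.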
- class itwinai.slurm.configuration.MLSlurmBuilderConfig(*, job_name: str | None = None, account: str, partition: str, time: str = '00:30:00', std_out: Path | None = None, err_out: Path | None = None, num_nodes: int = 1, num_tasks: int | None = None, num_tasks_per_node: int = 1, gpus_per_node: int = 4, cpus_per_task: int = 16, memory: str = '16G', exclusive: bool = False, pre_exec_command: str | None = None, exec_command: str | None = None, save_script: bool = False, submit_job: bool = False, save_dir: Path | None = PosixPath('slurm-scripts'), pre_exec_file: str | None = None, exec_file: str | None = None, use_ray: bool = False, container_path: Path | None = None, distributed_strategy: Literal['ddp', 'horovod', 'deepspeed'], mode: Literal['single', 'runall', 'scaling-test'] = 'single', training_cmd: str | None = '{itwinai_launcher} exec-pipeline --config-name={config_name} --config-path={config_path} --strategy={distributed_strategy} --run-name={run_name} +pipe_key={pipe_key} ', python_venv: str | None = None, config_name: str = 'config', config_path: str = '.', pipe_key: str = 'training_pipeline', scalability_nodes: List[int] = <factory>, py_spy: bool = False, profiling_sampling_rate: int = 10, run_name: str = 'main-run')
Bases: SlurmScriptConfiguration
Extends the base SLURM configuration with ML builder-specific options.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- use_ray: bool
Whether to launch jobs via Ray. Defaults to False.
- container_path: Path | None
Optional container path to export. Defaults to None.
- distributed_strategy: Literal['ddp', 'horovod', 'deepspeed']
Distributed strategy to use for training. Required.
- mode: Literal['single', 'runall', 'scaling-test']
Execution mode: a single job, all strategies, or a scaling test (with all strategies). Defaults to “single”.
- training_cmd: str | None
Optional custom training command template. Can reference any field in this config. Defaults to {itwinai_launcher} exec-pipeline --config-name={config_name} --config-path={config_path} --strategy={distributed_strategy} --run-name={run_name} +pipe_key={pipe_key}.
- python_venv: str | None
Python virtual environment to activate. Defaults to None.
- config_name: str
Hydra config name to pass to exec-pipeline. Defaults to “config”.
- config_path: str
Hydra config path to pass to exec-pipeline. Defaults to “.”.
- pipe_key: str
Pipeline key to execute. Defaults to “training_pipeline”.
- scalability_nodes: List[int]
Node counts to use for scaling tests. Defaults to [1, 2, 4, 8].
- py_spy: bool
Enable py-spy profiling. Defaults to False.
- profiling_sampling_rate: int
Sampling rate for py-spy profiling. Defaults to 10.
- run_name: str
Run name for tracking. Defaults to “main-run”.
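The training_cmd template uses brace placeholders named after config fields. The sketch below fills the default template with the documented default values using plain str.format. How the builder performs the substitution internally is not documented here, so str.format is an assumption, as is resolving {itwinai_launcher} to the itwinai executable.

```python
# Default template copied from the MLSlurmBuilderConfig signature.
TRAINING_CMD = (
    "{itwinai_launcher} exec-pipeline --config-name={config_name} "
    "--config-path={config_path} --strategy={distributed_strategy} "
    "--run-name={run_name} +pipe_key={pipe_key} "
)

# Documented defaults, plus one of the allowed strategies ('ddp').
fields = {
    "config_name": "config",
    "config_path": ".",
    "distributed_strategy": "ddp",
    "run_name": "main-run",
    "pipe_key": "training_pipeline",
}

# Assumption: the launcher placeholder resolves to the `itwinai` executable.
command = TRAINING_CMD.format(itwinai_launcher="itwinai", **fields).strip()
print(command)
# itwinai exec-pipeline --config-name=config --config-path=. --strategy=ddp --run-name=main-run +pipe_key=training_pipeline
```

A custom template can reuse the same placeholders; with str.format, a placeholder that has no matching value raises KeyError, so typos surface early.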
script_builder
- class itwinai.slurm.script_builder.SlurmScriptBuilder(config: SlurmScriptConfiguration)
Bases: object
Base builder for SLURM scripts that handles defaults, execution preparation, and dispatch.
- Parameters:
config (SlurmScriptConfiguration) – Configuration object.
Note
The provided SlurmScriptConfiguration may be modified while preparing the script.
- config: SlurmScriptConfiguration
- static submit_script(script: str) → None
Submits the given script with “sbatch” using a temporary file.
- static save_script(script: str, file_path: Path) → None
Saves the given script to the given file path.
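The two static methods describe a common pattern: write the script text to a file, then either keep it or hand its path to sbatch. The sketch below illustrates that pattern with the standard library only. submit_with_tempfile and save_to are hypothetical helpers, not the itwinai implementation, and the submit command is a parameter so that the example can run on a machine without SLURM.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def submit_with_tempfile(script: str, submit_cmd: Sequence[str] = ("sbatch",)) -> str:
    """Write `script` to a temporary file, pass its path to `submit_cmd`, return stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as tmp:
        tmp.write(script)
        tmp_path = Path(tmp.name)
    try:
        result = subprocess.run(
            [*submit_cmd, str(tmp_path)], capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        tmp_path.unlink()  # remove the temporary file whether or not submission worked


def save_to(script: str, file_path: Path) -> None:
    """Write `script` to `file_path`, creating parent directories if needed."""
    file_path.parent.mkdir(parents=True, exist_ok=True)
    file_path.write_text(script)


# Dry run: a Python one-liner stands in for sbatch and echoes the file content back.
echo_cmd = (sys.executable, "-c", "import sys; print(open(sys.argv[1]).read(), end='')")
output = submit_with_tempfile("#!/bin/bash\necho hello\n", submit_cmd=echo_cmd)
print(output)
```

On a cluster, calling submit_with_tempfile(script) with the default submit_cmd would invoke sbatch on the temporary file.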
- class itwinai.slurm.script_builder.MLSlurmBuilder(config: MLSlurmBuilderConfig)
Bases: SlurmScriptBuilder
Builds a SLURM script tailored to distributed machine learning.
Uses the provided MLSlurmBuilderConfig to build the script and inserts values as needed.
- Parameters:
config (MLSlurmBuilderConfig) – Validated configuration controlling script generation.
Note
The given configuration object might be modified by some of the methods.
- config: MLSlurmBuilderConfig
- get_exec_command() → str
Generates an execution command for the SLURM script. Takes into account whether Ray is enabled and selects the corresponding bash function.
- get_pre_exec_command() → str
Generates a pre-execution command for the SLURM script. Adds a command to source the Python venv, if given, and a command to export a container path variable, if given.
- process_script() → None
Generates the SLURM script, then prints, saves, and/or submits it based on the config flags.
Always renders the script (filling defaults, loading exec/pre-exec files).
Prints to stdout when neither submit_job nor save_script is set.
Saves to save_dir when save_script is True.
Submits via sbatch when submit_job is True (ensures log dirs exist).
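The flag handling described for process_script can be summarised as a small decision function. planned_actions is a hypothetical helper that only encodes the rules listed above; the relative order of saving and submitting is an assumption.

```python
from typing import List


def planned_actions(save_script: bool, submit_job: bool) -> List[str]:
    """Return the actions implied by the two flags, following the documented rules."""
    actions = ["render"]  # the script is always rendered first
    if not save_script and not submit_job:
        actions.append("print")  # nothing else requested: show the script on stdout
    if save_script:
        actions.append("save")
    if submit_job:
        actions.append("submit")
    return actions


for save in (False, True):
    for submit in (False, True):
        print(f"save_script={save}, submit_job={submit}: {planned_actions(save, submit)}")
```

With both flags left at their default of False, the script is only rendered and printed, which makes for a convenient dry run before submitting.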
- itwinai.slurm.script_builder.generate_default_slurm_script(config: MLSlurmBuilderConfig) → None
Generates and optionally submits a default SLURM script from a validated config.
- itwinai.slurm.script_builder.process_builder(slurm_script_builder: MLSlurmBuilder)
utils
- itwinai.slurm.utils.retrieve_remote_file(url: str) → str
Fetches a remote file from the given URL.
- Parameters:
url – URL to the raw configuration file (YAML/JSON format), e.g. a raw GitHub link.
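As an illustration of what fetching a raw text file from a URL involves, the sketch below uses urllib.request from the standard library. fetch_text is a hypothetical stand-in, not the itwinai implementation, which may use a different HTTP client, encoding, or error handling. The demonstration reads a local file through a file:// URL so that it runs without network access.

```python
import tempfile
import urllib.request
from pathlib import Path


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download `url` and return the body decoded as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Offline demonstration: a local file exposed through a file:// URL.
with tempfile.TemporaryDirectory() as tmp_dir:
    local_file = Path(tmp_dir) / "config.yaml"
    local_file.write_text("account: my-account\n", encoding="utf-8")
    content = fetch_text(local_file.as_uri())

print(content)
```

For a raw GitHub link the same function would be called with the https:// URL in place of the file:// one.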
- itwinai.slurm.utils.remove_indentation_from_multiline_string(multiline_string: str) → str
Removes all indentation from the start of each line in a multi-line string.
If you want to remove only the indentation shared by all lines, thus preserving indentation for nested structures, use textwrap.dedent from the standard library instead.
The main purpose of this function is allowing you to define multi-line strings that only appear indented in the code, thus increasing readability.
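The difference from textwrap.dedent is easiest to see side by side. strip_all_indentation below is a minimal stand-in for the documented behaviour, written for this comparison; details such as how the real function treats leading or trailing blank lines are not specified here and may differ.

```python
import textwrap


def strip_all_indentation(text: str) -> str:
    """Drop leading whitespace from every line (stand-in for the documented helper)."""
    return "\n".join(line.lstrip() for line in text.splitlines())


snippet = """
    for i in 1 2 3; do
        echo $i
    done
"""

# Every line is flushed left, so the nesting inside the loop is lost.
print(strip_all_indentation(snippet))

# Only the shared four-space prefix is removed, so the loop body stays indented.
print(textwrap.dedent(snippet))
```

For shell content the flattened form is usually harmless, but for indentation-sensitive content such as YAML or Python, textwrap.dedent is the safer choice.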