CLI

Here you can find the itwinai CLI reference.

Usage:

$ itwinai [OPTIONS] COMMAND [ARGS]...

Options:

--install-completion: Install completion for the current shell.
--show-completion: Show completion for the current shell, to copy it or customize the installation.
--help: Show this message and exit.

Commands:

generate-flamegraph: Generates a flamegraph from the given…
generate-py-spy-report: Generates a short aggregation of the raw…
generate-scalability-report: Generates scalability reports for epoch…
sanity-check: Run sanity checks on the installation of…
generate-slurm: Generates a default SLURM script using…
exec-pipeline: Execute a pipeline from configuration file…
mlflow-ui: Visualize logs with MLflow.
mlflow-server: Spawn MLflow server.
kill-mlflow-server: Kill MLflow server.
download-mlflow-data: Download metrics data from MLFlow…

`generate-flamegraph`

Generates a flamegraph from the given profiling output.

Usage:

$ generate-flamegraph [OPTIONS]

Options:

--file TEXT: The location of the raw profiling data. [required]
--output-filename TEXT: The filename of the resulting flamegraph. [default: flamegraph.svg]
--help: Show this message and exit.

`generate-py-spy-report`

Generates a short aggregation of the raw py-spy profiling data, showing which leaf functions collected the most samples.

Usage:

$ generate-py-spy-report [OPTIONS]

Options:

--file TEXT: The location of the raw profiling data. [required]
--num-rows TEXT: Number of rows to display. Pass ‘all’ to print the full table. [default: 10]
--aggregate-leaf-paths / --no-aggregate-leaf-paths: Whether to aggregate all unique leaf calls across different call stacks. [default: no-aggregate-leaf-paths]
--library-name TEXT: Which library name to find the lowest contact point of. [default: itwinai]
--help: Show this message and exit.
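As a minimal sketch, the options above could be assembled into a command line from Python as follows. The `itwinai` executable name, the helper `py_spy_report_cmd`, and the file name `profiling_data.txt` are assumptions made for illustration; only the flags themselves come from the reference above.

```python
import shlex


def py_spy_report_cmd(profile_path, num_rows=10, aggregate_leaf_paths=False,
                      library_name="itwinai"):
    """Build the argv for generate-py-spy-report (hypothetical helper)."""
    argv = [
        "itwinai", "generate-py-spy-report",  # assumes the CLI is installed as `itwinai`
        "--file", profile_path,
        # --num-rows is typed TEXT, so integers and the string "all" are both sent as text.
        "--num-rows", str(num_rows),
        "--library-name", library_name,
    ]
    # The boolean option is a flag pair: exactly one of the two spellings is passed.
    argv.append("--aggregate-leaf-paths" if aggregate_leaf_paths
                else "--no-aggregate-leaf-paths")
    return argv


# Print the full table, aggregating identical leaf calls across call stacks.
print(shlex.join(py_spy_report_cmd("profiling_data.txt", num_rows="all",
                                   aggregate_leaf_paths=True)))
```

The helper only builds and prints the command; passing the returned list to `subprocess.run` would execute it on a machine where itwinai is installed.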

`generate-scalability-report`

Generates scalability reports for epoch time, GPU data, and communication data based on log files in the specified directory. Optionally, backups of the reports can be created.

This command processes log files stored in specific subdirectories under the given log_dir. It generates plots and metrics for scalability analysis and saves them in the plot_dir. If backups are enabled, the generated reports will also be copied to a backup directory under backup_root_dir.

Usage:

$ generate-scalability-report [OPTIONS]

Options:

--log-dir TEXT: Which directory to search for the scalability metrics in. [default: scalability-metrics]
--plot-dir TEXT: Which directory to save the resulting plots in. [default: plots]
--do-backup / --no-do-backup: Whether to store a backup of the scalability metrics that were used to make the report or not. [default: no-do-backup]
--run-ids TEXT: Which run ids to read, presented as comma-separated values, e.g. ‘run0,run1’.
--backup-root-dir TEXT: Which directory to store the backup files in. [default: backup-scalability-metrics/]
--plot-file-suffix TEXT: Which file suffix to use for the plots. Useful for switching between raster and vector images. [default: .png]
--help: Show this message and exit.
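The comma-separated format expected by --run-ids can be sketched as below. This is an illustrative helper, not part of itwinai: the `itwinai` executable name, the function `scalability_report_cmd`, and the `.svg` suffix (assumed here to be an accepted vector format) are all assumptions.

```python
import shlex


def scalability_report_cmd(run_ids=None, log_dir="scalability-metrics",
                           plot_dir="plots", plot_file_suffix=".png",
                           do_backup=False):
    """Build the argv for generate-scalability-report (hypothetical helper)."""
    argv = [
        "itwinai", "generate-scalability-report",  # assumed executable name
        "--log-dir", log_dir,
        "--plot-dir", plot_dir,
        "--plot-file-suffix", plot_file_suffix,
    ]
    if run_ids:
        # --run-ids takes one comma-separated string, e.g. "run0,run1".
        argv += ["--run-ids", ",".join(run_ids)]
    if do_backup:
        argv.append("--do-backup")
    return argv


# Two runs, vector plots (assumed suffix), with a backup of the metrics used.
print(shlex.join(scalability_report_cmd(["run0", "run1"],
                                        plot_file_suffix=".svg",
                                        do_backup=True)))
```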

`sanity-check`

Run sanity checks on the installation of itwinai and its dependencies by trying to import itwinai modules. By default, only itwinai core modules (neither torch, nor tensorflow) are tested.

Usage:

$ sanity-check [OPTIONS]

Options:

--torch / --no-torch: Also check itwinai.torch modules. [default: no-torch]
--tensorflow / --no-tensorflow: Also check itwinai.tensorflow modules. [default: no-tensorflow]
--all / --no-all: Check all modules. [default: no-all]
--optional-deps TEXT: List of optional dependencies.
--help: Show this message and exit.

`generate-slurm`

Generates a default SLURM script using arguments and optionally a configuration file.

Usage:

$ generate-slurm [OPTIONS]

Options:

--job-name TEXT: The name of the SLURM job.
--account TEXT: The billing account for the SLURM job. [default: intertwin]
--time TEXT: The time limit of the SLURM job. [default: 00:30:00]
--partition TEXT: Which partition of the cluster the SLURM job is going to run on. [default: develbooster]
--std-out TEXT: The standard out file.
--err-out TEXT: The error out file.
--num-nodes INTEGER: The number of nodes that the SLURM job is going to run on. [default: 1]
--num-tasks-per-node INTEGER: The number of tasks per node. [default: 1]
--gpus-per-node INTEGER: The requested number of GPUs per node. [default: 4]
--cpus-per-gpu INTEGER: The requested number of CPUs per GPU. [default: 4]
--config-path TEXT: The path to the directory containing the config file to use for training. [default: .]
--config-name TEXT: The name of the config file to use for training. [default: config]
--pipe-key TEXT: Which pipe key to use for running the pipeline. [default: rnn_training_pipeline]
--mode TEXT: Which mode to run, e.g. scaling test, all strategies, or a single run. [default: single]
--dist-strat TEXT: Which distributed strategy to use. [default: ddp]
--pre-exec-cmd TEXT: The pre-execution command to use for the python script.
--training-cmd TEXT: The training command to use for the python script.
--python-venv TEXT: Which python venv to use for running the command. [default: .venv]
--scalability-nodes TEXT: A comma-separated list of node numbers to use for the scalability test. [default: 1,2,4,8]
--debug: Whether to include debugging information or not.
--no-save-script: Do not save the script after processing it.
--no-submit-job: Do not submit the job when processing the script.
--config TEXT: The path to the SLURM configuration file.
--py-spy: Whether to activate profiling with py-spy or not.
--profiling-rate INTEGER: The rate at which to profile with the py-spy profiler. [default: 10]
--help: Show this message and exit.
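A short sketch of how a resource request might be assembled and sanity-checked before generating a script. The helpers `generate_slurm_cmd` and `requested_totals`, the job name `my-job`, and the `itwinai` executable name are assumptions; the totals are plain arithmetic on the requested values and do not describe how itwinai or SLURM allocate resources. The sketch also assumes that --no-submit-job generates the script without submitting it.

```python
import shlex


def generate_slurm_cmd(job_name, num_nodes=1, gpus_per_node=4, cpus_per_gpu=4,
                       time_limit="00:30:00", submit=False):
    """Build the argv for generate-slurm (hypothetical helper)."""
    argv = [
        "itwinai", "generate-slurm",  # assumes the CLI is installed as `itwinai`
        "--job-name", job_name,
        "--time", time_limit,
        "--num-nodes", str(num_nodes),
        "--gpus-per-node", str(gpus_per_node),
        "--cpus-per-gpu", str(cpus_per_gpu),
    ]
    if not submit:
        # Assumption: this flag skips submission, leaving only the generated script.
        argv.append("--no-submit-job")
    return argv


def requested_totals(num_nodes, gpus_per_node, cpus_per_gpu):
    """Multiply out the per-node and per-GPU requests to get overall totals."""
    total_gpus = num_nodes * gpus_per_node
    return {"gpus": total_gpus, "cpus": total_gpus * cpus_per_gpu}


print(shlex.join(generate_slurm_cmd("my-job", num_nodes=2)))
print(requested_totals(num_nodes=2, gpus_per_node=4, cpus_per_gpu=4))
```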

`exec-pipeline`

Execute a pipeline from a configuration file using the Hydra CLI. Fields can be overridden dynamically by appending a list of overrides (e.g., batch_size=32). By default, the command expects a configuration file called “config.yaml” in the current working directory; to change this, set --config-name and --config-path.

By default, this command executes the whole pipeline under the “training_pipeline” field of the configuration file. To execute a different pipeline, pass “+pipe_key=your_pipeline” in the list of overrides; to execute only a subset of the steps, pass “+pipe_steps=[0,1]”.

Usage:

$ exec-pipeline [OPTIONS] [OVERRIDES]...

Arguments:

[OVERRIDES]...: Any key=value arguments to override config values (use dots for.nested=overrides), using the Hydra syntax.

Options:

--hydra-help / --no-hydra-help: Show Hydra’s help page [default: no-hydra-help]
--version / --no-version: Show Hydra’s version and exit [default: no-version]
-c, --cfg TEXT: Show config instead of running
--resolve / --no-resolve: Used in conjunction with --cfg, resolve config interpolations before printing. [default: no-resolve]
-p, --package TEXT: Config package to show
-r, --run TEXT: Run a job
-m, --multirun TEXT: Run multiple jobs with the configured launcher and sweeper
-sc, --shell-completion TEXT: Install or Uninstall shell completion
-cp, --config-path TEXT: Overrides the config_path specified in hydra.main(). The config_path is absolute, or relative to the current working directory. Defaults to the current working directory.
-cn, --config-name TEXT: Overrides the config_name specified in hydra.main() [default: config]
-cd, --config-dir TEXT: Adds an additional config dir to the config search path
--experimental-rerun TEXT: Rerun a job from a previous config pickle
-i, --info TEXT: Print Hydra information
--help: Show this message and exit.
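The override syntax described for this command can be illustrated with a small sketch that turns Python values into override strings. The helpers `hydra_overrides` and `exec_pipeline_cmd` are hypothetical, and the `itwinai` executable name is an assumption; the override forms (`batch_size=32`, `+pipe_key=...`, `+pipe_steps=[0,1]`) are the ones quoted in the description above.

```python
import shlex


def hydra_overrides(values, pipe_key=None, pipe_steps=None):
    """Turn a mapping into Hydra-style override strings (hypothetical helper)."""
    overrides = [f"{key}={value}" for key, value in values.items()]
    if pipe_key is not None:
        # The leading "+" follows the form shown in the command description.
        overrides.append(f"+pipe_key={pipe_key}")
    if pipe_steps is not None:
        # List syntax without spaces, matching the +pipe_steps=[0,1] example.
        steps = ",".join(str(step) for step in pipe_steps)
        overrides.append(f"+pipe_steps=[{steps}]")
    return overrides


def exec_pipeline_cmd(overrides, config_name="config", config_path=None):
    """Build the argv for exec-pipeline (hypothetical helper)."""
    argv = ["itwinai", "exec-pipeline", "--config-name", config_name]
    if config_path is not None:
        argv += ["--config-path", config_path]
    return argv + list(overrides)


overrides = hydra_overrides({"batch_size": 32}, pipe_key="your_pipeline",
                            pipe_steps=[0, 1])
# shlex.join quotes the bracketed override so a shell would not expand it.
print(shlex.join(exec_pipeline_cmd(overrides)))
```

Quoting matters when the command is typed by hand as well: square brackets are glob characters in most shells, so an override such as `+pipe_steps=[0,1]` is safest inside single quotes.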

`mlflow-ui`

Visualize logs with MLflow.

Usage:

$ mlflow-ui [OPTIONS]

Options:

--path TEXT: Path to logs storage. [default: mllogs/mlflow]
--port INTEGER: Port on which the MLflow UI is listening. [default: 5000]
--host TEXT: Which host to use. Switch to ‘0.0.0.0’ to allow, for example, port forwarding. [default: 127.0.0.1]
--help: Show this message and exit.

`mlflow-server`

Spawn MLflow server.

Usage:

$ mlflow-server [OPTIONS]

Options:

--path TEXT: Path to logs storage. [default: mllogs/mlflow]
--port INTEGER: Port on which the server is listening. [default: 5000]
--help: Show this message and exit.

`kill-mlflow-server`

Kill MLflow server.

Usage:

$ kill-mlflow-server [OPTIONS]

Options:

--port INTEGER: Port on which the server is listening. [default: 5000]
--help: Show this message and exit.
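Since both mlflow-server and kill-mlflow-server take a --port option, keeping the port in one place helps ensure the kill command targets the server that was started. A minimal sketch, assuming the CLI is installed as `itwinai`; the helper names and the port value 5001 are hypothetical.

```python
import shlex


def mlflow_server_cmd(path="mllogs/mlflow", port=5000):
    """Build the argv for mlflow-server (hypothetical helper)."""
    return ["itwinai", "mlflow-server", "--path", path, "--port", str(port)]


def kill_mlflow_server_cmd(port=5000):
    """Build the argv for kill-mlflow-server (hypothetical helper)."""
    return ["itwinai", "kill-mlflow-server", "--port", str(port)]


port = 5001  # hypothetical non-default port, shared by both commands
print(shlex.join(mlflow_server_cmd(port=port)))
print(shlex.join(kill_mlflow_server_cmd(port=port)))
```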

`download-mlflow-data`

Download metrics data from MLflow experiments and save it to a CSV file.

Requires MLflow authentication if the server is configured to use it. Credentials must be provided via the environment variables ‘MLFLOW_TRACKING_USERNAME’ and ‘MLFLOW_TRACKING_PASSWORD’.

Usage:

$ download-mlflow-data [OPTIONS]

Options:

--tracking-uri TEXT: The tracking URI of the MLflow server. [default: https://mlflow.intertwin.fedcloud.eu/]
--experiment-id TEXT: The experiment ID that you wish to retrieve data from. [default: 48]
--output-file TEXT: The file path to save the data to. [default: mlflow_data.csv]
--help: Show this message and exit.
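The authentication requirement can be sketched as follows: the two environment variables are placed in the environment passed to the command. The helpers `mlflow_auth_env` and `download_mlflow_data_cmd` are hypothetical, the `itwinai` executable name is an assumption, and the credentials shown are placeholders.

```python
import os
import shlex


def mlflow_auth_env(username, password, base_env=None):
    """Return a copy of the environment with the MLflow credentials set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MLFLOW_TRACKING_USERNAME"] = username
    env["MLFLOW_TRACKING_PASSWORD"] = password
    return env


def download_mlflow_data_cmd(experiment_id="48", output_file="mlflow_data.csv",
                             tracking_uri="https://mlflow.intertwin.fedcloud.eu/"):
    """Build the argv for download-mlflow-data (hypothetical helper)."""
    return [
        "itwinai", "download-mlflow-data",  # assumes the CLI is installed as `itwinai`
        "--tracking-uri", tracking_uri,
        "--experiment-id", str(experiment_id),
        "--output-file", output_file,
    ]


cmd = download_mlflow_data_cmd()
env = mlflow_auth_env("my-user", "my-password")  # placeholder credentials
print(shlex.join(cmd))
# To actually run it (needs itwinai installed and valid credentials):
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

Copying the environment rather than mutating `os.environ` keeps the credentials scoped to the one subprocess call.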
