CLI
Here you can find the itwinai CLI reference.
itwinai command line interface.
Usage:
$ itwinai [OPTIONS] COMMAND [ARGS]...
Options:
-v, --version: Show version and exit.
--install-completion: Install completion for the current shell.
--show-completion: Show completion for the current shell, to copy it or customize the installation.
--help: Show this message and exit.
Commands:
generate-flamegraph: Generates a flamegraph from the given…
generate-py-spy-report: Generates an aggregation of the raw py-spy…
generate-scalability-report: Generates scalability reports for epoch…
sanity-check: Run sanity checks on the installation of…
check-distributed-cluster: This command provides a suite of tests for…
generate-slurm: Generate a SLURM script from a…
run: Launch ML jobs with dependency…
exec-pipeline: Execute a pipeline from configuration file…
mlflow-ui: Visualize logs with Mlflow.
mlflow-server: Spawn Mlflow server.
kill-mlflow-server: Kill Mlflow server.
download-mlflow-data: Download metrics data from MLFlow…
tensorboard-ui: Visualize logs with TensorBoard.
upload-model-to-hub: Upload a model checkpoint to the AI Model…
generate-flamegraph
Generates a flamegraph from the given profiling output.
Usage:
$ generate-flamegraph [OPTIONS]
Options:
--file TEXT: The location of the raw profiling data. [required]
--output-filename TEXT: The filename of the resulting flamegraph. [default: flamegraph.svg]
--help: Show this message and exit.
generate-py-spy-report
Generates an aggregation of the raw py-spy profiling data, showing which leaf functions collected the most samples.
Usage:
$ generate-py-spy-report [OPTIONS]
Options:
--file TEXT: The location of the raw profiling data. [required]
--num-rows TEXT: Number of rows to display. Pass ‘all’ to print the full table. [default: 10]
--aggregate-leaf-paths / --no-aggregate-leaf-paths: Whether to aggregate all unique leaf calls across different call stacks. [default: no-aggregate-leaf-paths]
--library-name TEXT: Which library name to find the lowest contact point of. [default: itwinai]
--help: Show this message and exit.
generate-scalability-report
Generates scalability reports for epoch time, GPU data, and communication data based on the MLflow logs.
This command processes runs under the given experiment at a tracking uri.
It generates plots and metrics for scalability analysis and saves them in the plot_dir.
Usage:
$ generate-scalability-report [OPTIONS]
Options:
--tracking-uri TEXT: The tracking URI of the MLFlow server. [default: mllogs/mlflow]
--experiment-name TEXT: The name of the mlflow experiment to use for the GPU data report. [default: unnamed-experiment]
--plot-dir TEXT: Which directory to save the resulting plots in. [default: plots]
--run-names TEXT: Which run names to read, presented as comma-separated values e.g. ‘run0,run1’.
--plot-file-suffix TEXT: Which file suffix to use for the plots. Useful for changing between raster and vector based images. [default: .png]
--include-communication / --no-include-communication: Include communication data in the scalability report. Disclaimer: Communication fractions are unreliable and vary significantly for different HPC systems. [default: no-include-communication]
--no-warnings / --no-no-warnings: Create plots without warnings. [default: no-no-warnings]
--help: Show this message and exit.
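The --run-names option takes one comma-separated string. The sketch below shows one way to build that string from a list of names in a POSIX shell. The run names and the experiment name are placeholders, and the itwinai prefix assumes the commands are exposed through the itwinai executable. The command is printed rather than executed so it can be reviewed first.

```shell
# Hypothetical run names; replace them with the run names in your experiment.
set -- run0 run1 run2

# Join the positional parameters with commas, the format --run-names expects.
run_names=$(IFS=,; echo "$*")

# Print the resulting command for review (the experiment name is a placeholder).
cmd="itwinai generate-scalability-report --experiment-name my-experiment --run-names $run_names"
echo "$cmd"
```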
sanity-check
Run sanity checks on the installation of itwinai and its dependencies by trying to import itwinai modules. By default, only itwinai core modules (neither torch, nor tensorflow) are tested.
Usage:
$ sanity-check [OPTIONS]
Options:
--torch / --no-torch: Check also itwinai.torch modules. [default: no-torch]
--tensorflow / --no-tensorflow: Check also itwinai.tensorflow modules. [default: no-tensorflow]
--all / --no-all: Check all modules. [default: no-all]
--optional-deps TEXT: List of optional dependencies.
--help: Show this message and exit.
check-distributed-cluster
This command provides a suite of tests for a quick sanity check of the network setup for torch distributed. Useful when working with containers on HPC. Remember to either prepend torchrun to this command or start a Ray cluster first.
Usage:
$ check-distributed-cluster [OPTIONS]
Options:
--platform TEXT: Hardware platform: nvidia or amd [default: nvidia]
--launcher TEXT: Distributed ML cluster: torchrun or ray [default: torchrun]
--help: Show this message and exit.
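As a rough illustration of prepending torchrun, the sketch below assembles a single-node launch line. It assumes that torchrun from PyTorch is on the PATH, that itwinai is installed as an executable, and that two local worker processes are wanted. None of this comes from the reference itself, so check the flags against your PyTorch version and cluster setup. The command is printed, not executed.

```shell
# Number of worker processes to start on this node (placeholder value).
nproc=2

# --no_python asks torchrun to launch the itwinai executable directly
# instead of treating it as a Python script.
cmd="torchrun --standalone --nproc_per_node=$nproc --no_python itwinai check-distributed-cluster --platform nvidia --launcher torchrun"
echo "$cmd"
```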
generate-slurm
Generate a SLURM script from a configuration file.
Usage:
$ generate-slurm [OPTIONS]
Options:
-c, --config TEXT: Path or URL to a YAML SLURM configuration file. [required]
-j, --submit-job / --no-submit-job: Whether to submit the SLURM job after generating the script.
-s, --save-script / --no-save-script: Whether to save the generated SLURM script to disk.
--help: Show this message and exit.
run
Launch ML jobs with dependency installation and SLURM scheduling.
Usage:
$ run [OPTIONS]
Options:
-c, --config TEXT: Path or URL to a configuration file in yaml format. [required]
-j, --submit-job / --no-submit-job: Whether to submit the SLURM job after generating the script.
-s, --save-script / --no-save-script: Whether to save the generated SLURM script to disk.
--help: Show this message and exit.
exec-pipeline
Execute a pipeline from a configuration file using the Hydra CLI. Fields can be overridden dynamically by appending a list of overrides (e.g., batch_size=32). By default, the command expects a configuration file called “config.yaml” in the current working directory. To change this, set --config-name and --config-path. By default, the command executes the whole pipeline under the “training_pipeline” field in the configuration file. To execute a different pipeline, pass “+pipe_key=your_pipeline” in the list of overrides. To execute only a subset of the steps, pass “+pipe_steps=[0,1]”.
Usage:
$ exec-pipeline [OPTIONS] [OVERRIDES]...
Arguments:
[OVERRIDES]...: Any key=value arguments to override config values (use dots for.nested=overrides), using the Hydra syntax.
Options:
--hydra-help / --no-hydra-help: Show Hydra’s help page [default: no-hydra-help]
--version / --no-version: Show Hydra’s version and exit [default: no-version]
-c, --cfg TEXT: Show config instead of running
--resolve / --no-resolve: Used in conjunction with --cfg, resolve config interpolations before printing. [default: no-resolve]
-p, --package TEXT: Config package to show
-r, --run TEXT: Run a job
-m, --multirun TEXT: Run multiple jobs with the configured launcher and sweeper
-sc, --shell-completion TEXT: Install or Uninstall shell completion
--strategy TEXT: Override the global ‘strategy’ field in the config (creates it if missing).
--run-name TEXT: Override the global ‘run_name’ field in the config (creates it if missing).
-cp, --config-path TEXT: Overrides the config_path specified in hydra.main(). The config_path is absolute, or relative to the current working directory. Defaults to the current working directory.
-cn, --config-name TEXT: Overrides the config_name specified in hydra.main() [default: config]
-cd, --config-dir TEXT: Adds an additional config dir to the config search path
--experimental-rerun TEXT: Rerun a job from a previous config pickle
-i, --info TEXT: Print Hydra information
--help: Show this message and exit.
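A practical detail when passing overrides: square brackets are glob characters in most shells, so an override such as +pipe_steps=[0,1] is safer when quoted. The sketch below assembles such a call. The pipeline key my_pipeline is hypothetical, batch_size is assumed to be a field in your configuration, and the itwinai prefix assumes the commands are exposed through the itwinai executable. The command is printed rather than run.

```shell
# Brackets are glob characters in most shells, so keep this override quoted.
steps='+pipe_steps=[0,1]'

# Run the first two steps of the pipeline stored under the hypothetical
# "my_pipeline" key, overriding batch_size, with config.yaml in the current directory.
cmd="itwinai exec-pipeline --config-name config +pipe_key=my_pipeline '$steps' batch_size=32"
echo "$cmd"
```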
mlflow-ui
Visualize logs with Mlflow.
Usage:
$ mlflow-ui [OPTIONS]
Options:
--path TEXT: Path to logs storage. [default: mllogs/mlflow]
--port INTEGER: Port on which the MLFlow UI is listening. [default: 5000]
--host TEXT: Which host to use. Switch to ‘0.0.0.0’ to e.g. allow for port-forwarding. [default: 127.0.0.1]
--help: Show this message and exit.
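To illustrate the port-forwarding case mentioned for --host, here is a sketch of an SSH tunnel to a UI running on a remote compute node. The node name and login host are hypothetical, and the setup assumes the compute node is reachable from the login host. The tunnel command is printed so it can be adapted before use.

```shell
# Hypothetical names: the compute node running the UI and the SSH login host.
compute_node="node042"
login_host="user@login.example.org"
port=5000

# On the compute node, start the UI so it accepts non-local connections:
#   itwinai mlflow-ui --path mllogs/mlflow --host 0.0.0.0 --port 5000

# On your workstation, forward the local port through the login host.
tunnel="ssh -N -L ${port}:${compute_node}:${port} ${login_host}"
echo "$tunnel"
```

With the tunnel open, the UI should be reachable in a local browser at http://127.0.0.1:5000.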
mlflow-server
Spawn Mlflow server.
Usage:
$ mlflow-server [OPTIONS]
Options:
--path TEXT: Path to logs storage. [default: mllogs/mlflow]
--port INTEGER: Port on which the server is listening. [default: 5000]
--help: Show this message and exit.
kill-mlflow-server
Kill Mlflow server.
Usage:
$ kill-mlflow-server [OPTIONS]
Options:
--port INTEGER: Port on which the server is listening. [default: 5000]
--help: Show this message and exit.
download-mlflow-data
Download metrics data from MLFlow experiments and save to a CSV file.
Requires MLFlow authentication if the server is configured to use it. Authentication must be provided via the following environment variables: ‘MLFLOW_TRACKING_USERNAME’ and ‘MLFLOW_TRACKING_PASSWORD’.
Usage:
$ download-mlflow-data [OPTIONS]
Options:
--tracking-uri TEXT: The tracking URI of the MLFlow server. [default: https://mlflow.intertwin.fedcloud.eu/]
--experiment-id TEXT: The experiment ID that you wish to retrieve data from. [default: 48]
--output-file TEXT: The file path to save the data to. [default: mlflow_data.csv]
--help: Show this message and exit.
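A minimal sketch of supplying the credentials through the two environment variables named above. The username and password are placeholders, and the itwinai prefix assumes the commands are exposed through the itwinai executable. The download command is printed rather than executed.

```shell
# Placeholder credentials; avoid committing real ones to scripts or shell history.
export MLFLOW_TRACKING_USERNAME="my-user"
export MLFLOW_TRACKING_PASSWORD="my-password"

# Print the download command for review; the experiment ID and output file
# shown here are the documented defaults.
cmd="itwinai download-mlflow-data --experiment-id 48 --output-file mlflow_data.csv"
echo "$cmd"
```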
tensorboard-ui
Visualize logs with TensorBoard.
Usage:
$ tensorboard-ui [OPTIONS]
Options:
--path TEXT: Path to logs storage. [default: mllogs/tensorboard]
--port INTEGER: Port on which the Tensorboard UI is listening. [default: 6006]
--host TEXT: Which host to use. Switch to ‘0.0.0.0’ to e.g. allow for port-forwarding. [default: 127.0.0.1]
--help: Show this message and exit.
upload-model-to-hub
Upload a model checkpoint to the AI Model Hub. Note that this command requires an internet connection to push to the model hub.
The model directory should contain:
model checkpoint file(s)
manifest.yaml with model id and metadata
(optional) metadata.json with additional information
Usage:
$ upload-model-to-hub [OPTIONS] MODEL_DIR
Arguments:
MODEL_DIR: Path to directory with model checkpoint, manifest.yaml and metadata. [required]
Options:
--hub-url TEXT: URL of model hub server. If not provided, use HYPHA_SERVER_URL or .env file.
--api-token TEXT: API token. If not provided, use HYPHA_TOKEN or .env file.
--env-file TEXT: Path to .env file containing MODEL_HUB_URL and MODEL_HUB_API_TOKEN.
--upload-script TEXT: Path to upload_model.py script. If not provided, downloads from GitHub.
--help: Show this message and exit.
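Before uploading, it can help to confirm that the model directory has the layout described above. The helper below is a hypothetical pre-flight check, not part of itwinai. It verifies only that manifest.yaml exists and notes when the optional metadata.json is absent. It does not inspect checkpoint files, since their names are not specified here.

```shell
# Hypothetical helper: check that a model directory looks ready for upload.
check_model_dir() {
  dir=$1
  # The argument must be an existing directory.
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  # manifest.yaml is required.
  [ -f "$dir/manifest.yaml" ] || { echo "missing manifest.yaml in $dir" >&2; return 1; }
  # metadata.json is optional, so only print a note when it is absent.
  [ -f "$dir/metadata.json" ] || echo "note: optional metadata.json not found in $dir" >&2
  return 0
}

# Example use (the directory name is a placeholder):
#   check_model_dir ./my_model && itwinai upload-model-to-hub ./my_model
```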