User Installation (for Non-Developers)
This guide provides step-by-step instructions for users who want to install the itwinai
library.
Setting up the system dependencies
Before installing itwinai and its Python dependencies, make sure that system dependencies such as CUDA drivers, compilers, and MPI libraries are correctly set up.
Supported OSs are Linux and macOS.
Warning
On high-performance computing (HPC) systems, you must load the appropriate modules before creating or activating your Python virtual environment to ensure compatibility with system libraries.
On HPC systems, it is common to manage dependencies using Environment Modules or Lmod. These tools allow you to dynamically load and unload software modules, such as compilers, CUDA drivers, and MPI libraries. If you are unsure which modules to load for your application, contact your system administrator or refer to your HPC system’s documentation for specific guidance.
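As a minimal sketch of how these tools are usually explored, the snippet below checks whether a `module` command is defined in the current shell and, if so, lists the available and the currently loaded modules. `module avail` and `module list` exist in both Environment Modules and Lmod; the helper name `module_system_available` is hypothetical and only used for illustration.

```shell
# Hypothetical helper: succeeds if a `module` command (Environment Modules or Lmod)
# is defined in the current shell.
module_system_available() {
    command -v module >/dev/null 2>&1
}

if module_system_available; then
    # List the first modules visible on this system
    # (many installations print this listing to stderr, hence 2>&1)
    module avail 2>&1 | head -n 20
    # List the modules that are currently loaded
    module list 2>&1
else
    echo "No 'module' command found: this shell does not provide Environment Modules or Lmod"
fi
```

On a laptop or workstation the second branch is the expected outcome, and the module-loading steps in this section can be skipped.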
Below are the modules you should load on the supercomputers where we tested itwinai, depending
on whether you want PyTorch or TensorFlow support. If you are deploying itwinai on a
different HPC system, please refer to the general guidance for other HPC systems below.
Modules for the JUWELS system at the Juelich Supercomputing Centre (JSC):
PyTorch:
ml --force purge
ml Stages/2025 GCC OpenMPI CUDA/12 cuDNN MPI-settings/CUDA
ml Python CMake HDF5 PnetCDF libaio mpi4py git
# Now you can create or activate the Python environment here
TensorFlow:
ml --force purge
ml Stages/2024 GCC/12.3.0 OpenMPI CUDA/12 MPI-settings/CUDA
ml Python/3.11 HDF5 PnetCDF libaio mpi4py CMake cuDNN/8.9.5.29-CUDA-12
# Now you can create or activate the Python environment here
Modules for the Vega supercomputer:
When installing the environment from a login node, make sure that the CUDA drivers are loaded correctly and that the GPU is visible by running the nvidia-smi command. This is very important for a successful installation of DeepSpeed and Horovod. If the GPU is not visible, consider logging in again on another login node. Alternatively, consider running the installation on a compute node.
ml --force purge
ml CMake/3.29.3-GCCcore-13.3.0
ml OpenMPI/4.1.6-GCC-13.2.0
ml cuDNN/8.9.7.29-CUDA-12.3.0
ml CUDA/12.6.0
ml NCCL/2.22.3-GCCcore-13.3.0-CUDA-12.6.0
ml Python/3.12.3-GCCcore-13.3.0
# Now you can create or activate the Python environment here
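The GPU visibility check described above can be scripted so that an installation does not start on a node without a usable GPU. This is only a sketch: the helper name `gpu_visible` is hypothetical, and it assumes that `nvidia-smi -L` prints one line per device starting with `GPU`.

```shell
# Hypothetical helper: succeeds only if nvidia-smi exists and reports at least one GPU.
gpu_visible() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    # `nvidia-smi -L` prints one line per device, e.g. "GPU 0: <name> (UUID: ...)"
    nvidia-smi -L 2>/dev/null | grep -q '^GPU '
}

if gpu_visible; then
    echo "GPU visible: DeepSpeed and Horovod can be built on this node"
else
    echo "No GPU visible: try another login node or run the installation on a compute node"
fi
```

Run it after loading the modules, since the CUDA module may affect whether the driver tools are found.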
Currently, the latest version of mpi4py on Vega is not compatible with Python 3.12,
so you will have to build it yourself in your Python environment:
# Create the venv
uv venv
uv pip install --no-cache-dir --force-reinstall --no-binary=mpi4py mpi4py
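After building mpi4py from source, it can be useful to confirm which MPI library it was linked against. The sketch below uses `MPI.Get_library_version()` from mpi4py; the helper name `mpi_library_version` is hypothetical, and the check assumes that `python3` resolves to the interpreter of your active environment.

```shell
# Hypothetical helper: print the MPI library that mpi4py was built against,
# or fail quietly if mpi4py cannot be imported.
mpi_library_version() {
    python3 -c "from mpi4py import MPI; print(MPI.Get_library_version())" 2>/dev/null
}

if version="$(mpi_library_version)"; then
    echo "mpi4py is linked against: $version"
else
    echo "mpi4py is not importable in the current Python environment"
fi
```

The reported library should correspond to the OpenMPI module you loaded before building.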
On LUMI, Python virtual environments are discouraged in favour of containers because they create a large number of files, which degrades the performance of the distributed storage system. Load the following modules before running commands in your AI containers:
ml --force purge
ml LUMI partition/G
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
These modules bind the correct LUMI software suite into the container. More info can be found here.
Module names and packaging conventions vary across HPC centres, but the underlying
software requirements for running itwinai are usually the same. The goal is to
load a consistent toolchain (compiler + MPI), a Python runtime, and (optionally) the
GPU communication stack.
The version numbers below are reference values known to work on at least one modern HPC software stack; other versions may work as well.
Recommended baseline (reference versions)
Python: >= 3.12 (e.g., 3.12.3)
CMake: >= 3.29 (e.g., 3.29.3) — required to compile Horovod from source
git: recent (e.g., 2.45.x)
Compiler + MPI (ABI compatibility matters)
GCC: a recent major release (e.g., 13.3.0)
MPI: a compatible MPI implementation (e.g., OpenMPI 5.0.x)
mpi4py: built/installed against the same MPI you loaded (e.g., 4.0.1)
Many sites provide these as a single toolchain module (e.g., a compiler+MPI bundle). That is fine as long as the compiler and MPI are internally compatible.
GPU software stack (only if you use GPUs)
CUDA toolkit/runtime: >= 12.6 (e.g., 12.6)
cuDNN: a CUDA-matched build (e.g., 9.5.0.* for CUDA 12)
NCCL: recommended for multi-GPU communication (version typically follows CUDA)
Notes:
For PyTorch, it is often possible to install wheels built for a CUDA version that differs from the system CUDA. However, DeepSpeed is more sensitive: it generally requires compatibility between the CUDA runtime on the system and the CUDA version used by the installed PyTorch build. In itwinai this is typically handled by selecting a PyTorch build compatible with your target CUDA (the project pins the PyTorch version in pyproject.toml).
Some systems expose a separate module/setting to enable CUDA-aware MPI communication (distinct from the CUDA toolkit itself). If your site provides such a module, load it when running distributed GPU workloads.
Common scientific I/O libraries (as needed by your workflow/plugins)
HDF5: (e.g., 1.14.x)
PnetCDF (Parallel NetCDF): (e.g., 1.13.x)
DeepSpeed optional dependency
libaio: (e.g., 0.3.113). It enables asynchronous disk I/O for certain DeepSpeed features (optimizer states / checkpointing). DeepSpeed can still work without it, but some features may be disabled. If you do not have libaio on your system, consider disabling AIO at build time (e.g., by not enabling DS_BUILD_AIO). This setting is activated by default in our DeepSpeed installation script.
A typical starting point (adapt module names to your site) is:
ml --force purge

# Toolchain + MPI (or a site-provided toolchain module)
ml <gcc-or-toolchain> <mpi>

# Python runtime + build tools
ml <python> <cmake> <git>

# Optional: CUDA-aware MPI setting (only if your site provides it)
ml <cuda-aware-mpi-setting>

# Common HPC libraries (as needed by your workflow/plugins)
ml <hdf5> <pnetcdf> <mpi4py>

# GPU stack (only if running on GPUs)
ml <cuda> <cudnn> <nccl>

# Optional (only if needed by your setup)
ml <libaio>
Note
While itwinai does not strictly require specific versions of CUDA, MPI or related libraries, deploying on a new HPC system can still expose compatibility issues (e.g., between the CUDA runtime, the framework build, and MPI/toolchain choices). We therefore encourage users to try a small set of software versions when setting up a new environment. Support for additional systems depends on contributor availability, but we welcome reports and improvements: please open a GitHub issue to share your findings, or submit a pull request with documentation updates that worked on your platform.
After using the commands above to load the modules, check which modules you loaded by running
the ml command in the terminal.
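Beyond listing the loaded modules, you may want to compare the tool versions you actually get against the reference values given in this section. The following is a rough sketch: `version_ge` and `check_min_version` are hypothetical helpers, the comparison relies on `sort -V`, and the minimum versions passed in (Python 3.12, CMake 3.29) are the reference values from the baseline list rather than hard requirements.

```shell
# version_ge A B: succeeds if version A >= version B (relies on `sort -V`)
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether a detected version meets a reference minimum
check_min_version() {
    name="$1"; found="$2"; minimum="$3"
    if [ -z "$found" ]; then
        echo "$name: not found"
    elif version_ge "$found" "$minimum"; then
        echo "$name: $found (>= $minimum, OK)"
    else
        echo "$name: $found (older than reference $minimum)"
    fi
}

# Detect versions; the variables stay empty if a tool is missing
python_version="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)"
cmake_version="$(cmake --version 2>/dev/null | head -n 1 | awk '{print $3}' || true)"

check_min_version "Python" "$python_version" "3.12"
check_min_version "CMake" "$cmake_version" "3.29"
```

The same pattern extends to other tools as long as you can extract a plain version string from their output.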
Creating a Python Virtual Environment
The suggested way of managing Python dependencies, including itwinai, is through Python virtual environments. A virtual environment isolates dependencies and prevents conflicts with other Python projects.
Beware that some HPC centers advise against using Python virtual environments because they create a large number of files, which can clog some distributed filesystems. In such situations, you should prefer containers.
To manage Python virtual environments we use UV, which can be installed from this page. Learn more about the UV package manager in our UV tutorial.
If you don’t already have a virtual environment, you can create one with the following command:
# Remember to load the software modules first (see section above)!
uv venv
# Alternatively, if you prefer plain pip over UV, use this command instead of the one above
python -m venv .venv
Notice that a new directory called .venv is created to contain your virtual
environment. Now, you can activate your virtual environment with the following command:
# Remember to load the software modules first (see section above)!
source .venv/bin/activate
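To confirm that the activation worked, you can check the `VIRTUAL_ENV` variable, which the activation scripts generated by `uv venv` and `python -m venv` export. The helper name `venv_active` below is hypothetical; this is a minimal sketch rather than a required step.

```shell
# Hypothetical helper: succeeds if a virtual environment is active in this shell
# and its Python interpreter is executable.
venv_active() {
    [ -n "${VIRTUAL_ENV:-}" ] && [ -x "$VIRTUAL_ENV/bin/python" ]
}

if venv_active; then
    echo "Active virtual environment: $VIRTUAL_ENV"
    echo "Python interpreter in use: $(command -v python)"
else
    echo "No virtual environment is active: run 'source .venv/bin/activate' first"
fi
```

On HPC systems this check is worth repeating in every new shell or batch job, after the modules have been loaded.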
Installing the itwinai Library
You can install itwinai with support for either PyTorch or TensorFlow by selecting the
corresponding extra:
To install itwinai with PyTorch without GPU acceleration, you can use the
following command:
uv pip install "itwinai[torch]"
To enable GPU acceleration with PyTorch, you can use the following command:
uv pip install "itwinai[torch]" \
--extra-index-url https://download.pytorch.org/whl/cu121
To install itwinai with TensorFlow without GPU acceleration, you can use the
following command:
uv pip install "itwinai[tf]"
To enable GPU acceleration with TensorFlow, you can use the following command:
uv pip install "itwinai[tf-cuda]"
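Once the installation has finished, a quick import test shows which packages are usable from the active environment. The snippet is a sketch with a hypothetical helper, `py_can_import`; it assumes that `python3` points to the interpreter of your virtual environment and it only reports what it finds, without changing anything.

```shell
# Hypothetical helper: succeeds if the given Python module can be imported.
py_can_import() {
    python3 -c "import $1" >/dev/null 2>&1
}

for module_name in itwinai torch tensorflow; do
    if py_can_import "$module_name"; then
        echo "$module_name: importable"
    else
        echo "$module_name: not importable (expected if you did not install that extra)"
    fi
done

# Only meaningful for PyTorch installations
if py_can_import torch; then
    python3 -c "import torch; print('CUDA available to PyTorch:', torch.cuda.is_available())"
fi
```

On a login node without GPUs, PyTorch may report that CUDA is unavailable even when the GPU build is installed correctly, so repeat the check on a compute node if in doubt.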
Installing Horovod and Microsoft DeepSpeed
If you also want to install Horovod and Microsoft DeepSpeed for distributed ML with
PyTorch, make sure to install them after itwinai. You can do this with or without
GPU (CUDA) support:
Without GPU support:
uv pip install --no-cache-dir --no-build-isolation git+https://github.com/horovod/horovod.git@3a31d93
uv pip install --no-cache-dir --no-build-isolation deepspeed==0.16.8
With GPU (CUDA) support:
curl -fsSL https://github.com/interTwin-eu/itwinai/raw/main/env-files/torch/install-horovod-deepspeed-cuda.sh | bash
Warning
Horovod requires CMake>=3.13 and other packages. Make sure to have them installed in your environment before proceeding.
Warning
The installation of Horovod and DeepSpeed needs to be executed on a machine/node where GPUs are available. On some HPC systems, such as the JUWELS system at JSC, GPUs are not available on login nodes (the hosts you connect to when you SSH into the system), only on compute nodes. On the JUWELS system, run this command from the repository’s root to install DeepSpeed and Horovod:
curl -fsSL https://github.com/interTwin-eu/itwinai/raw/main/env-files/torch/horovod-deepspeed-JSC.slurm | sbatch
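After the installation has completed, whether interactively or through the batch job, both tools ship a command that reports how they were built: `horovodrun --check-build` for Horovod and `ds_report` for DeepSpeed. The wrapper `run_if_present` below is a hypothetical helper that only avoids a hard failure when a command is missing.

```shell
# Hypothetical helper: run a command if it exists, otherwise report it as missing.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "$1 not found: is the virtual environment active?"
        return 127
    fi
}

# Horovod: list the frameworks, controllers, and tensor operations it was built with
run_if_present horovodrun --check-build || true

# DeepSpeed: print the environment report, including which ops are compatible
run_if_present ds_report || true
```

If Horovod does not list PyTorch or NCCL, or if DeepSpeed reports a CUDA mismatch, the build most likely ran on a node without GPUs or with different modules loaded.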