Install itwinai
In this section, we walk you through the installation and give some instructions for using the itwinai framework on HPC and local systems.
If you are a developer, please refer to the installation guide for developers below.
User installation
Requirements:
Linux or macOS environment. Windows has not been tested.
Python virtual environment
Depending on your environment, there are different ways to select a specific Python version.
Laptop or GPU node
If you are working on a laptop or on a simple on-prem setup, you could consider using pyenv. See the pyenv installation instructions. If you are using pyenv, make sure to read this.
HPC environment
On HPC systems, dependencies are usually loaded using Environment Modules or Lmod. If you don't know which modules to load, contact your system administrator to learn how to select the proper ones.
PyTorch environment
Commands to execute every time before installing or activating the Python virtual environment for PyTorch:
Juelich Supercomputer (JSC):
ml --force purge
ml Stages/2024 GCC OpenMPI CUDA/12 cuDNN MPI-settings/CUDA
ml Python CMake HDF5 PnetCDF libaio mpi4py
Vega supercomputer:
ml --force purge
ml Python/3.11.5-GCCcore-13.2.0 CMake/3.24.3-GCCcore-11.3.0 mpi4py OpenMPI CUDA/12.3
ml GCCcore/11.3.0 NCCL cuDNN/8.9.7.29-CUDA-12.3.0 UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.3.0
TensorFlow environment
Commands to execute every time before installing or activating the Python virtual environment for TensorFlow:
Juelich Supercomputer (JSC):
ml --force purge
ml Stages/2024 GCC/12.3.0 OpenMPI CUDA/12 MPI-settings/CUDA
ml Python/3.11 HDF5 PnetCDF libaio mpi4py CMake cuDNN/8.9.5.29-CUDA-12
Vega supercomputer:
ml --force purge
ml Python/3.11.5-GCCcore-13.2.0 CMake/3.24.3-GCCcore-11.3.0 mpi4py OpenMPI CUDA/12.3
ml GCCcore/11.3.0 NCCL cuDNN/8.9.7.29-CUDA-12.3.0 UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.3.0
Install itwinai for users
Install itwinai and its dependencies using the following command, and follow the instructions:
# First, load the required environment modules, if on an HPC
# Second, create a python virtual environment and activate it
$ python -m venv ENV_NAME
$ source ENV_NAME/bin/activate
# Install itwinai inside the environment
(ENV_NAME) $ export ML_FRAMEWORK="pytorch" # or "tensorflow"
(ENV_NAME) $ curl -fsSL https://github.com/interTwin-eu/itwinai/raw/main/env-files/itwinai-installer.sh | bash
The ML_FRAMEWORK environment variable controls whether you are installing
itwinai for PyTorch or TensorFlow.
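The installer branches on ML_FRAMEWORK, so a typo there is easy to miss. Below is a minimal sketch of a guard you could run first. The helper name check_ml_framework is hypothetical and not shipped with itwinai. It assumes the only accepted values are the two shown above, "pytorch" and "tensorflow".

```bash
# Hypothetical helper (not shipped with itwinai): succeed only when the value
# is one of the two frameworks the installer is documented to accept.
check_ml_framework() {
    case "$1" in
        pytorch|tensorflow)
            return 0 ;;
        *)
            echo "ML_FRAMEWORK must be \"pytorch\" or \"tensorflow\", got \"$1\"" >&2
            return 1 ;;
    esac
}
```

For example: check_ml_framework "$ML_FRAMEWORK" && curl -fsSL https://github.com/interTwin-eu/itwinai/raw/main/env-files/itwinai-installer.sh | bash. Because a pipeline binds tighter than &&, the download only runs if the check succeeds.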
[!WARNING] itwinai depends on Horovod, which requires
CMake>=3.13 and other packages. Make sure to have them installed in your environment before proceeding.
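Horovod's own installation guide lists CMake 3.13 or newer as a build requirement. The sketch below checks the CMake on your PATH before you start the installation. The helpers version_ge and check_cmake are illustrative, not part of itwinai. The version comparison is a simple numeric, field-by-field one written in awk, so it does not depend on sort -V.

```bash
# Succeed if dot-separated version $1 >= $2. Missing fields count as 0, and
# awk's numeric conversion ignores non-numeric suffixes within a field.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Check that cmake is on PATH and is at least version 3.13.
check_cmake() {
    if ! command -v cmake >/dev/null 2>&1; then
        echo "cmake not found on PATH" >&2
        return 1
    fi
    # The first line of `cmake --version` looks like: "cmake version 3.28.3"
    cmake_version=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$cmake_version" 3.13; then
        echo "cmake $cmake_version OK"
    else
        echo "cmake $cmake_version is older than 3.13" >&2
        return 1
    fi
}
```

On an HPC system, a failing check_cmake usually means a newer CMake module has to be loaded first.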
Installation for developers
If you are contributing to this repository, please continue below for more advanced instructions.
[!WARNING] Branch protection rules are applied to all branches whose names match this pattern:
[dm][ea][vi]*. When creating new branches, please avoid names that match it; otherwise branch protection rules will block direct pushes to those branches.
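Before pushing a new branch, you can check its name locally. The following is a rough sketch. The helper is_protected_branch is hypothetical. It assumes the pattern is read as a glob anchored at the start of the name, which is how GitHub's fnmatch-style branch patterns are matched. Under that reading, names such as dev, main or maintenance-fix are all caught.

```bash
# Hypothetical helper: succeed if a branch name matches the protection
# pattern [dm][ea][vi]* read as a glob anchored at the start of the name.
# Caveat: in shell `case` patterns `*` also matches "/", while GitHub's
# fnmatch-style matching does not, so names containing slashes may be
# flagged more often than GitHub would flag them.
is_protected_branch() {
    case "$1" in
        [dm][ea][vi]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_protected_branch "$(git branch --show-current)" && echo "rename this branch before pushing".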
Clone the itwinai repository
git clone [--recurse-submodules] git@github.com:interTwin-eu/itwinai.git
Install itwinai environment
This project uses uv as its project-wide package manager. If you are a developer,
you should therefore read the uv tutorial after the following pip tutorial.
Installation using pip
Creating a venv
You can install the itwinai environment for development using pip. First, however,
create a Python virtual environment (venv) if you haven't already. Make sure you have
Python installed (on HPC, load it with module load Python), and then create a venv
with the following command:
python -m venv <name-of-venv>
For example, to create a venv in the directory .venv (the default location used by uv),
run:
python -m venv .venv
After this you can activate your venv using the following command:
source .venv/bin/activate
From now on, anything you install with pip goes into your venv, and any python commands you run use the interpreter from your venv.
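To confirm that the venv is really active, you can rely on the fact that a venv's bin/activate script exports VIRTUAL_ENV and puts $VIRTUAL_ENV/bin first on PATH. Here is a small sketch built on that. The helper names venv_active and report_python are hypothetical.

```bash
# Succeed if a virtual environment is active in the current shell.
# A venv's bin/activate script exports VIRTUAL_ENV, and `deactivate` unsets it.
venv_active() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Print which interpreter `python` resolves to and whether it lives in the venv.
report_python() {
    py=$(command -v python || true)
    if venv_active && [ "$py" = "$VIRTUAL_ENV/bin/python" ]; then
        echo "python comes from the venv: $py"
    else
        echo "python does NOT come from an active venv: ${py:-not found}"
    fi
}
```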
Installation of packages
We provide some extras that can be activated depending on which platform you are using:
- macos, amd or nvidia, depending on which platform you use. Changes the version of prov4ML.
- dev for development purposes. Includes libraries for testing, TensorBoard, etc.
- torch for installation with PyTorch.
If you want to install PyTorch with CUDA support, you also have to add an
--extra-index-url pointing to the PyTorch wheel index for the CUDA version that you want.
Since you are developing the library, you also want to enable the editable flag, -e, so
that you don't have to reinstall everything every time you make a change. If you are on
HPC, you will usually want to add the --no-cache-dir flag to avoid filling up your
~/.cache directory, as you can very easily reach your disk quota otherwise. An example of a
complete command for installing as a developer on HPC with CUDA thus becomes:
pip install -e ".[torch,dev,nvidia,tf]" \
--no-cache-dir \
--extra-index-url https://download.pytorch.org/whl/cu121
To install locally on macOS (i.e., without CUDA) with PyTorch, run the following instead:
pip install -e ".[torch,dev,macos,tf]"
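The two commands above differ only in the platform extra and the optional CUDA wheel index. The sketch below assembles the command so it can be reviewed before running. The helper itwinai_dev_install_cmd is hypothetical. It assumes PyTorch's wheel indexes follow the download.pytorch.org/whl/cuXYZ naming seen above, and it adds a CUDA index only for the nvidia extra.

```bash
# Hypothetical helper (not part of itwinai): print the editable install
# command used in the examples above for a given platform extra.
#   $1: platform extra, one of macos | amd | nvidia
#   $2: optional CUDA version (e.g. 12.1), only used together with nvidia
itwinai_dev_install_cmd() {
    platform=${1:-}
    cuda=${2:-}
    case "$platform" in
        macos|amd|nvidia) ;;
        *) echo "unknown platform extra: $platform" >&2; return 1 ;;
    esac
    cmd="pip install -e \".[torch,dev,$platform,tf]\""
    if [ "$platform" = nvidia ] && [ -n "$cuda" ]; then
        # PyTorch's CUDA wheel indexes drop the dot: 12.1 -> cu121
        cmd="$cmd --extra-index-url https://download.pytorch.org/whl/cu$(printf '%s' "$cuda" | tr -d '.')"
    fi
    printf '%s\n' "$cmd"
}
```

The printed command contains quotes. To run it on HPC with the cache disabled, use eval: eval "$(itwinai_dev_install_cmd nvidia 12.1) --no-cache-dir".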
Horovod and DeepSpeed
The commands above do not install Horovod and DeepSpeed, however, because they require a
specialized script. If you do not
need CUDA support, you can install them using pip as follows:
pip install --no-cache-dir --no-build-isolation git+https://github.com/horovod/horovod.git
pip install --no-cache-dir --no-build-isolation deepspeed
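After installation, a quick sanity check is to confirm that the command-line entry points of both packages are on PATH: horovodrun for Horovod and ds_report for DeepSpeed. The sketch below does that. The helper require_commands is hypothetical.

```bash
# Report which of the given commands are missing from PATH; fail if any is.
require_commands() {
    missing=0
    for c in "$@"; do
        if ! command -v "$c" >/dev/null 2>&1; then
            echo "missing: $c" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: require_commands horovodrun ds_report && horovodrun --check-build && ds_report. Here horovodrun --check-build lists the frameworks and operations Horovod was built with, and ds_report prints DeepSpeed's environment and op compatibility report.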
PyTorch (+ Lightning) virtual environment with makefiles
Makefile targets for environment installation:
- Juelich Supercomputer (JSC): torch-gpu-jsc
- Vega supercomputer: torch-env-vega
- In any other case, when CUDA is available: torch-env
- In any other case, when CUDA is NOT available (CPU-only installation): torch-env-cpu
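The choice among the targets listed above can be scripted. The following is a minimal sketch. The helper torch_make_target and its system labels are hypothetical. It also assumes that a visible nvidia-smi is a good enough proxy for "CUDA is available", which may not hold on every machine.

```bash
# Hypothetical helper: map a system label to the PyTorch Makefile target.
torch_make_target() {
    case "$1" in
        jsc)  echo torch-gpu-jsc ;;
        vega) echo torch-env-vega ;;
        *)
            # Elsewhere, assume CUDA is available if the NVIDIA driver tool is on PATH.
            if command -v nvidia-smi >/dev/null 2>&1; then
                echo torch-env
            else
                echo torch-env-cpu
            fi ;;
    esac
}
```

For example, on a laptop: make "$(torch_make_target laptop)".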
For instance, on a laptop with a CUDA-compatible GPU you can use:
make torch-env
When not on an HPC system, you can activate the Python environment directly with:
source .venv-pytorch/bin/activate
Otherwise, if you are on an HPC system, please refer to the section "Activate itwinai environment on HPC", which explains how to load the required environment modules before activating the Python environment.
To build a Docker image for the PyTorch version (replace TAG with the desired tag):
# Local
docker buildx build -t itwinai:TAG -f env-files/torch/Dockerfile .
# Ghcr.io
docker buildx build -t ghcr.io/intertwin-eu/itwinai:TAG -f env-files/torch/Dockerfile .
docker push ghcr.io/intertwin-eu/itwinai:TAG
TensorFlow virtual environment
Makefile targets for environment installation:
- Juelich Supercomputer (JSC): tf-gpu-jsc
- Vega supercomputer: tf-env-vega
- In any other case, when CUDA is available: tensorflow-env
- In any other case, when CUDA is NOT available (CPU-only installation): tensorflow-env-cpu
For instance, on a laptop with a CUDA-compatible GPU you can use:
make tensorflow-env
When not on an HPC system, you can activate the Python environment directly with:
source .venv-tf/bin/activate
Otherwise, if you are on an HPC system, please refer to the section "Activate itwinai environment on HPC", which explains how to load the required environment modules before activating the Python environment.
To build a Docker image for the TensorFlow version (replace TAG with the desired tag):
# Local
docker buildx build -t itwinai:TAG -f env-files/tensorflow/Dockerfile .
# Ghcr.io
docker buildx build -t ghcr.io/intertwin-eu/itwinai:TAG -f env-files/tensorflow/Dockerfile .
docker push ghcr.io/intertwin-eu/itwinai:TAG
Activate itwinai environment on HPC
Usually, HPC systems organize their software in modules, which users need to load every time they open a new shell, before activating a Python virtual environment.
Below you can find some examples on how to load the correct environment modules on the HPC systems we are currently working with.
Load modules before PyTorch virtual environment
Commands to be executed before activating the Python virtual environment:
Juelich Supercomputer (JSC):
ml --force purge
ml Stages/2024 GCC OpenMPI CUDA/12 cuDNN MPI-settings/CUDA
ml Python CMake HDF5 PnetCDF libaio mpi4py
Vega supercomputer:
ml --force purge
ml Python/3.11.5-GCCcore-13.2.0 CMake/3.24.3-GCCcore-11.3.0 mpi4py OpenMPI CUDA/12.3
ml GCCcore/11.3.0 NCCL cuDNN/8.9.7.29-CUDA-12.3.0 UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.3.0
When not on an HPC: do nothing.
For instance, on JSC you can activate the PyTorch virtual environment in this way:
# Load environment modules
ml --force purge
ml Stages/2024 GCC OpenMPI CUDA/12 cuDNN MPI-settings/CUDA
ml Python CMake HDF5 PnetCDF libaio mpi4py
# Activate virtual env
source envAI_hdfml/bin/activate
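Because these steps have to be repeated in every new shell, you may want to wrap them in a file you source. The sketch below is one way to do that. The helper names are hypothetical. It assumes that "on an HPC" can be detected by the presence of Lmod's ml command, so that nothing is loaded elsewhere. The module lines are the JSC PyTorch ones shown above.

```bash
# Hypothetical helper: load the JSC modules for the PyTorch environment only
# when the `ml` command exists; otherwise skip module loading entirely.
load_jsc_torch_modules() {
    if command -v ml >/dev/null 2>&1; then
        ml --force purge &&
            ml Stages/2024 GCC OpenMPI CUDA/12 cuDNN MPI-settings/CUDA &&
            ml Python CMake HDF5 PnetCDF libaio mpi4py
    else
        echo "no 'ml' command found; skipping module loads" >&2
    fi
}

# Hypothetical wrapper: load modules, then activate the venv given as $1.
activate_itwinai_torch() {
    load_jsc_torch_modules || return 1
    # `.` is the POSIX spelling of `source`
    . "$1/bin/activate"
}
```

For example: activate_itwinai_torch envAI_hdfml. Since activate has to change the current shell, source the file defining these functions rather than executing it.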
Load modules before TensorFlow virtual environment
Commands to be executed before activating the Python virtual environment:
Juelich Supercomputer (JSC):
ml --force purge
ml Stages/2024 GCC/12.3.0 OpenMPI CUDA/12 MPI-settings/CUDA
ml Python/3.11 HDF5 PnetCDF libaio mpi4py CMake cuDNN/8.9.5.29-CUDA-12
Vega supercomputer:
ml --force purge
ml Python/3.11.5-GCCcore-13.2.0 CMake/3.24.3-GCCcore-11.3.0 mpi4py OpenMPI CUDA/12.3
ml GCCcore/11.3.0 NCCL cuDNN/8.9.7.29-CUDA-12.3.0 UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.3.0
When not on an HPC: do nothing.
For instance, on JSC you can activate the TensorFlow virtual environment in this way:
# Load environment modules
ml --force purge
ml Stages/2024 GCC/12.3.0 OpenMPI CUDA/12 MPI-settings/CUDA
ml Python/3.11 HDF5 PnetCDF libaio mpi4py CMake cuDNN/8.9.5.29-CUDA-12
# Activate virtual env
source envAItf_hdfml/bin/activate
Test with pytest
Do this only if you are a developer wanting to test your code with pytest.
First, you need to create virtual environments both for torch and tensorflow, following the instructions above, depending on the system that you are using (e.g., JSC).
To choose the torch and tf environments in which the tests are
executed, set the following environment variables.
If these variables are not set, the test suite assumes that the
PyTorch environment is under
.venv-pytorch and the TensorFlow environment is under .venv-tf.
export TORCH_ENV="my_torch_env"
export TF_ENV="my_tf_env"
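Before launching a long test run, you can check that both environments exist where the test suite will look for them. The following is a minimal sketch using the defaults described above. The helper check_test_envs is hypothetical. It assumes that an empty variable counts as unset, and that relative paths are resolved against the directory you launch the tests from (normally the repository root). The actual test suite may behave differently in either respect.

```bash
# Hypothetical helper: check that TORCH_ENV and TF_ENV (or their defaults
# .venv-pytorch and .venv-tf) point at virtual environments.
check_test_envs() {
    torch_env=${TORCH_ENV:-.venv-pytorch}
    tf_env=${TF_ENV:-.venv-tf}
    rc=0
    for env_dir in "$torch_env" "$tf_env"; do
        if [ ! -f "$env_dir/bin/activate" ]; then
            echo "no virtual environment found at $env_dir" >&2
            rc=1
        fi
    done
    return "$rc"
}
```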
Functional tests (marked with pytest.mark.functional) are executed under
/tmp/pytest to guarantee isolation between tests.
To run functional tests use:
pytest -v tests/ -m "functional"
[!NOTE] We implemented tailored Makefile targets to run the test suite on specific systems. Read these instructions to the end!
We provide some Makefile targets to run the whole test suite including unit, integration, and functional tests. Choose the right target depending on the system that you are using:
Makefile targets:
- Juelich Supercomputer (JSC): test-jsc
- In any other case: test
For instance, to run the test suite on your laptop, use:
make test
Working with Docker containers
This section is intended for the developers of itwinai and outlines the practices used to manage container images through GitHub Container Registry (GHCR).
Terminology Recap
Our container images follow the convention:
ghcr.io/intertwin-eu/IMAGE_NAME:TAG
For example, in ghcr.io/intertwin-eu/itwinai:0.2.2-torch2.6-jammy:
- IMAGE_NAME is itwinai
- TAG is 0.2.2-torch2.6-jammy
The TAG follows the convention:
[jlab-]X.Y.Z-(torch|tf)x.y-distro
Where:
- X.Y.Z is the itwinai version
- (torch|tf) is an exclusive OR between "torch" and "tf". You can pick one or the other, but not both.
- x.y is the version of the ML framework (e.g., PyTorch or TensorFlow)
- distro is the OS distro in the container (e.g., Ubuntu Jammy)
- jlab- is prepended to the tag of images including JupyterLab
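The tag convention above is regular enough to build and check mechanically. The following helpers are a sketch and are not part of itwinai. They assume the distro is a single lowercase word such as "jammy", and that versions contain only digits and dots.

```bash
# Hypothetical helpers for the tag convention [jlab-]X.Y.Z-(torch|tf)x.y-distro.

# Build a tag. Args: itwinai version, torch|tf, framework version, distro,
# optional "jlab" to prepend the JupyterLab prefix.
make_itwinai_tag() {
    prefix=""
    if [ "${5:-}" = jlab ]; then
        prefix="jlab-"
    fi
    printf '%s%s-%s%s-%s\n' "$prefix" "$1" "$2" "$3" "$4"
}

# Succeed if a tag follows the convention.
is_valid_itwinai_tag() {
    printf '%s\n' "$1" |
        grep -Eq '^(jlab-)?[0-9]+\.[0-9]+\.[0-9]+-(torch|tf)[0-9]+\.[0-9]+-[a-z]+$'
}
```

make_itwinai_tag does not validate its arguments, so pair it with the checker. For example: tag=$(make_itwinai_tag 0.2.2 torch 2.6 jammy) && is_valid_itwinai_tag "$tag".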
Image Names and Their Purpose
We use different image names to group similar images under the same namespace:
- itwinai: Production images. These should be well-maintained and orderly.
- itwinai-dev: Development images. Tags can vary, and may include random hashes.
- itwinai-cvmfs: Images that need to be made available through CVMFS via Unpacker.
[!WARNING] It is very important to keep the number of tags for
itwinai-cvmfs as low as possible. Tags should only be created under this namespace when strictly necessary. Otherwise, this could cause issues for the Unpacker.
Building a new container
Our Dockerfiles support labels to record provenance information, which can later be
retrieved with docker inspect IMAGE_NAME:TAG.
A full example:
# Replace with the image named in the last FROM instruction of the Dockerfile
export BASE_IMG_NAME="what goes after the last FROM"
export IMAGE_FULL_NAME="IMAGE_NAME:TAG"
# Note: grep -P and date's %:z are GNU extensions (not available in macOS's default tools)
docker build \
  -t "$IMAGE_FULL_NAME" \
  -f path/to/Dockerfile \
  --build-arg COMMIT_HASH="$(git rev-parse --verify HEAD)" \
  --build-arg BASE_IMG_NAME="$BASE_IMG_NAME" \
  --build-arg BASE_IMG_DIGEST="$(docker pull "$BASE_IMG_NAME" > /dev/null 2>&1 && docker inspect --format='{{index .RepoDigests 0}}' "$BASE_IMG_NAME")" \
  --build-arg ITWINAI_VERSION="$(grep -Po '(?<=^version = ")[^"]*' pyproject.toml)" \
  --build-arg CREATION_DATE="$(date +"%Y-%m-%dT%H:%M:%S%:z")" \
  --build-arg IMAGE_FULL_NAME="$IMAGE_FULL_NAME" \
  .
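The example above relies on grep -P and on date's %:z, which GNU tools provide but macOS's default tools do not. The sketch below shows portable substitutes. The helper names are hypothetical. pyproject_version assumes the project version is the first line-initial version = "..." entry in pyproject.toml. iso_utc_now records the time in UTC rather than with the local offset.

```bash
# Print the first line-initial `version = "..."` value in a pyproject.toml.
# Portable alternative to `grep -Po` (works with both GNU and BSD sed).
pyproject_version() {
    sed -n 's/^version = "\([^"]*\)".*/\1/p' "${1:-pyproject.toml}" | head -n 1
}

# Current UTC time in ISO 8601 form, e.g. 2024-05-01T12:00:00Z.
# Portable alternative to GNU date's %:z, which BSD/macOS date lacks.
iso_utc_now() {
    date -u +"%Y-%m-%dT%H:%M:%SZ"
}
```

With these, the corresponding arguments become --build-arg ITWINAI_VERSION="$(pyproject_version)" and --build-arg CREATION_DATE="$(iso_utc_now)". Once the image is built, docker inspect --format '{{ json .Config.Labels }}' IMAGE_NAME:TAG prints only its labels.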