MNIST dataset

This section covers the MNIST use case, which has been implemented with three different frameworks: TensorFlow, PyTorch and PyTorch Lightning. You can find the files relevant to this use case in the use case’s folder on GitHub.

For more information on each implementation, consult their respective READMEs:

PyTorch Lightning

Integration author(s): Matteo Bunino (CERN)

Training

# Download the dataset and exit: run only the first step of the pipeline (index 0)
itwinai exec-pipeline +pipe_key=training_pipeline +pipe_steps=[0]

# Run the whole training pipeline
itwinai exec-pipeline +pipe_key=training_pipeline

View the training logs on the MLflow server (if enabled in the configuration):

mlflow ui --backend-store-uri mllogs/mlflow/

PyTorch

Integration author(s): Matteo Bunino (CERN)

This simple integration demonstrates how to use itwinai for a set of use cases based on the popular MNIST dataset.

Training a CNN classifier

You can launch the training of a CNN classifier on the MNIST dataset using the YAML configuration file that describes the whole training workflow. Here, the itwinai exec-pipeline command executes an ML workflow defined in the config.yaml file; see the exec-pipeline CLI reference for more details on this command. Unless specified otherwise, the pipeline defined under training_pipeline is selected.

# Run the whole training pipeline
itwinai exec-pipeline --config-name config.yaml

Note that the training “pipeline” starts by downloading the dataset if it is not available locally. Since the compute nodes of some HPC systems have no internet connection, it is advisable to first run only the dataloading step on the login node to download the dataset, and later run the whole pipeline on the compute nodes. To do so, use the pipe_steps option as shown below:

# Download dataset and exit
itwinai exec-pipeline --config-name config.yaml +pipe_steps=[dataloading_step]

# Run the whole pipeline
itwinai exec-pipeline --config-name config.yaml
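The idea behind pipe_steps can be illustrated with a small, self-contained Python sketch. It is not itwinai's implementation: run_pipeline, dataloading_step and training_step are hypothetical names, and each step is assumed to be a plain callable that receives the previous step's output. It shows selecting steps either by position (as in +pipe_steps=[0]) or by name (as in +pipe_steps=[dataloading_step]).

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Union

# Hypothetical stand-in for a pipeline: an ordered mapping of step name -> callable.
# Each step receives the output of the previous step.
Step = Callable[[Any], Any]


def run_pipeline(
    steps: Dict[str, Step],
    pipe_steps: Optional[Sequence[Union[int, str]]] = None,
    data: Any = None,
) -> Any:
    """Run all steps in order, or only the ones selected by index or by name."""
    names = list(steps)  # dicts preserve insertion order
    if pipe_steps is None:
        selected = names
    else:
        # Integers are positions in the pipeline, strings are step names.
        selected = [names[s] if isinstance(s, int) else s for s in pipe_steps]
    for name in selected:
        data = steps[name](data)
    return data


def dataloading_step(_: Any) -> List[int]:
    # Stand-in for "download the dataset if it is missing".
    return [1, 2, 3]


def training_step(dataset: List[int]) -> int:
    # Stand-in for training: here it only sums the data.
    return sum(dataset)


pipeline = {"dataloading_step": dataloading_step, "training_step": training_step}

print(run_pipeline(pipeline, pipe_steps=[0]))                   # [1, 2, 3]
print(run_pipeline(pipeline, pipe_steps=["dataloading_step"]))  # [1, 2, 3]
print(run_pipeline(pipeline))                                   # 6
```

Running only the first step stops before training, which is why the download can happen on a login node and the full run later on a compute node.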

Note

Setting the environment variable HYDRA_FULL_ERROR=1 makes Hydra print the full stack trace, which is convenient when debugging errors raised while the pipeline is being instantiated.
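When launching the pipeline from a Python script (for example, a job wrapper), the command and the HYDRA_FULL_ERROR variable can be assembled with the standard library. This is a minimal sketch: build_exec_pipeline_cmd is a hypothetical helper, not part of itwinai, and it only uses the options shown in this section (--config-name, +pipe_key, +pipe_steps).

```python
import os
import shlex
from typing import Dict, List, Optional, Sequence, Tuple


def build_exec_pipeline_cmd(
    config_name: str = "config.yaml",
    pipe_key: Optional[str] = None,
    pipe_steps: Optional[Sequence[str]] = None,
    debug: bool = False,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for an `itwinai exec-pipeline` call (hypothetical helper)."""
    cmd = ["itwinai", "exec-pipeline", "--config-name", config_name]
    if pipe_key is not None:
        cmd.append(f"+pipe_key={pipe_key}")
    if pipe_steps is not None:
        cmd.append(f"+pipe_steps=[{','.join(pipe_steps)}]")
    env = dict(os.environ)  # copy, so the current process is not modified
    if debug:
        # Ask Hydra to print the full stack trace of instantiation errors.
        env["HYDRA_FULL_ERROR"] = "1"
    return cmd, env


cmd, env = build_exec_pipeline_cmd(pipe_steps=["dataloading_step"], debug=True)
# Equivalent shell command, with the brackets safely quoted.
print(shlex.join(cmd))
# To actually launch it (requires itwinai to be installed):
# import subprocess; subprocess.run(cmd, env=env, check=True)
```

Passing an argument list instead of a shell string means the square brackets in +pipe_steps=[...] need no quoting, which avoids surprises in shells such as zsh that treat brackets as glob patterns.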

View the training logs on the MLflow server (if enabled in the configuration):

mlflow ui --backend-store-uri mllogs/mlflow/

Hyper-parameter optimization

The CNN classifier can undergo hyper-parameter optimization (HPO) to find the hyper-parameters, such as learning rate and batch size, that yield the best validation performance.

To do so, set the search_space and the tune_config in the trainer configuration in the config.yaml file. Refer to Ray’s official documentation to learn more about RunConfig, TuneConfig, ScalingConfig, and search spaces.
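Conceptually, HPO samples configurations from a search space, trains and evaluates each one, and keeps the configuration with the best validation metric. The plain-Python random search below illustrates only that loop; it is a stand-in, not Ray Tune's API nor itwinai's trainer, and random_search, fake_validation_loss and the parameter ranges are hypothetical. In the real setup, Ray Tune samples from search_space and the objective is an actual training run.

```python
import math
import random
from typing import Any, Callable, Dict, Tuple

# Hypothetical search space: each hyper-parameter maps to a sampler function
# (a stand-in for Ray Tune's search-space primitives).
SearchSpace = Dict[str, Callable[[random.Random], Any]]


def random_search(
    objective: Callable[[Dict[str, Any]], float],
    search_space: SearchSpace,
    num_samples: int = 20,
    seed: int = 0,
) -> Tuple[Dict[str, Any], float]:
    """Sample configurations at random and return the one with the lowest objective."""
    rng = random.Random(seed)  # seeded, so the search is reproducible
    best_config: Dict[str, Any] = {}
    best_score = math.inf
    for _ in range(num_samples):
        config = {name: sample(rng) for name, sample in search_space.items()}
        score = objective(config)  # e.g. validation loss after training with `config`
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


search_space: SearchSpace = {
    # Log-uniform learning rate between 1e-5 and 1e-2.
    "learning_rate": lambda rng: 10 ** rng.uniform(-5, -2),
    "batch_size": lambda rng: rng.choice([32, 64, 128, 256]),
}


def fake_validation_loss(config: Dict[str, Any]) -> float:
    # Toy objective standing in for a training run: lowest near lr=1e-3, batch size 64.
    lr_term = abs(math.log10(config["learning_rate"]) + 3)
    bs_term = abs(config["batch_size"] - 64) / 64
    return lr_term + bs_term


best, loss = random_search(fake_validation_loss, search_space, num_samples=50)
print(best, loss)
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which is usually more informative than a linear range for this parameter.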

Inference

Now you can use the trained model to make predictions on the MNIST dataset. Note that inference is defined as a separate pipeline in the config.yaml file. By default, the training_pipeline is executed, but you can run other pipelines by explicitly setting the +pipe_key option.

Create a sample dataset

from dataloader import InferenceMNIST
InferenceMNIST.generate_jpg_sample('mnist-sample-data/', 10)
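Before running inference, you may want to check which images were generated. The sketch below assumes the files follow the digit_<N>.jpg naming shown in the Docker inference section; that naming, and the helper list_sample_images, are assumptions for illustration rather than documented behavior of generate_jpg_sample.

```python
from pathlib import Path
from typing import List


def list_sample_images(folder: str = "mnist-sample-data") -> List[Path]:
    """Return the digit_<N>.jpg files in `folder`, sorted numerically by N (hypothetical helper)."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist: generate the samples first")
    images = list(root.glob("digit_*.jpg"))
    # Sort by the integer after "digit_", so digit_10 comes after digit_2.
    return sorted(images, key=lambda p: int(p.stem.split("_", 1)[1]))


# Example: print(len(list_sample_images()), "sample images found")
```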

Generate a dummy pre-trained neural network

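import torch
from model import Net

# Save an untrained Net instance (the whole model object, not only its
# state_dict) so that the inference pipeline has a model file to load
dummy_nn = Net()
torch.save(dummy_nn, 'mnist-pre-trained.pth')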

Run the inference command. This generates a “mnist-predictions” folder containing a CSV file with one prediction per row.
```
itwinai exec-pipeline --config-name config.yaml +pipe_key=inference_pipeline
```

Note that the entry point is the same as for training.

Training a GAN

This use case also includes an example of how to train a Generative Adversarial Network (GAN). To train the GAN, select its pipeline with the +pipe_key option.

# Train a GAN
itwinai exec-pipeline --config-name config.yaml +pipe_key=training_pipeline_gan

Docker image

Build from the repository’s root with:

# Replace ghcr.io with your preferred container registry
docker build -t ghcr.io/intertwin-eu/itwinai-dev:mnist-torch-0.0.1 -f use-cases/mnist/torch/Dockerfile .

# Optionally, push the image to the container registry
docker push ghcr.io/intertwin-eu/itwinai-dev:mnist-torch-0.0.1

Find more base image candidates under:

Training with a Docker container

docker run -it --rm --name running-training \
    -v "$PWD":/usr/data ghcr.io/intertwin-eu/itwinai-dev:mnist-torch-0.0.1 \
    /bin/bash -c "itwinai exec-pipeline \
    --config-path /app \
    +pipe_key=training_pipeline \
    dataset_root=/usr/data/mnist-dataset "

Inference with a Docker container

Run the following from a directory that contains a folder of sample MNIST JPG images called ‘mnist-sample-data/’:

$PWD
└── mnist-sample-data
    ├── digit_0.jpg
    ├── digit_1.jpg
    ├── digit_2.jpg
    ├── ...
    └── digit_N.jpg

docker run -it --rm --name running-inference \
    -v "$PWD":/usr/data ghcr.io/intertwin-eu/itwinai-dev:mnist-torch-0.0.1 \
    /bin/bash -c "itwinai exec-pipeline \
    --config-path /app \
    +pipe_key=inference_pipeline \
    test_data_path=/usr/data/mnist-sample-data \
    inference_model_mlflow_uri=/app/mnist-pre-trained.pth \
    predictions_dir=/usr/data/mnist-predictions "

This command will store the results in a folder called “mnist-predictions”:

$PWD
└── mnist-predictions
    └── predictions.csv
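To inspect the predictions from Python, the standard csv module is enough. The exact column layout of predictions.csv is not described here, so this minimal sketch (read_predictions is a hypothetical helper) returns each non-empty row as a raw list of strings; a header row, if present, comes back as the first row.

```python
import csv
from pathlib import Path
from typing import List


def read_predictions(path: str = "mnist-predictions/predictions.csv") -> List[List[str]]:
    """Return every non-empty row of the predictions CSV as a list of strings."""
    with Path(path).open(newline="") as f:
        return [row for row in csv.reader(f) if row]


# Example: for row in read_predictions(): print(row)
```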