MNIST dataset
This section covers the MNIST use case, which has been implemented with three
different frameworks: TensorFlow, PyTorch, and PyTorch Lightning. You can
find the files relevant to this use case in the use case's folder on GitHub.
For more information on each implementation, consult their respective READMEs:
Torch Lightning
Integration author(s): Matteo Bunino (CERN)
Training
# Download dataset and exit: only run first step in the pipeline (index=0)
itwinai exec-pipeline --config config.yaml --pipe-key training_pipeline --steps 0
# Run the whole training pipeline
itwinai exec-pipeline --config config.yaml --pipe-key training_pipeline
View the training logs on the MLflow server (if activated in the configuration):
mlflow ui --backend-store-uri mllogs/mlflow/
PyTorch
Integration author(s): Matteo Bunino (CERN)
This integration demonstrates how to use itwinai for a set of simple use cases based on the popular MNIST dataset.
Training a CNN classifier
You can launch the training of a CNN classifier on the MNIST dataset from the YAML configuration file that describes the whole training workflow.
# Run the whole training pipeline
itwinai exec-pipeline --config-name config.yaml
Notice that the training "pipeline" starts by downloading the dataset if it is not available locally.
Since on some HPC systems there is no internet connection on the compute nodes, it is
advisable to run the dataloading step on the login node to download the dataset and, later,
the whole pipeline on the compute nodes. To do that, you can use the pipe_steps option as
below:
# Download dataset and exit
itwinai exec-pipeline --config-name config.yaml +pipe_steps=[dataloading_step]
# Run the whole pipeline
itwinai exec-pipeline --config-name config.yaml
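To make the idea of running only selected steps concrete, here is a minimal Python sketch. It is not itwinai's implementation: run_pipeline and the two toy step functions are hypothetical stand-ins, and only the step name dataloading_step comes from the command above.

```python
from typing import Callable, Dict, List, Optional

Step = Callable[[object], object]


def run_pipeline(
    steps: Dict[str, Step],
    selected: Optional[List[str]] = None,
    data: object = None,
) -> object:
    """Run the selected steps (all of them by default), feeding each output to the next."""
    names = list(steps) if selected is None else selected
    unknown = [name for name in names if name not in steps]
    if unknown:
        raise KeyError(f"Unknown step(s): {unknown}")
    for name in names:
        data = steps[name](data)
    return data


# Hypothetical stand-ins for the real pipeline steps.
def dataloading_step(_: object) -> List[int]:
    return [0, 1, 2, 3]  # pretend this downloads and returns the dataset


def training_step(dataset: List[int]) -> Dict[str, int]:
    return {"trained_on": len(dataset)}  # pretend this trains a model


steps = {"dataloading_step": dataloading_step, "training_step": training_step}

# Login node: only fetch the data.
only_download = run_pipeline(steps, selected=["dataloading_step"])
# Compute node: run everything.
full_run = run_pipeline(steps)
```

The first call mirrors the "download dataset and exit" command, the second the full pipeline run.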
Note
Setting the HYDRA_FULL_ERROR=1 environment variable can be convenient when debugging errors
that originate during the instantiation of the pipeline.
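If you launch the pipeline from a Python script rather than from the shell, one way to set this variable is to pass a modified environment to the child process. This is a sketch using only the standard library; the helper names hydra_debug_env and run_with_full_errors are made up for this example.

```python
import os
import subprocess
from typing import Dict, List, Optional


def hydra_debug_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with HYDRA_FULL_ERROR enabled."""
    env = dict(os.environ if base is None else base)
    env["HYDRA_FULL_ERROR"] = "1"
    return env


def run_with_full_errors(command: List[str]) -> int:
    """Run a command with full Hydra stack traces and return its exit code."""
    return subprocess.run(command, env=hydra_debug_env()).returncode
```

For example, run_with_full_errors(["itwinai", "exec-pipeline", "--config-name", "config.yaml"]) would run the training command shown earlier with the variable set for that process only.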
View the training logs on the MLflow server (if activated in the configuration):
mlflow ui --backend-store-uri mllogs/mlflow/
Hyper-parameter optimization
The CNN classifier can undergo hyper-parameter optimization (HPO) to find the hyper-parameters, such as learning rate and batch size, that result in the best validation performance.
To do so, it is enough to correctly set the search_space and the tune_config in the trainer
configuration in the config.yaml file.
Please refer to Ray's official documentation to learn more about
RunConfig,
TuneConfig,
ScalingConfig,
and search spaces.
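As a conceptual illustration of what a search space and a number of trials mean, the sketch below runs a plain random search with the Python standard library. It does not use Ray and does not reflect the itwinai configuration schema: the value ranges, the toy_validation_loss function, and all names are assumptions chosen for the example.

```python
import math
import random
from typing import Dict


def sample_config(rng: random.Random) -> Dict[str, float]:
    """Draw one candidate from a hypothetical search space."""
    # Learning rate sampled log-uniformly between 1e-5 and 1e-1.
    log_lr = rng.uniform(math.log(1e-5), math.log(1e-1))
    return {"lr": math.exp(log_lr), "batch_size": rng.choice([32, 64, 128])}


def toy_validation_loss(config: Dict[str, float]) -> float:
    """Made-up objective standing in for a real training run.

    It is smallest at lr=1e-3 and batch_size=64.
    """
    lr_term = (math.log10(config["lr"]) + 3) ** 2
    batch_term = abs(config["batch_size"] - 64) / 64
    return lr_term + batch_term


def random_search(num_samples: int, seed: int = 0) -> Dict[str, float]:
    """Evaluate num_samples random configurations and keep the best one."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(num_samples)]
    return min(trials, key=toy_validation_loss)


best = random_search(num_samples=20)
```

In the real use case each trial is a full training run, and the scheduling and sampling are configured through the Ray objects listed above.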
Inference
Now you can use the trained model to make predictions on the MNIST dataset.
Note that inference is defined as a separate pipeline in the config.yaml file.
By default, the training_pipeline is executed, but you can run other pipelines by explicitly
setting the +pipe_key option.
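Conceptually, the configuration file holds several named pipelines and the key picks one of them, with training_pipeline as the fallback. The sketch below shows that lookup on a plain dictionary. It is only an illustration: select_pipeline and the step names are hypothetical, and the real configuration is resolved by itwinai itself.

```python
from typing import Any, Dict

DEFAULT_PIPE_KEY = "training_pipeline"


def select_pipeline(config: Dict[str, Any], pipe_key: str = DEFAULT_PIPE_KEY) -> Any:
    """Return the pipeline stored under pipe_key, or fail with a clear message."""
    if pipe_key not in config:
        available = ", ".join(sorted(config))
        raise KeyError(f"No pipeline named '{pipe_key}'. Available: {available}")
    return config[pipe_key]


# Hypothetical, simplified stand-in for the content of config.yaml.
config = {
    "training_pipeline": ["dataloading_step", "training_step"],
    "inference_pipeline": ["inference_dataloading_step", "prediction_step"],
}

default_pipeline = select_pipeline(config)
inference_pipeline = select_pipeline(config, "inference_pipeline")
```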
Create sample dataset
from dataloader import InferenceMNIST

InferenceMNIST.generate_jpg_sample('mnist-sample-data/', 10)
Generate a dummy pre-trained neural network
import torch
from model import Net

dummy_nn = Net()
torch.save(dummy_nn, 'mnist-pre-trained.pth')
Run the inference command. This will generate a "mnist-predictions" folder containing a CSV file with the predictions as rows.
itwinai exec-pipeline --config-name config.yaml +pipe_key=inference_pipeline
Note that this uses the same entry point as training.
Training a GAN
This use case also includes an example of how to train a Generative Adversarial Network
(GAN). All you need to do is select the GAN pipeline by setting the +pipe_key
option.
# Train a GAN
itwinai exec-pipeline --config-name config.yaml +pipe_key=training_pipeline_gan
Docker image
Build from the project root with:
# Local
docker buildx build -t itwinai:0.0.1-mnist-torch-0.1 -f use-cases/mnist/torch/Dockerfile .
# Ghcr.io
docker buildx build -t ghcr.io/intertwin-eu/itwinai:0.0.1-mnist-torch-0.1 -f use-cases/mnist/torch/Dockerfile .
docker push ghcr.io/intertwin-eu/itwinai:0.0.1-mnist-torch-0.1
Training with Docker container
docker run -it --rm --name running-inference \
-v "$PWD":/usr/data ghcr.io/intertwin-eu/itwinai:0.0.1-mnist-torch-0.1 \
/bin/bash -c "itwinai exec-pipeline --print-config \
--config /usr/src/app/config.yaml \
--pipe-key training_pipeline \
-o dataset_root=/usr/data/mnist-dataset "
Inference with Docker container
From wherever a sample of MNIST jpg images is available (folder called "mnist-sample-data/"):
├── $PWD
│   ├── mnist-sample-data
│   │   ├── digit_0.jpg
│   │   ├── digit_1.jpg
│   │   ├── digit_2.jpg
...
│   │   ├── digit_N.jpg
docker run -it --rm --name running-inference \
-v "$PWD":/usr/data ghcr.io/intertwin-eu/itwinai:0.0.1-mnist-torch-0.1 \
/bin/bash -c "itwinai exec-pipeline --print-config \
--config /usr/src/app/config.yaml \
--pipe-key inference_pipeline \
-o test_data_path=/usr/data/mnist-sample-data \
-o inference_model_mlflow_uri=/usr/src/app/mnist-pre-trained.pth \
-o predictions_dir=/usr/data/mnist-predictions "
This command will store the results in a folder called "mnist-predictions":
├── $PWD
│   ├── mnist-predictions
│   │   ├── predictions.csv
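Once the predictions file exists, it can be inspected with a few lines of Python. The sketch below assumes, purely for illustration, that each row holds an image file name followed by the predicted digit. The actual column layout of predictions.csv is not described here, so adjust the parsing to what the file really contains. load_predictions is a hypothetical helper.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict


def load_predictions(csv_path: Path) -> Dict[str, int]:
    """Map each image name to its predicted digit (assumed layout: name,label)."""
    predictions: Dict[str, int] = {}
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            # Skip header lines or malformed rows without a numeric label.
            if len(row) < 2 or not row[1].strip().isdigit():
                continue
            predictions[row[0]] = int(row[1])
    return predictions


# Small self-contained demo on a made-up file with the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "predictions.csv"
    sample.write_text("digit_0.jpg,7\ndigit_1.jpg,2\n")
    demo = load_predictions(sample)
```

With the Docker command above, the real file would be read with load_predictions(Path("mnist-predictions/predictions.csv")) from the mounted working directory.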