MNIST dataset
This section covers the MNIST use case, which has been implemented with three
different frameworks: TensorFlow, PyTorch, and PyTorch Lightning. You can
find the files relevant to this use case
in the use case's folder on GitHub.
For more information on each implementation, consult their respective READMEs:
Torch Lightning
Integration author(s): Matteo Bunino (CERN)
Training
# Download dataset and exit: only run first step in the pipeline (index=0)
itwinai exec-pipeline +pipe_key=training_pipeline +pipe_steps=[0]
# Run the whole training pipeline
itwinai exec-pipeline +pipe_key=training_pipeline
View training logs on the MLflow server (if enabled in the configuration):
mlflow ui --backend-store-uri mllogs/mlflow/
PyTorch
Integration author(s): Matteo Bunino (CERN)
This integration demonstrates how to use itwinai for a set of simple tasks based on the popular MNIST dataset.
Training a CNN classifier
It is possible to launch the training of a CNN classifier on the MNIST dataset using the
YAML configuration file describing the whole training workflow.
In this case, the itwinai exec-pipeline command is used to execute an ML workflow defined
in the config.yaml file. You can find more details on this command in the exec-pipeline CLI reference.
Unless specified otherwise, the pipeline defined under training_pipeline is selected by
default.
# Run the whole training pipeline
itwinai exec-pipeline --config-name config.yaml
Notice that the training "pipeline" starts by downloading the dataset if it is not available locally.
Because the compute nodes of some HPC systems have no internet connection, it is
advisable to run the dataloading step on the login node to download the dataset and, later,
the whole pipeline on the compute nodes. To do that, you can use the pipe_steps option as
below:
# Download dataset and exit
itwinai exec-pipeline --config-name config.yaml +pipe_steps=[dataloading_step]
# Run the whole pipeline
itwinai exec-pipeline --config-name config.yaml
Note
Setting the HYDRA_FULL_ERROR=1 environment variable can help when debugging errors
that originate during the instantiation of the pipeline.
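The two-phase run and the debugging variable above can also be driven from a script. The following is a minimal sketch, not part of itwinai: the helper names `exec_pipeline_cmd` and `run` are hypothetical, and only the command-line options shown in this section are used.

```python
import os
import shlex
import subprocess


def exec_pipeline_cmd(config_name, pipe_key=None, pipe_steps=None):
    """Build an `itwinai exec-pipeline` argument list (hypothetical helper)."""
    cmd = ["itwinai", "exec-pipeline", "--config-name", config_name]
    if pipe_key is not None:
        cmd.append(f"+pipe_key={pipe_key}")
    if pipe_steps:
        # Steps are passed as a bracketed, comma-separated list.
        cmd.append(f"+pipe_steps=[{','.join(pipe_steps)}]")
    return cmd


def run(cmd, debug=False):
    """Run a command, optionally with HYDRA_FULL_ERROR=1 in its environment."""
    env = dict(os.environ)
    if debug:
        env["HYDRA_FULL_ERROR"] = "1"
    return subprocess.run(cmd, env=env, check=True)


# Phase 1 (login node): only the dataloading step, to download the dataset.
download_cmd = exec_pipeline_cmd("config.yaml", pipe_steps=["dataloading_step"])
# Phase 2 (compute nodes): the whole default pipeline.
train_cmd = exec_pipeline_cmd("config.yaml")

# Print the commands; call run(download_cmd, debug=True) to execute one.
print(shlex.join(download_cmd))
print(shlex.join(train_cmd))
```

Building the argument list separately from running it makes it easy to inspect the command before submitting it to a login or compute node.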
View training logs on the MLflow server (if enabled in the configuration):
mlflow ui --backend-store-uri mllogs/mlflow/
Hyper-parameter optimization
The CNN classifier can undergo hyper-parameter optimization (HPO) to find the hyper-parameters, such as learning rate and batch size, that result in the best validation performance.
To do so, it is enough to correctly set the search_space and the tune_config in the trainer
configuration in the config.yaml file.
Please refer to Ray's official documentation to learn more about
RunConfig,
TuneConfig,
ScalingConfig,
and search spaces.
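To make the idea of a search space concrete, here is a library-independent sketch. It uses plain random search in place of Ray Tune, and everything in it is a hypothetical placeholder: the parameter ranges, the `toy_validation_loss` objective standing in for a real training run, and the function names. It does not reflect the contents of config.yaml.

```python
import math
import random

# Hypothetical search space: each entry maps a hyper-parameter name to a sampler.
search_space = {
    # Log-uniform learning rate between roughly 1e-4 and 1e-1.
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, -1),
    "batch_size": lambda rng: rng.choice([32, 64, 128]),
}


def toy_validation_loss(config):
    """Stand-in for training a model and returning its validation loss."""
    return abs(math.log10(config["learning_rate"]) + 2.5) + config["batch_size"] / 1000


def random_search(space, objective, num_samples, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_samples):
        config = {name: sample(rng) for name, sample in space.items()}
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(search_space, toy_validation_loss, num_samples=20)
print(best_config, best_loss)
```

An HPO library such as Ray Tune adds parallel trials, schedulers, and smarter search algorithms on top of this basic loop.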
Inference
Now you can use the trained model to make predictions on the MNIST dataset.
Notice that the inference is defined by using a different pipeline in the config.yaml file.
By default, the training_pipeline is executed, but you can run other pipelines by explicitly
setting the +pipe_key option.
Create sample dataset
from dataloader import InferenceMNIST
InferenceMNIST.generate_jpg_sample('mnist-sample-data/', 10)
Generate a dummy pre-trained neural network
import torch
from model import Net
dummy_nn = Net()
torch.save(dummy_nn, 'mnist-pre-trained.pth')
Run the inference command. This will generate a "mnist-predictions" folder containing a CSV file with the predictions as rows.
itwinai exec-pipeline --config-name config.yaml +pipe_key=inference_pipeline
Note the same entry point as for training.
Training a GAN
This use case also includes an example of how to train a Generative Adversarial Network
(GAN). All you need to do is select the GAN pipeline by setting the +pipe_key
option.
# Train a GAN
itwinai exec-pipeline --config-name config.yaml +pipe_key=training_pipeline_gan
Docker image
Build from the repository's root with:
# Replace ghcr.io with your preferred container registry
docker build -t ghcr.io/intertwin-eu/itwinai-dev:mnist-torch-0.0.1 -f use-cases/mnist/torch/Dockerfile .
# Optionally, push the image to the container registry
docker push ghcr.io/intertwin-eu/itwinai-dev:mnist-torch-0.0.1
Find more base image candidates under:
https://github.com/interTwin-eu/itwinai/pkgs/container/itwinai
https://github.com/interTwin-eu/itwinai/pkgs/container/itwinai-dev
Training with Docker container
docker run -it --rm --name running-inference \
-v "$PWD":/usr/data ghcr.io/intertwin-eu/itwinai-dev:mnist-torch-0.0.1 \
/bin/bash -c "itwinai exec-pipeline \
--config-path /app \
+pipe_key=training_pipeline \
dataset_root=/usr/data/mnist-dataset "
Inference with Docker container
From wherever a sample of MNIST jpg images is available (folder called "mnist-sample-data/"):
├── $PWD
│   ├── mnist-sample-data
│   │   ├── digit_0.jpg
│   │   ├── digit_1.jpg
│   │   ├── digit_2.jpg
...
│   │   ├── digit_N.jpg
docker run -it --rm --name running-inference \
-v "$PWD":/usr/data ghcr.io/intertwin-eu/itwinai-dev:mnist-torch-0.0.1 \
/bin/bash -c "itwinai exec-pipeline \
--config-path /app \
+pipe_key=inference_pipeline \
test_data_path=/usr/data/mnist-sample-data \
inference_model_mlflow_uri=/app/mnist-pre-trained.pth \
predictions_dir=/usr/data/mnist-predictions "
This command will store the results in a folder called "mnist-predictions":
├── $PWD
│   ├── mnist-predictions
│   │   ├── predictions.csv
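Once the run finishes, the predictions can be inspected from Python. The following is a minimal sketch: it assumes the file is named predictions.csv as in the tree above, and that each row holds an image name followed by a predicted digit. The column layout is an assumption, so check it against your own file. The function names are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def load_predictions(csv_path):
    """Return (image_name, predicted_label) pairs from a two-column CSV.

    Assumed layout (not confirmed by the documentation): image name, digit.
    Rows that do not fit, such as a header line, are skipped.
    """
    rows = []
    with Path(csv_path).open(newline="") as f:
        for row in csv.reader(f):
            if len(row) >= 2 and row[1].strip().isdigit():
                rows.append((row[0], int(row[1])))
    return rows


def label_counts(predictions):
    """Count how many images were assigned to each predicted digit."""
    return Counter(label for _, label in predictions)


if __name__ == "__main__":
    path = Path("mnist-predictions/predictions.csv")
    if path.exists():
        print(label_counts(load_predictions(path)))
```

A quick count per predicted digit is a cheap sanity check that the model did not collapse to a single class.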