ML-based extreme events detection and characterization (xtclim, CERFACS)
The code is adapted from CERFACS' repository. The implementation of a pipeline with the itwinai framework is described below.
Method
Convolutional Variational AutoEncoder.
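As background on the method, a variational autoencoder is typically trained by minimizing a reconstruction term plus a KL-divergence term that pulls the latent distribution toward a standard normal. The following is a minimal NumPy sketch of those two terms and of the reparameterization step. It is independent of the encoder/decoder architecture and is not taken from the repository: the function names, the `beta` weight, and the squared-error reconstruction term are assumptions made for illustration.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Return (total, reconstruction, kl), each averaged over the batch.

    x, x_recon: arrays of shape (batch, ...)
    mu, log_var: arrays of shape (batch, latent_dim)
    beta: hypothetical weight on the KL term (assumption)
    """
    batch = x.shape[0]
    # Sum of squared errors per sample (assumed reconstruction term).
    recon = np.sum((x - x_recon).reshape(batch, -1) ** 2, axis=1)
    # Closed-form KL(N(mu, sigma^2) || N(0, I)) per sample.
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    total = recon + beta * kl
    return float(np.mean(total)), float(np.mean(recon)), float(np.mean(kl))
```

In a convolutional VAE, `mu` and `log_var` would come from a convolutional encoder and `x_recon` from a transposed-convolution decoder fed with the sampled latent vector.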
Input
"3D daily images": daily screenshots of Europe for three climate variables (maximum temperature, precipitation, wind).
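To make the input format concrete, the sketch below stacks three gridded variables into one array with a trailing channel axis, so that each day becomes a single three-channel image. It uses synthetic data and is only an illustration: the function names, the per-variable min-max normalization, and the channels-last layout are assumptions, not the repository's preprocessing.

```python
import numpy as np


def minmax_normalize(field):
    """Rescale one variable to [0, 1] using its own global min and max."""
    lo, hi = field.min(), field.max()
    if hi == lo:
        return np.zeros_like(field, dtype=float)
    return (field - lo) / (hi - lo)


def stack_daily_images(tmax, precip, wind):
    """Stack three (time, lat, lon) arrays into (time, lat, lon, 3)."""
    if not (tmax.shape == precip.shape == wind.shape):
        raise ValueError("all variables must share the same (time, lat, lon) shape")
    channels = [minmax_normalize(v.astype(float)) for v in (tmax, precip, wind)]
    return np.stack(channels, axis=-1)


# Synthetic stand-in data: 10 days on a 4 x 5 grid (values are made up).
rng = np.random.default_rng(0)
shape = (10, 4, 5)
images = stack_daily_images(
    rng.normal(290.0, 5.0, shape),  # maximum temperature
    rng.gamma(2.0, 1.5, shape),     # precipitation
    rng.gamma(3.0, 2.0, shape),     # wind
)
print(images.shape)  # (10, 4, 5, 3)
```

A PyTorch convolutional network would usually expect channels first, so a transpose to (time, 3, lat, lon) would be needed before training under this layout.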
Output
Error between the original and reconstructed image, postprocessed for analysis in the scenario_season_comparison.ipynb file.
Idea
The more unusual an image (an anomaly), the higher its reconstruction error.
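The idea can be sketched in a few lines: compute one reconstruction error per image, then flag the images whose error is unusually high. The code below is a hedged illustration with hypothetical function names. The mean-squared-error metric and the quantile-based threshold taken from a reference period are assumptions; the source does not state how errors are thresholded.

```python
import numpy as np


def reconstruction_error(original, reconstructed):
    """Mean squared error per image; inputs have shape (n_images, ...)."""
    n = original.shape[0]
    diff = (original - reconstructed).reshape(n, -1)
    return np.mean(diff**2, axis=1)


def flag_anomalies(errors, reference_errors, quantile=0.99):
    """Flag images whose error exceeds a quantile of a reference period.

    The quantile rule is an assumption made for this sketch.
    """
    threshold = np.quantile(reference_errors, quantile)
    return errors > threshold, threshold
```

For example, `reference_errors` could be the errors obtained on the training period, and `errors` those obtained on a projection period, so that flagged days are the ones the network reconstructs worse than almost anything seen in training.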
Information on files
In the preprocessing folder, the preprocess_functions_2d_ssp.py class loads NetCDF files from a data folder, which has to be specified in dataset_root in the config file pipeline.yaml (please change the location). The data can be found here. The given class normalizes and adjusts the data for the network. The function preprocess_2d_seasons.py splits the data into seasonal files. Preprocessed data is stored in the input folder.
The file train.py trains the network. Caution: it will overwrite the weights of the network already saved in outputs (unless you change the path name outputs/cvae_model_3d.pth in the script).
The anomaly.py file evaluates the network on the available datasets: train, test, and projection.
Installation
Please follow the documentation to install the itwinai environment. After that, install the required libraries within the itwinai environment with:
    pip install -r requirements.txt
How to launch pipeline locally
The config file pipeline.yaml contains all the steps to execute the workflow. This file also contains all the seasons, and a separate run is launched for each season. You can launch the pipeline through train.py from the root of the repository with:
    python train.py
How to launch pipeline on an HPC system
The startscript job script can be used to launch a pipeline with SLURM on an HPC system. Set the environment variables required by the script and export them to the job as follows:
    # Distributed training with torch DistributedDataParallel
    PYTHON_VENV=".venv"
    DIST_MODE="ddp"
    RUN_NAME="ddp-cerfacs"
    sbatch --export=ALL,DIST_MODE="$DIST_MODE",RUN_NAME="$RUN_NAME",PYTHON_VENV="$PYTHON_VENV" \
        startscript
The results and/or errors are available in the job.out and job.err log files. Training and inference steps are defined in the pipeline, and both steps exploit distributed resources.
With the MLflow logger, the logs can be visualized in the MLflow UI:
    itwinai mlflow-ui --path mllogs/mlflow --port 5000 --host 127.0.0.1
    # In background
    itwinai mlflow-ui --path mllogs/mlflow --port 5000 --host 127.0.0.1 &
Hyperparameter Optimization (HPO)
The repository also provides functionality to perform HPO with Ray. With HPO, multiple trials with different hyperparameter configurations are run on a distributed infrastructure, typically in an HPC environment. This allows searching for the configuration that minimizes (or maximizes) the chosen metric, such as the loss, of the investigated network. The hpo.py file contains the implementation, which launches the pipeline.yaml pipeline. To launch an HPO experiment, simply run:
    sbatch slurm_ray.sh
The command-line arguments parsed by the hpo.py file can be changed to customize the parameters considered in the HPO process.
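To illustrate what an HPO run does conceptually, here is a minimal sketch of a plain sequential random search written with the Python standard library only. It is not the Ray-based implementation in hpo.py: Ray would run such trials in parallel across nodes and can apply smarter search and scheduling strategies. The search space (`lr`, `batch_size`), the function names, and the toy objective standing in for a training run are all hypothetical.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical hyperparameter configuration."""
    return {
        "lr": 10 ** rng.uniform(-5, -2),  # log-uniform learning rate
        "batch_size": rng.choice([32, 64, 128]),
    }


def toy_objective(config):
    """Stand-in for a training run; returns a made-up validation loss."""
    return (math.log10(config["lr"]) + 3.0) ** 2 + 0.001 * config["batch_size"]


def random_search(objective, n_trials, seed=0):
    """Run n_trials independent trials and return the best (config, loss)."""
    rng = random.Random(seed)
    best_config, best_loss = None, math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(toy_objective, n_trials=20)
print(best_config, best_loss)
```

In a real experiment, the objective would train the network with the sampled configuration and report its validation loss, which is the role the pipeline launched by hpo.py plays in each trial.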