Normalizing flow for generating lattice field configurations (Lattice QCD, ETHZ/CSIC)
The code is adapted from this notebook from the Lattice QCD use case.
More information on the use case is available in the published deliverables, D4.2, D7.2 and D7.4.
About the use-case and integration
Integration author(s): Rakesh Sarma (Juelich)
This package provides utilities for implementing the method of normalizing flows as a generative model for lattice field theory. The method of normalizing flows is a powerful generative modeling approach that learns complex probability distributions by transforming samples from a simple distribution through a series of invertible transformations. It has found applications in various domains, including generative image modeling.
The package currently supports scalar theories in any dimension, and we are actively extending it to accommodate gauge theories, broadening its applicability.
In a nutshell, three essential components are required for the method of normalizing flows:
A prior distribution to draw initial samples.
A neural network to perform a series of invertible transformations on the samples.
An action that specifies the target distribution, defining the goal of the generative model.
The central high-level class of the package is called Model, which can be
instantiated by providing instances of the three objects mentioned above:
the prior, the neural network, and the action.
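To make the interplay of the three components concrete, here is a minimal toy sketch in plain Python. It does not use the normflow API: the function names, the affine map standing in for the neural network, and the convention chosen for the quartic action are all assumptions made for illustration. It estimates the batch mean of log q(phi) + S(phi), which equals log(q/p) up to a constant because the target p is left unnormalized here.

```python
import math
import random


def sample_prior(n, rng):
    """Prior: draw n standard-normal samples and return them with log r(z)."""
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    log_r = [-0.5 * z * z - 0.5 * math.log(2.0 * math.pi) for z in zs]
    return zs, log_r


def affine_flow(z, scale, shift):
    """Invertible map phi = scale * z + shift, returned with log|d phi / d z|.

    Stand-in for the neural network; a real flow composes many such maps.
    """
    return scale * z + shift, math.log(abs(scale))


def toy_action(phi, m_sq=-1.2, lambd=0.5):
    """Hypothetical zero-dimensional quartic action (convention assumed).

    The target distribution is p(phi) proportional to exp(-S(phi)).
    """
    return 0.5 * m_sq * phi ** 2 + lambd * phi ** 4


def reverse_kl_loss(n, scale, shift, seed=0):
    """Batch estimate of E_q[log q(phi) + S(phi)] for the affine flow."""
    rng = random.Random(seed)
    zs, log_r = sample_prior(n, rng)
    total = 0.0
    for z, lr in zip(zs, log_r):
        phi, log_jac = affine_flow(z, scale, shift)
        log_q = lr - log_jac  # change-of-variables formula
        total += log_q + toy_action(phi)
    return total / n


# Compare two settings of the flow parameters; the lower value is the better fit.
print(reverse_kl_loss(1000, scale=1.0, shift=0.0))
print(reverse_kl_loss(1000, scale=0.8, shift=0.0))
```

Training a flow amounts to adjusting the parameters of the transformation (here `scale` and `shift`) to lower this quantity. In the package that role is played by the Fitter class acting on a Model instance.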
The Fitter class creates an instance of the Model class during its own
initialization and trains it. Fitter inherits from the TorchTrainer class
provided by itwinai. Importantly, this inheritance supplies the distributed
training logic and lets the user plug in any of the distributed training
strategies configured in itwinai. TorchTrainer also provides profiling,
logging, and hyperparameter optimization (HPO).
Important to note: Environment setup
To keep the number of normflow source files in the main itwinai repository small,
some of the source files are cloned from a forked normflow repository, which can
be accessed here. Only the files that are part of the itwinai integration for
normflow are included in the itwinai repository. To build the normflow
environment correctly, follow the steps below.
Once itwinai is cloned and the environment is set up (see instructions here), navigate to the use-cases/lattice-qcd directory:
cd use-cases/lattice-qcd
This directory contains the source files of the integration. The other files required to build the normflow environment are imported from the forked repository mentioned above. To do this, run the following lines in your terminal:
mkdir tmp-repo && cd tmp-repo
git init
git remote add origin https://github.com/r-sarma/normflow_itwinai_source.git
git sparse-checkout init --cone
git sparse-checkout set src
git pull origin master
mv src ../src
cd ..
rm -rf tmp-repo
These lines clone the required files from the forked repository and then
remove the tmp-repo directory, which is not needed for setting up the environment.
Next, the file _normflowcore.py has to be moved to the newly created src directory. This file contains the Trainer configuration and enables access to all the logic defined for distributed training strategy, logging, profiling and other functionalities offered by itwinai.
mv _normflowcore.py src/
When this is done, all the files required to install normflow are present. The installation can then be done with:
pip install .
Launching a training pipeline
The training configuration is handled with YAML-based configuration files.
An example of such a file is provided in config.yaml. It demonstrates the
scalar theory in zero dimensions, i.e., a scenario with one point and one
degree of freedom.
model:
  _target_: normflow.Model
  net_:
    _target_: normflow.nn.DistConvertor_
    knots_len: 10
    symmetric: True
  prior:
    _target_: normflow.prior.NormalPrior
    shape: [1]
  action:
    _target_: normflow.action.ScalarPhi4Action
    kappa: 0
    m_sq: -1.2
    lambd: 0.5
In this example, we have:
Prior Distribution: A univariate normal distribution, represented with a tensor shaped (1,).
Action: A quartic scalar theory is defined with parameters kappa=0, m_sq=-1.2, and lambd=0.5.
Neural Network: The DistConvertor_ class is used to create the transformation network, with knots_len=10 and symmetry enabled. Any instance of this class converts the probability distribution of inputs using a rational quadratic spline. In this example, the spline has 10 knots, and the distribution is assumed to be symmetric with respect to the origin.
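Each `_target_` key in the configuration names the Python class to build, and the remaining keys at the same level are passed to it as keyword arguments. The sketch below illustrates that idea, assuming the Hydra-style convention. It is a simplified stand-in, not the code that itwinai or Hydra actually runs, and the helper name `instantiate` and the example config are hypothetical. Standard-library classes replace the normflow ones so that the snippet runs without the package installed.

```python
import importlib


def instantiate(node):
    """Recursively build objects from dicts that carry a `_target_` key."""
    if isinstance(node, dict):
        # Build nested objects first so they can be passed as arguments.
        kwargs = {k: instantiate(v) for k, v in node.items() if k != "_target_"}
        if "_target_" not in node:
            return kwargs
        # Split "package.module.ClassName" into module path and attribute.
        module_name, _, attr = node["_target_"].rpartition(".")
        cls = getattr(importlib.import_module(module_name), attr)
        return cls(**kwargs)
    if isinstance(node, list):
        return [instantiate(v) for v in node]
    return node


# Stand-in config mirroring the nesting of the `model` block above:
# an outer object that receives an inner object plus plain values.
config = {
    "_target_": "types.SimpleNamespace",
    "prior": {
        "_target_": "fractions.Fraction",
        "numerator": 1,
        "denominator": 2,
    },
    "shape": [1],
}

obj = instantiate(config)
print(obj.prior, obj.shape)
```

Under the same convention, the `model` block would resolve to a `normflow.Model` that receives the already-built `net_`, `prior`, and `action` objects.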
The other training parameters are specified in the same YAML file.
For conciseness, only a few of them (the config block, epochs, and strategy)
are shown in the snippet here.
config:
  optim_lr: 0.001
  weight_decay: 0.01
  batch_size: 128
epochs: 100
strategy: "ddp"
These settings train the model for 100 epochs with a batch size of 128,
a learning rate (optim_lr) of 0.001, and a weight_decay of 0.01. The strategy
flag selects the distributed training strategy. It is set to ddp here, which
selects the PyTorch Distributed Data Parallel (DDP) strategy.
In order to launch the pipeline, simply run:
itwinai exec-pipeline --config-name config.yaml +pipe_key=training_pipeline
The above code block results in an output similar to:
>>> Training progress (cpu) <<<
Note: log(q/p) is estimated with normalized p; mean & error are obtained from samples in a batch
Epoch: 1 | loss: -0.5387 | ess: 0.8755
ConsoleLogger: epoch_loss = -0.5386953949928284
Epoch: 10 | loss: -0.4528 | ess: 0.8750
ConsoleLogger: epoch_loss = -0.45278581976890564
Epoch: 20 | loss: -0.7644 | ess: 0.8989
ConsoleLogger: epoch_loss = -0.76436448097229
Epoch: 30 | loss: -0.6655 | ess: 0.8778
ConsoleLogger: epoch_loss = -0.6654551029205322
Epoch: 40 | loss: -0.7596 | ess: 0.8895
ConsoleLogger: epoch_loss = -0.7596336603164673
Epoch: 50 | loss: -0.7449 | ess: 0.8836
ConsoleLogger: epoch_loss = -0.7449398040771484
Epoch: 60 | loss: -0.7312 | ess: 0.8920
ConsoleLogger: epoch_loss = -0.7311607599258423
Epoch: 70 | loss: -0.8115 | ess: 0.8982
ConsoleLogger: epoch_loss = -0.8114994764328003
Epoch: 80 | loss: -0.8218 | ess: 0.9046
ConsoleLogger: epoch_loss = -0.8217660188674927
Epoch: 90 | loss: -0.7733 | ess: 0.9065
ConsoleLogger: epoch_loss = -0.7732580900192261
Epoch 100 | Model Snapshot saved at checkpoint.E100.tar
Epoch: 100 | loss: -0.9057 | ess: 0.9045
ConsoleLogger: epoch_loss = -0.9056745767593384
(cpu) Time = 2.72 sec.
###############################
# 'Fitter' executed in 4.229s #
###############################
#################################
# 'Pipeline' executed in 4.229s #
#################################
This output reports the loss and the effective sample size (ess) at specified epochs during the training process, providing insight into the model's performance over time.
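For readers unfamiliar with the ess column, the sketch below computes a normalized effective sample size from per-sample values of log(q/p), using the standard importance-sampling formula (sum of weights squared, divided by the number of samples times the sum of squared weights). This is an assumption about what the column measures. The package's exact definition is not reproduced here, and the function name is hypothetical.

```python
import math


def effective_sample_size(log_q_over_p):
    """Normalized ESS in (0, 1] from per-sample log(q/p) values."""
    # Importance weights are w = p/q, so log w = -log(q/p).
    log_w = [-x for x in log_q_over_p]
    # Subtract the maximum before exponentiating for numerical stability;
    # the ratio below is unchanged by a common rescaling of the weights.
    shift = max(log_w)
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(w) ** 2 / (len(w) * sum(wi * wi for wi in w))


# Equal weights: every sample counts fully.
print(effective_sample_size([0.3, 0.3, 0.3, 0.3]))
# One sample dominates: roughly one in four samples is effective.
print(effective_sample_size([0.0, 100.0, 100.0, 100.0]))
```

A value close to 1 means the model distribution q matches the target p well on the batch, while a value close to 1/N means a single sample carries almost all of the weight.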
Alternatively, the model initialization and training can be performed
using the train.py file, which provides another interface for building
the models and launching the training, with the configuration defined within
the Python script. Launch the training with this script using:
python train.py
For working on High Performance Computing (HPC) systems, a startscript.sh file
is provided. This can be launched by:
sbatch startscript.sh
In the startscript, nodes specifies the number of workers to be used for training.
You can scale your model training across multiple GPUs with any of the
distributed training strategies configured in itwinai. The strategy is read from
the config.yaml file in the startscript. This flexibility allows you to tackle
larger datasets and more complex models. When using the train.py file
to launch the training, modify the last line of the startscript.sh file so that
it runs train.py.
This example demonstrates the flexibility of using the package to implement scalar field theories in a simplified zero-dimensional setting. It can be generalized to any dimension by changing the shape provided to the prior distribution.