Distributed Data Parallelism

Deep neural networks (DNNs) are often extremely large and are trained on massive amounts of data, frequently more than a single computer can hold in memory. Even smaller DNNs can take days to train. Distributed Data Parallelism (DDP) addresses these two issues, long training times and limited memory, by using multiple machines to train a model: every machine holds a copy of the model and processes its own portion of the data.
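The core mechanism can be illustrated without any framework. In data-parallel training, each worker holds a replica of the model and computes gradients on its own slice of the global batch. The gradients are then averaged across workers (an all-reduce) before every replica applies the same update. The sketch below simulates this in pure Python, assuming a one-parameter linear model with mean squared error loss and equal-sized shards. `grad_mse`, `all_reduce_mean` and `data_parallel_step` are hypothetical names, and this is not how any particular library implements DDP.

```python
# Pure-Python simulation of one data-parallel training step.
# Assumptions (illustrative only): model y = w * x, mean squared error loss,
# and "workers" simulated as list shards on a single machine.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker ends up with the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_step(w, xs, ys, world_size, lr=0.01):
    # Equal shards are required for the average to match the full-batch gradient.
    assert len(xs) % world_size == 0
    shard = len(xs) // world_size
    # Each simulated worker computes a gradient on its own contiguous shard.
    local_grads = [
        grad_mse(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
        for r in range(world_size)
    ]
    # Every worker receives the same averaged gradient...
    synced = all_reduce_mean(local_grads)
    # ...so every replica applies an identical update and stays in sync.
    return [w - lr * g for g in synced]

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 * x for x in xs]  # data generated by y = 2x
replicas = data_parallel_step(0.0, xs, ys, world_size=4)
print(replicas)
```

With equal shards, the mean of the per-shard gradients equals the full-batch gradient, which is why the four replicas end up with the same weight a single machine would have computed from the whole batch.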

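Data parallelism is a comparatively easy way for a developer to reduce training times substantially. Whereas single-node parallelism is limited to the processors or GPUs of one machine, DDP scales to multiple machines. Because each worker handles only its share of every training step, adding workers divides the computational work further and can drastically reduce training times, up to the point where the cost of synchronising gradients between machines starts to dominate.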
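How large the reduction is depends on how the saved compute time compares with the time spent synchronising gradients. A rough model can be sketched under these assumptions: the global batch is fixed, its compute time divides evenly across N workers, and gradients are synchronised with a ring all-reduce whose bandwidth cost per worker is 2(N - 1)/N times the gradient size. Latency and any overlap of compute with communication are ignored. The function names and all parameter values are illustrative, not measurements.

```python
# Back-of-the-envelope model of data-parallel speedup (strong scaling).
# Hypothetical parameters: compute_s is the single-worker compute time per step,
# grad_bytes the gradient size, bandwidth the per-link bandwidth in bytes/s.

def step_time(n_workers, compute_s, grad_bytes, bandwidth):
    # Compute is assumed to split perfectly across workers.
    compute = compute_s / n_workers
    # Ring all-reduce bandwidth term: each worker sends 2 * (N - 1) / N * S bytes.
    # For a single worker this is zero, since no synchronisation is needed.
    comm = 2 * (n_workers - 1) / n_workers * grad_bytes / bandwidth
    return compute + comm

def speedup(n_workers, compute_s, grad_bytes, bandwidth):
    base = step_time(1, compute_s, grad_bytes, bandwidth)
    return base / step_time(n_workers, compute_s, grad_bytes, bandwidth)

for n in (1, 8, 64):
    print(n, round(speedup(n, 1.0, 100e6, 10e9), 1))
```

With these made-up parameters (1 s of compute per step, 100 MB of gradients, 10 GB/s links), 8 workers give roughly a 7x speedup but 64 workers only about 28x, because the communication term does not shrink as workers are added.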

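Another benefit of DDP is relief from single-machine memory constraints. Since the dataset is split across several machines, each machine only needs to load its own portion, making it practical to train on much larger datasets. Plain DDP does, however, keep a full copy of the model on every machine; a model too large for one machine must additionally be split using model parallelism or a sharded form of data parallelism.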
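One common way to split a dataset among workers is a strided assignment of sample indices, padded so that every worker gets the same number of samples. Equal shard sizes keep workers in lockstep, since each step ends with a collective gradient synchronisation. The sketch below is loosely modelled on the non-shuffling behaviour of PyTorch's `torch.utils.data.distributed.DistributedSampler`, but it is a simplified illustration rather than that class's code. The function name `shard_indices` is hypothetical.

```python
import math

def shard_indices(dataset_len, world_size, rank):
    """Return the dataset indices that worker `rank` is responsible for."""
    # Round up so every worker gets the same number of samples.
    per_rank = math.ceil(dataset_len / world_size)
    total = per_rank * world_size
    # Pad by wrapping around to the start of the dataset.
    indices = [i % dataset_len for i in range(total)]
    # Strided split: rank r takes r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total:world_size]

for rank in range(4):
    print(rank, shard_indices(10, world_size=4, rank=rank))
```

Here each of the four workers reads only 3 of the 10 samples. Samples 0 and 1 are reused as padding so that all shards have equal length.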

Below is a list of resources expanding on theoretical aspects and practical implementations of DDP:

Investigation of expected performance improvement: