Distributed machine learning training
Here you can find a collection of tutorials for distributing PyTorch- and TensorFlow-based workflows.
Distributed ML with PyTorch
- 1. Introduction to distributed ML with PyTorch
- 2. Distributed training on MNIST dataset
- 3. Using the itwinai TorchTrainer Class
- 4. GAN tutorial with PyTorch
- 5. PyTorch scaling test
- 6. itwinai and containers (Docker and Singularity)
- 7. Tutorial on Kubeflow and TorchTrainer class
- 8. Distributed Machine Learning on HPC from k8s using KubeRay operator and interLink
Distributed ML with TensorFlow
Machine learning workflows
Here you can find a collection of tutorials for ML workflows of varying complexity.
Hyperparameter Optimization Workflows
This tutorial provides an overview of hyperparameter optimization (HPO) workflows.