Distributed machine learning training
Here you can find a collection of tutorials for distributing PyTorch- and TensorFlow-based workflows.
Distributed ML with PyTorch
- 1. Introduction to distributed ML with PyTorch
- 2. Distributed training on MNIST dataset
- 3. Using the itwinai TorchTrainer Class
- 4. GAN tutorial with PyTorch
- 5. PyTorch scaling test
- 6. itwinai and containers (Docker and Singularity)
- 7. Tutorial on Kubeflow and TorchTrainer class
- 8. Distributed Machine Learning on HPC from k8s using KubeRay operator and interLink
Distributed ML with TensorFlow
Machine Learning Workflows
Here you can find a collection of tutorials for ML workflows of varying complexity.
Hyperparameter Optimization
This tutorial provides an overview of hyperparameter optimization (HPO) workflows.
Code Profiling and Optimization
Here you can find our tutorials on profiling code with itwinai: