Colossal-AI Tutorial Hands-on

Introduction

Welcome to the Colossal-AI tutorial, which has been accepted as an official tutorial by top conferences such as SC, AAAI, and PPoPP.

Colossal-AI is a unified deep learning system for the big model era. It integrates advanced techniques such as multi-dimensional tensor parallelism, sequence parallelism, heterogeneous memory management, large-scale optimization, and adaptive task scheduling. With Colossal-AI, users can deploy large AI model training and inference efficiently and quickly, reducing both the training budget and the labor cost of learning and deployment.

🚀 Quick Links

Colossal-AI | Paper | Documentation | Forum | Slack

Table of Contents

Multi-dimensional Parallelism
- Learn the components and overall design of Colossal-AI
- Step-by-step from PyTorch to Colossal-AI
- Try data/pipeline parallelism and 1D/2D/2.5D/3D tensor parallelism using a unified model
Sequence Parallelism
- Try sequence parallelism with BERT
- Combination of data/pipeline/sequence parallelism
- Faster training and longer sequence length
Auto-Parallelism
- Parallelism with normal non-distributed training code
- Model tracing, solution solving, and runtime communication insertion, all in one auto-parallelism system
- Try single program, multiple data (SPMD) parallelism with the auto-parallelism SPMD solver on ResNet50
Large Batch Training Optimization
- Comparison of small and large batch sizes with the SGD and LARS optimizers
- Acceleration from a larger batch size
Fine-tuning and Serving for OPT from Hugging Face
- Try OPT model imported from Hugging Face with Colossal-AI
- Fine-tuning OPT with limited hardware using ZeRO, Gemini and parallelism
- Deploy the fine-tuned model to an inference service
Acceleration of Stable Diffusion
- Stable Diffusion with Lightning
- Try the Lightning Colossal-AI strategy to reduce memory usage and speed up training

Discussion

Discussion about the Colossal-AI project is always welcome! We would love to exchange ideas with the community to help this project grow. If there is anything you would like to discuss, join us on Slack.

If you run into any problems while running these tutorials, please raise an issue in this repository.