Colossal-AI Optimization Techniques

Introduction

Welcome to the large-scale deep learning optimization techniques of Colossal-AI, which have been accepted as official tutorials by top conferences such as SC, AAAI, PPoPP, CVPR, and ISC.

Colossal-AI, a unified deep learning system for the big model era, integrates many advanced technologies such as multi-dimensional tensor parallelism, sequence parallelism, heterogeneous memory management, large-scale optimization, and adaptive task scheduling. With Colossal-AI, users can efficiently and quickly deploy large AI model training and inference, reducing training budgets and the labor cost of learning and deployment.

Colossal-AI | Paper | Documentation | Forum | Slack

Table of Contents

Large transformer models display promising performance on a wide spectrum of AI applications. Both academia and industry are scaling DL training on larger clusters. However, degrading generalization performance, non-negligible communication overhead, and increasing model size prevent DL researchers and engineers from exploring large-scale AI models.

We aim to provide a clear sketch of the optimizations for large-scale deep learning with regard to model accuracy and model efficiency. One way to maintain or improve model accuracy at large scale while preserving compute efficiency is to design algorithms that are less communication- and memory-hungry. Notably, accuracy and efficiency are not mutually exclusive; they can be optimized jointly to further speed up training.

  1. Model Accuracy

    • Gradient Descent Optimization
      • Gradient Descent Variants
      • Momentum
      • Adaptive Gradient
    • Large Batch Training Optimization
      • LARS
      • LAMB
      • Generalization Gap
    • Second-Order Optimization
      • Hessian-Free
      • K-FAC
      • Shampoo
  2. Model Efficiency

    • Communication Efficiency
      • Reduce Volume of Comm.
      • Reduce Frequency of Comm.
    • Memory Efficiency
      • Mixed-Precision Training
      • Memory-Efficient Methods, e.g. ZeRO, Gemini, etc.

Some of the above are still under development. If you wish to make a contribution to this repository, please read the Contributing section below.
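
To make the list above concrete, here is a minimal sketch of using one of the large-batch optimizers from this folder (LAMB) as a drop-in replacement for a standard PyTorch optimizer. The import path and keyword arguments shown are assumptions and may vary across Colossal-AI versions.

```python
# Minimal sketch (assumed API): swapping in LAMB when scaling up the global batch size.
import torch
import torch.nn as nn

from colossalai.nn.optimizer import Lamb  # LARS, FusedAdam, HybridAdam, etc. live here too

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# LAMB rescales each layer's update by a layer-wise trust ratio, which helps
# keep training stable at very large batch sizes.
optimizer = Lamb(model.parameters(), lr=2e-3, weight_decay=0.01)

x = torch.randn(64, 1024)
loss = model(x).pow(2).mean()  # dummy loss for illustration
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The same pattern applies to the other optimizers in this folder, e.g. LARS, FusedAdam, and HybridAdam.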

Discussion

Discussion about the Colossal-AI project is always welcome! We would love to exchange ideas with the community to help this project grow. If you would like to discuss anything, feel free to jump into our Slack.

If you encounter any problem while running these optimizers, you may want to raise an issue in this repository.

Contributing

This project welcomes constructive ideas and implementations from the community.

Update an Optimizer

If you find that an optimizer is broken (not working) or not user-friendly, you may put up a pull request to this repository to update it.

Add a New Optimizer

If you wish to add an optimizer for a specific application, please follow the steps below.

  1. Create the new optimizer file in the current folder (see the sketch after this list)
  2. Prepare the corresponding example files in the Examples repository to demonstrate the effectiveness of the new optimizer
  3. Prepare a detailed README on environment setup, dataset preparation, code execution, etc. in your example folder
  4. Update the Table of Contents section above in this README file
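
As a rough illustration of what step 1 might look like, below is a hypothetical skeleton for a new optimizer file. The file name, class name, and plain-SGD update rule are purely illustrative; the only assumption is that the optimizer follows the standard torch.optim.Optimizer interface, as the existing optimizers in this folder do.

```python
# my_sgd.py: a hypothetical skeleton for a new optimizer contribution.
# Class name and update rule are illustrative only.
import torch
from torch.optim import Optimizer


class MySGD(Optimizer):
    """Plain SGD, kept minimal to show the expected structure of a new optimizer."""

    def __init__(self, params, lr=1e-3):
        if lr <= 0.0:
            raise ValueError(f"Invalid learning rate: {lr}")
        defaults = dict(lr=lr)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Update rule: p <- p - lr * grad
                p.add_(p.grad, alpha=-group["lr"])
        return loss
```

Reading hyperparameters from `self.param_groups` and wrapping `step` in `torch.no_grad()` follows standard PyTorch optimizer conventions, which makes a new implementation easy to compare against the existing ones.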

If your PR is accepted, we may invite you to put up a tutorial or blog in ColossalAI Documentation.