# Colossal-AI Optimization Techniques
## Introduction
Welcome to the large-scale deep learning optimization techniques of [Colossal-AI](https://github.com/hpcaitech/ColossalAI),
which have been accepted as official tutorials at top conferences such as [AAAI](https://aaai.org/Conferences/AAAI-23/) and [PPoPP](https://ppopp23.sigplan.org/).

[Colossal-AI](https://github.com/hpcaitech/ColossalAI), a unified deep learning system for the big model era, integrates
many advanced technologies such as multi-dimensional tensor parallelism, sequence parallelism, heterogeneous memory management,
large-scale optimization, and adaptive task scheduling. With Colossal-AI, users can efficiently and
quickly deploy large AI model training and inference, reducing training budgets and the labor cost of learning and deployment.
### 🚀 Quick Links
[**Colossal-AI**](https://github.com/hpcaitech/ColossalAI) |
[**Paper**](https://arxiv.org/abs/2110.14883) |
[**Documentation**](https://www.colossalai.org/) |
[**Forum**](https://github.com/hpcaitech/ColossalAI/discussions) |
[**Slack**](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
## Table of Contents
Large transformer models display promising performance on a wide spectrum of AI applications.
Both academia and industry are scaling DL training on larger clusters. However, degrading generalization performance, non-negligible communication overhead, and increasing model sizes prevent DL researchers and engineers from exploring large-scale AI models.

We aim to provide a clear sketch of the optimizations for large-scale deep learning with regard to model accuracy and model efficiency.
One way to maintain or improve model accuracy at large scale while preserving compute efficiency is to design algorithms that
are less hungry for communication and memory. Notably, the two goals are not mutually exclusive; they can
be optimized jointly to further speed up training.
1. Model Accuracy (see the optimizer sketch after this list)
    - Gradient Descent Optimization
        - Gradient Descent Variants
        - Momentum
        - Adaptive Gradient
    - Large Batch Training Optimization
        - LARS
        - LAMB
        - Generalization Gap
    - Second-Order Optimization
        - Hessian-Free
        - K-FAC
        - Shampoo

2. Model Efficiency (see the mixed-precision sketch after this list)
    - Communication Efficiency
        - Reduce Volume of Communication
        - Reduce Frequency of Communication
    - Memory Efficiency
        - Mixed-Precision Training
        - Memory-Efficient Methods, e.g. ZeRO, Gemini, etc.
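
To make the first group concrete, here is a minimal sketch (not the official tutorial code) that runs one training step with a plain gradient-descent variant, with momentum, and with an adaptive-gradient optimizer, using standard PyTorch optimizers. The LARS/LAMB import path mentioned in the comment is an assumption about the Colossal-AI package layout and may differ between releases.

```python
# Minimal sketch: one training step with three gradient-descent flavours.
import torch
import torch.nn as nn

def one_step(make_optimizer):
    torch.manual_seed(0)
    model = nn.Linear(32, 2)
    optimizer = make_optimizer(model.parameters())
    x, y = torch.randn(8, 32), torch.randint(0, 2, (8,))
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()  # the parameter-update rule is what differs between methods
    return loss.item()

variants = {
    "sgd":      lambda p: torch.optim.SGD(p, lr=0.1),                # gradient descent variant
    "momentum": lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9),  # momentum
    "adam":     lambda p: torch.optim.Adam(p, lr=1e-3),              # adaptive gradient
}
# For large-batch training, Colossal-AI also ships LARS/LAMB-style optimizers,
# e.g. `from colossalai.nn.optimizer import Lamb, Lars` (assumed import path).

for name, make_opt in variants.items():
    print(name, one_step(make_opt))
```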
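
For the second group, the sketch below illustrates the idea behind mixed-precision training with plain PyTorch AMP (autocast plus loss scaling). It is an illustration of the technique only, not Colossal-AI's own fp16/ZeRO/Gemini integration, which is configured through the library's launch and configuration APIs described in the documentation.

```python
# Minimal mixed-precision sketch with PyTorch AMP (illustration only).
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

model = nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # loss scaling avoids fp16 gradient underflow

x = torch.randn(16, 1024, device=device)
target = torch.randn(16, 1024, device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_cuda):       # forward and loss in reduced precision
    loss = nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()                         # backward on the scaled loss
scaler.step(optimizer)                                # unscales gradients, then updates in fp32
scaler.update()                                       # adjust the loss scale for the next step
```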
Some of the above are still under development. **If you wish to make a contribution to this repository, please read the `Contributing` section below.**
## Discussion
Discussion about the Colossal-AI project is always welcome! We would love to exchange ideas with the community to help this project grow.
If you would like to discuss anything, feel free to jump into our [Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w).

If you encounter any problem while running these optimizers, please raise an issue in this repository.
## Contributing
This project welcomes constructive ideas and implementations from the community.
### Update an Optimizer
If you find that an optimizer is broken (not working) or not user-friendly, you may open a pull request against this repository to update that optimizer.
### Add a New Optimizer
If you wish to add an optimizer for a specific application, please follow the steps below.
1. Create the new optimizer file in the current folder (a minimal skeleton is sketched below).
2. Prepare the corresponding example files in the [Examples](https://github.com/hpcaitech/ColossalAI-Examples) repository to demonstrate the effectiveness of the new optimizer.
3. Prepare a detailed README covering environment setup, dataset preparation, code execution, etc. in your example folder.
4. Update the table of contents (the section above) in this README.
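
For step 1, a new optimizer file is generally expected to follow the standard `torch.optim.Optimizer` interface used by the optimizers already in this folder. The skeleton below is a hypothetical sketch of that structure; the class name and the update rule are placeholders, not code from this repository.

```python
# Hypothetical skeleton for a new optimizer file (names are illustrative).
import torch
from torch.optim import Optimizer


class MyOptimizer(Optimizer):
    """Plain SGD-style update, used only to show the expected structure."""

    def __init__(self, params, lr=1e-3):
        if lr <= 0.0:
            raise ValueError(f"Invalid learning rate: {lr}")
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                p.add_(p.grad, alpha=-group["lr"])  # replace with the actual update rule
        return loss
```

Replace the placeholder update in `step()` with your algorithm, and document any extra hyperparameters in the README of your example.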
If your PR is accepted, we may invite you to contribute a tutorial or blog post to the [ColossalAI Documentation](https://colossalai.org/).