From 5f41463a7646fabcca371eba2de97562e8d568c3 Mon Sep 17 00:00:00 2001
From: binmakeswell
Date: Fri, 14 Oct 2022 17:10:18 +0800
Subject: [PATCH] add optimizer README for tutorials (#1707)

---
 colossalai/nn/optimizer/README.md | 82 +++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100644 colossalai/nn/optimizer/README.md

diff --git a/colossalai/nn/optimizer/README.md b/colossalai/nn/optimizer/README.md
new file mode 100644
index 000000000..268e37d57
--- /dev/null
+++ b/colossalai/nn/optimizer/README.md
@@ -0,0 +1,82 @@
+# Colossal-AI Optimization Techniques
+
+## Introduction
+
+Welcome to the large-scale deep learning optimization techniques of [Colossal-AI](https://github.com/hpcaitech/ColossalAI),
+which have been accepted as official tutorials by top conferences such as [AAAI](https://aaai.org/Conferences/AAAI-23/) and [PPoPP](https://ppopp23.sigplan.org/).
+
+[Colossal-AI](https://github.com/hpcaitech/ColossalAI), a unified deep learning system for the big model era, integrates
+many advanced technologies such as multi-dimensional tensor parallelism, sequence parallelism, heterogeneous memory management,
+large-scale optimization, and adaptive task scheduling. With Colossal-AI, users can efficiently and quickly deploy large AI
+model training and inference, reducing training budgets and the labor cost of learning and deployment.
+
+### 🚀 Quick Links
+
+[**Colossal-AI**](https://github.com/hpcaitech/ColossalAI) |
+[**Paper**](https://arxiv.org/abs/2110.14883) |
+[**Documentation**](https://www.colossalai.org/) |
+[**Forum**](https://github.com/hpcaitech/ColossalAI/discussions) |
+[**Slack**](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
+
+## Table of Contents
+
+Large transformer models display promising performance on a wide spectrum of AI applications.
+Both academia and industry are scaling DL training on larger clusters. However, degrading generalization performance,
+non-negligible communication overhead, and increasing model size prevent DL researchers and engineers from exploring large-scale AI models.
+
+We aim to provide a clear sketch of the optimizations for large-scale deep learning with regard to model accuracy and model efficiency.
+One way to maintain or improve model accuracy in the large-scale setting while preserving compute efficiency is to design algorithms
+that are less communication- and memory-hungry. Notably, the two goals are not mutually exclusive and can be optimized jointly to
+further speed up training. A minimal usage sketch of one of the optimizers below is given after the list.
+
+1. Model Accuracy
+   - Gradient Descent Optimization
+     - Gradient Descent Variants
+     - Momentum
+     - Adaptive Gradient
+   - Large Batch Training Optimization
+     - LARS
+     - LAMB
+     - Generalization Gap
+   - Second-Order Optimization
+     - Hessian-Free
+     - K-FAC
+     - Shampoo
+
+2. Model Efficiency
+   - Communication Efficiency
+     - Reduce Volume of Comm.
+     - Reduce Frequency of Comm.
+   - Memory Efficiency
+     - Mixed-Precision Training
+     - Memory-Efficient Methods, e.g. ZeRO, Gemini, etc.
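+
+As a concrete illustration of the large batch training optimizers listed above, the sketch below swaps a standard PyTorch
+optimizer for LAMB. It is a minimal sketch, assuming the `Lamb` class exported from this folder follows the usual
+`torch.optim.Optimizer` constructor pattern (`params`, `lr`, `weight_decay`); the toy model and random data are placeholders
+for a real large-batch workload.
+
+```python
+import torch
+import torch.nn as nn
+
+from colossalai.nn.optimizer import Lamb  # large-batch optimizer from this folder
+
+# Toy model and data; a real use case would be a large model trained with a large global batch.
+model = nn.Linear(1024, 10)
+criterion = nn.CrossEntropyLoss()
+
+# LAMB rescales each layer's update by a trust ratio (roughly, weight norm over update norm),
+# which helps keep training stable when the batch size and learning rate are scaled up.
+optimizer = Lamb(model.parameters(), lr=2e-3, weight_decay=0.01)
+
+for step in range(10):
+    inputs = torch.randn(4096, 1024)          # "large" batch
+    labels = torch.randint(0, 10, (4096,))
+
+    optimizer.zero_grad()
+    loss = criterion(model(inputs), labels)
+    loss.backward()
+    optimizer.step()
+```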
+
+Some of the above are still under development. **If you wish to make a contribution to this repository, please read the `Contributing` section below.**
+
+## Discussion
+
+Discussion about the Colossal-AI project is always welcome! We would love to exchange ideas with the community to better help this project grow.
+If there is anything you would like to discuss, feel free to join our [Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w).
+
+If you encounter any problem while running these optimizers, please raise an issue in this repository.
+
+## Contributing
+
+This project welcomes constructive ideas and implementations from the community.
+
+### Update an Optimizer
+
+If you find that an optimizer is broken (not working) or not user-friendly, you may open a pull request to this repository to update it.
+
+### Add a New Optimizer
+
+If you wish to add an optimizer for a specific application, please follow the steps below.
+
+1. Create the new optimizer file in the current folder (a minimal skeleton is sketched at the end of this file).
+2. Prepare the corresponding example files in the [Examples](https://github.com/hpcaitech/ColossalAI-Examples) repository to demonstrate the effectiveness of the new optimizer.
+3. Prepare a detailed README on environment setup, dataset preparation, code execution, etc. in your example folder.
+4. Update the table of contents (the `Table of Contents` section above) in this README file.
+
+If your PR is accepted, we may invite you to put up a tutorial or blog post in the [ColossalAI Documentation](https://colossalai.org/).
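+
+For reference, here is a minimal sketch of what the new optimizer file from step 1 might look like. It only illustrates the
+standard `torch.optim.Optimizer` subclassing pattern; the class name `MyNewOptimizer`, its hyperparameters, and the placeholder
+update rule are illustrative, not an existing API of this repository.
+
+```python
+import torch
+from torch.optim import Optimizer
+
+
+class MyNewOptimizer(Optimizer):
+    """Skeleton for a new optimizer; replace the update rule with your algorithm."""
+
+    def __init__(self, params, lr=1e-3, weight_decay=0.0):
+        if lr < 0.0:
+            raise ValueError(f"Invalid learning rate: {lr}")
+        defaults = dict(lr=lr, weight_decay=weight_decay)
+        super().__init__(params, defaults)
+
+    @torch.no_grad()
+    def step(self, closure=None):
+        loss = None
+        if closure is not None:
+            with torch.enable_grad():
+                loss = closure()
+
+        for group in self.param_groups:
+            for p in group["params"]:
+                if p.grad is None:
+                    continue
+                grad = p.grad
+                if group["weight_decay"] != 0:
+                    grad = grad.add(p, alpha=group["weight_decay"])
+                # Placeholder update: plain gradient descent; a real optimizer
+                # would typically keep per-parameter state in self.state[p].
+                p.add_(grad, alpha=-group["lr"])
+
+        return loss
+```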