# Colossal-AI Overview

Author: Shenggui Li, Siqi Mai

## About Colossal-AI

As deep learning models grow in size, it becomes important to shift to a new training paradigm. Traditional training without parallelism or optimization is becoming a thing of the past, and new training methods are key to making the training of large-scale models efficient and cost-effective.

Colossal-AI is designed as a unified system that provides an integrated set of training techniques and utilities. It includes common training utilities such as mixed precision training and gradient accumulation. We also provide an array of parallelism methods, including data, tensor and pipeline parallelism. We optimize tensor parallelism with different multi-dimensional distributed matrix-matrix multiplication algorithms, and we provide different pipeline parallelism methods so that users can scale their models across nodes efficiently. More advanced features such as offloading are also covered in detail in this tutorial documentation.
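To make the idea behind tensor parallelism concrete, here is a minimal, dependency-free sketch of a column-wise (1D) distributed matrix multiplication. It is not Colossal-AI's implementation: the helper names (`matmul`, `split_columns`, `column_parallel_matmul`) are hypothetical, the "shards" are plain Python lists rather than tensors on separate devices, and the concatenation step stands in for what would be a collective communication call in a real system.

```python
# Toy illustration of 1D (column-wise) tensor parallelism in plain Python.
# Assumption: each "shard" stands in for the slice of the weight matrix
# that one worker would hold; no real devices or communication are used.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w, num_shards):
    """Split matrix w into num_shards column blocks of equal width."""
    cols = len(w[0])
    assert cols % num_shards == 0, "columns must divide evenly across shards"
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_shards)]


def column_parallel_matmul(x, w, num_shards):
    """Compute x @ w by multiplying x with each column shard, then concatenating."""
    # Each worker multiplies the full input by its own column block.
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Join the partial outputs along the column dimension
    # (a gather across workers in a real distributed setting).
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]              # input activations, shape 2 x 2
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # weight matrix, shape 2 x 4

full = matmul(x, w)                                   # single-worker reference
sharded = column_parallel_matmul(x, w, num_shards=2)  # two simulated workers
print(sharded == full)
```

Each simulated worker holds only half of the weight columns, yet the gathered result matches the single-worker product. The multi-dimensional algorithms mentioned above partition the matrices along more than one dimension, which this sketch does not attempt to show.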

## General Usage

We aim to make Colossal-AI easy to use and non-intrusive to user code. Using Colossal-AI follows a simple general workflow.

<figure style={{textAlign: "center"}}>
<img src="https://s2.loli.net/2022/01/28/ZK7ICWzbMsVuJof.png"/>
<figcaption>Workflow</figcaption>
</figure>

1. Prepare a configuration file that specifies the features you want to use and your parameters.
2. Initialize the distributed backend with `colossalai.launch`.
3. Inject the training features into your training components (e.g. model, optimizer) with `colossalai.booster`.
4. Run training and testing.

We will cover the whole workflow in the `basic tutorials` section.

## Future Development

The Colossal-AI system will be expanded to include more training techniques. These new developments may include, but are not limited to:

1. optimization of distributed operations
2. optimization of training on heterogeneous systems
3. implementation of training utilities to reduce model size and speed up training while preserving model performance
4. expansion of existing parallelism methods

We welcome ideas and contributions from the community. You can post your ideas for future development in our forum.

<!-- doc-test-command: echo "colossalai_overview.md does not need test"  -->