
# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Colossal-AI_logo.png)](https://www.colossalai.org/)

   An integrated large-scale model training system with efficient parallelization techniques.

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> | 
   <a href="https://www.colossalai.org/"> Documentation </a> | 
   <a href="https://github.com/hpcaitech/ColossalAI-Examples"> Examples </a> |   
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> | 
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)
   

   | [English](README.md) | [中文](README-zh-Hans.md) |

</div>

## Table of Contents
<ul>
 <li><a href="#Features">Features</a> </li>
 <li>
   <a href="#Demo">Demo</a> 
   <ul>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
   </ul>
 </li>

 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#PyPI">PyPI</a></li>
     <li><a href="#Install-From-Source">Install From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#contributing">Contributing</a></li>
 <li>
   <a href="#Quick-View">Quick View</a>
   <ul>
     <li><a href="#Start-Distributed-Training-in-Lines">Start Distributed Training in Lines</a></li>
     <li><a href="#Write-a-Simple-2D-Parallel-Model">Write a Simple 2D Parallel Model</a></li>
   </ul>
 </li>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>

## Features

Colossal-AI provides a collection of parallel training components. We aim to let you write
distributed deep learning models the same way you write a model on your laptop, and we provide user-friendly tools to kickstart
distributed training in a few lines.

- Data Parallelism
- Pipeline Parallelism
- 1D, 2D, 2.5D, 3D tensor parallelism
- Sequence parallelism
- Friendly trainer and engine
- Extensible for new parallelism
- Mixed Precision Training
- Zero Redundancy Optimizer (ZeRO)

<p align="right">(<a href="#top">back to top</a>)</p>

## Demo
### ViT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />

- 14x larger batch size, and 5x faster training for Tensor Parallelism = 64

### GPT-3
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3.png" width=700/>

- 50% fewer GPU resources and 10.7% acceleration

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- 24x larger model size on the same hardware
- over 3x acceleration
### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 50% longer sequence length

Please visit our [documentation and tutorials](https://www.colossalai.org/) for more details.

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

### PyPI

```bash
pip install colossalai
```
This command installs the CUDA extension if you have CUDA, NVCC, and torch installed.

If you don't want to install the CUDA extension, add `--global-option="--no_cuda_ext"`:
```bash
pip install colossalai --global-option="--no_cuda_ext"
```

If you want to use `ZeRO`, you can run:
```bash
pip install colossalai[zero]
```

### Install From Source

> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to create an issue if you encounter any problems. :-)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependency
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

If you don't want to install and enable CUDA kernel fusion (it is required when using the fused optimizer), run:

```shell
pip install --global-option="--no_cuda_ext" .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

Run the following command to build a Docker image from the provided Dockerfile.

```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the Docker container in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Community

Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.

## Contributing

If you wish to contribute to this project, please follow the guidelines in [Contributing](./CONTRIBUTING.md).

Thanks so much to all of our amazing contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors"><img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/contributor_avatar.png" width="800px"></a>

*The order of contributor avatars is randomly shuffled.*

<p align="right">(<a href="#top">back to top</a>)</p>

## Quick View

### Start Distributed Training in Lines

```python
import colossalai
from colossalai.utils import get_dataloader


# my_config can be a path to a config file or a dictionary object
# 'localhost' only works for a single node; specify the
# node name when using multiple nodes
colossalai.launch(
    config=my_config,
    rank=rank,
    world_size=world_size,
    backend='nccl',
    port=29500,
    host='localhost'
)

# build your model
model = ...

# build your dataset; the dataloader uses a distributed data
# sampler by default
train_dataset = ...
train_dataloader = get_dataloader(dataset=train_dataset,
                                  shuffle=True
                                  )


# build your optimizer
optimizer = ...

# build your loss function
criterion = ...

# initialize colossalai
engine, train_dataloader, _, _ = colossalai.initialize(
    model=model,
    optimizer=optimizer,
    criterion=criterion,
    train_dataloader=train_dataloader
)

# start training
engine.train()
for epoch in range(NUM_EPOCHS):
    for data, label in train_dataloader:
        engine.zero_grad()
        output = engine(data)
        loss = engine.criterion(output, label)
        engine.backward(loss)
        engine.step()

```
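The snippet above leaves `rank` and `world_size` undefined. As a minimal sketch, assuming the processes are started by a launcher such as `torchrun` that exports `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT`, the launch arguments could be read from the environment. The helper name `launch_kwargs_from_env` is hypothetical and not part of Colossal-AI.

```python
import os


def launch_kwargs_from_env(env=os.environ):
    """Collect launch arguments from launcher-provided environment variables.

    Hypothetical helper: falls back to single-process defaults that match
    the values used in the example above ('localhost', port 29500).
    """
    return dict(
        rank=int(env.get("RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
        host=env.get("MASTER_ADDR", "localhost"),
        port=int(env.get("MASTER_PORT", 29500)),
        backend="nccl",
    )


if __name__ == "__main__":
    # Print the arguments this process would pass to the launch call
    print(launch_kwargs_from_env())
```

The resulting dictionary can then be unpacked into the launch call, for example `colossalai.launch(config=my_config, **launch_kwargs_from_env())`.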

### Write a Simple 2D Parallel Model

Suppose we have a huge MLP model whose very large hidden size makes it difficult to fit into a single GPU. We can
then distribute the model weights across GPUs in a 2D mesh while still writing the model in a familiar way.

```python
from colossalai.nn import Linear2D
import torch.nn as nn


class MLP_2D(nn.Module):

    def __init__(self):
        super().__init__()
        self.linear_1 = Linear2D(in_features=1024, out_features=16384)
        self.linear_2 = Linear2D(in_features=16384, out_features=1024)

    def forward(self, x):
        x = self.linear_1(x)
        x = self.linear_2(x)
        return x

```
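To get a feel for what the 2D mesh buys, the sketch below estimates the weight block each GPU would hold for the two layers above. It assumes an even block partition over a `q x q` mesh, which is how 2D tensor parallelism is usually described. The `shard_shape` helper is hypothetical, and the `parallel`/`tensor`/`mode` config keys are an assumption about the config layout, so check the documentation for your version.

```python
import math


def shard_shape(in_features, out_features, tensor_parallel_size):
    """Return the (rows, cols) weight block per GPU on a q x q mesh.

    Hypothetical helper: assumes the weight matrix is split evenly
    along both dimensions.
    """
    q = math.isqrt(tensor_parallel_size)
    if q * q != tensor_parallel_size:
        raise ValueError("2D tensor parallelism needs a square number of GPUs")
    if in_features % q or out_features % q:
        raise ValueError("feature sizes must be divisible by the mesh side")
    return in_features // q, out_features // q


# Assumed config layout: 4 GPUs arranged as a 2 x 2 mesh
my_config = dict(parallel=dict(tensor=dict(size=4, mode="2d")))

size = my_config["parallel"]["tensor"]["size"]
layers = {"linear_1": (1024, 16384), "linear_2": (16384, 1024)}
for name, (fan_in, fan_out) in layers.items():
    rows, cols = shard_shape(fan_in, fan_out, size)
    print(f"{name}: full {fan_in}x{fan_out}, per-GPU block {rows}x{cols}")
```

Under these assumptions, each GPU on a 2 x 2 mesh holds a 512 x 8192 block of `linear_1`, a quarter of the full weight matrix.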

<p align="right">(<a href="#top">back to top</a>)</p>

## Cite Us

```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```

<p align="right">(<a href="#top">back to top</a>)</p>