# Colossal-AI

[![logo](./docs/images/Colossal-AI_logo.png)](https://www.colossalai.org/)

<div align="center">
 <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> |
 <a href="https://www.colossalai.org/"> Documentation </a> |
 <a href="https://github.com/hpcaitech/ColossalAI-Examples"> Examples </a> |
 <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> |
 <a href="https://medium.com/@hpcaitech"> Blog </a></h3>
 <br/>

[![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/PR_CI.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/PR_CI.yml)
[![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
[![codebeat badge](https://codebeat.co/badges/bfe8f98b-5d61-4256-8ad2-ccd34d9cc156)](https://codebeat.co/projects/github-com-hpcaitech-colossalai-main)

| [English](README.md) | [中文](README-zh-Hans.md) |

</div>

An integrated system that trains large AI models efficiently with a range of parallelization techniques.

## Features

Colossal-AI provides a collection of parallel training components. Our goal is to make distributed AI model training as simple as training an ordinary single-GPU model, and our friendly tools let you start distributed training within a few lines of code. The components include (see the configuration sketch after this list):

- Data parallelism
- Pipeline parallelism
- 1D, 2D, 2.5D, 3D tensor parallelism
- Sequence parallelism
- Friendly trainer and engine
- Extensibility for new parallelism methods
- Mixed precision training
- Zero Redundancy Optimizer (ZeRO)

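Most of these features are switched on through a single configuration file rather than code changes. Below is a minimal sketch, assuming the config-file convention of early Colossal-AI releases; the exact keys may differ in your version, so treat it as illustrative:

```python
# config.py -- a minimal sketch of a Colossal-AI configuration file.
# Key names follow the convention of early releases and may differ
# in your version; consult the documentation for exact names.
from colossalai.amp import AMP_TYPE

# enable mixed precision training via torch.cuda.amp
fp16 = dict(mode=AMP_TYPE.TORCH)

# combine pipeline parallelism with 2D tensor parallelism
parallel = dict(
    pipeline=2,                     # number of pipeline stages
    tensor=dict(size=4, mode='2d')  # 4 GPUs arranged in a 2x2 grid
)
```
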
## Examples

### ViT

<img src="./docs/images/ViT_TP.png" width="400" />

- 14x larger batch size
- 5x faster training

### GPT-3 & GPT-2

![GPT_2_3](./docs/images/GPT_2_3.png)

- GPT-3: free 50% of GPU resources, or achieve 10.7% acceleration
- GPT-2: 11x lower GPU memory consumption, or superlinear scaling

### BERT

![BERT_seq](./docs/images/BERT_seq.png)

- 2x faster training
- 1.5x longer sequence length

Please visit our [documentation and tutorials](https://www.colossalai.org/) for more details.


## Installation

### PyPI

```bash
pip install colossalai
```

This command also installs the CUDA extension, provided that CUDA, NVCC, and torch are already installed on your machine.

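To check that the installation succeeded, you can try importing the package; this assumes the package exposes a `__version__` attribute:

```bash
python -c "import colossalai; print(colossalai.__version__)"
```
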
If you don't want to install the CUDA extension, add `--global-option="--no_cuda_ext"` to the command, e.g.:

```bash
pip install colossalai --global-option="--no_cuda_ext"
```

If you want to use `ZeRO`, you can run:

```bash
pip install colossalai[zero]
```

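ZeRO is then enabled through the configuration file, like the other parallelism features. A hypothetical sketch follows; the key names here (`level`, `cpu_offload`) are assumptions and vary across versions, so check the documentation for your release:

```python
# config.py -- hypothetical ZeRO section; the key names below are
# assumptions and may differ in your release of Colossal-AI
zero = dict(
    level=2,           # stage of optimizer/gradient state partitioning
    cpu_offload=True,  # offload optimizer states to CPU memory
)
```
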
### Install From Source

> The version of Colossal-AI will be kept in line with the main branch of this repository. Feel free to raise an issue if you encounter any problem. :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependencies
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

If you don't want to install and enable CUDA kernel fusion (installation is required when using fused optimizers):

```shell
pip install --global-option="--no_cuda_ext" .
```

## Use Docker

Run the following command to build a docker image from the Dockerfile we provide:

```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the docker container in interactive mode:

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

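Inside the container you can launch a training script with the usual PyTorch distributed launcher. A minimal sketch, where `train.py` stands in for a hypothetical entry point of your own:

```bash
# inside the container: launch a hypothetical train.py on 4 local GPUs
python -m torch.distributed.launch --nproc_per_node 4 train.py
```
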
## Contributing

Contributions to this project are welcome. Please refer to the [Contributing Guide](./CONTRIBUTING.md).

## Quick View

### Start Distributed Training in a Few Lines

```python
import colossalai
from colossalai.utils import get_dataloader


# my_config can be a path to a config file or a dictionary object
# 'localhost' only applies to a single node; specify the node name when using multiple nodes
colossalai.launch(
    config=my_config,
    rank=rank,
    world_size=world_size,
    backend='nccl',
    port=29500,
    host='localhost'
)

# build your model
model = ...

# build your dataset; get_dataloader applies a distributed data sampler by default
train_dataset = ...
train_dataloader = get_dataloader(dataset=train_dataset,
                                  shuffle=True)

# build your optimizer
optimizer = ...

# build your loss function
criterion = ...

# initialize colossalai
engine, train_dataloader, _, _ = colossalai.initialize(
    model=model,
    optimizer=optimizer,
    criterion=criterion,
    train_dataloader=train_dataloader
)

# start training
engine.train()
for epoch in range(NUM_EPOCHS):
    for data, label in train_dataloader:
        engine.zero_grad()
        output = engine(data)
        loss = engine.criterion(output, label)
        engine.backward(loss)
        engine.step()
```

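If you start the script with `torch.distributed.launch` or a cluster scheduler, the rank, world size, host, and port can be read from the environment instead of being passed by hand. A minimal sketch, assuming your version provides the `colossalai.launch_from_torch` helper:

```python
import colossalai

# reads rank, world size, host and port from the environment variables
# set by torch.distributed.launch (assumes launch_from_torch is available)
colossalai.launch_from_torch(config=my_config)
```
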
### Write a Simple 2D Parallel Model

Let's say we have a huge MLP model whose hidden size is so large that it cannot fit into a single GPU. We can then distribute the model weights across GPUs in a 2D mesh while still writing the model in the way you are familiar with.

```python
from colossalai.nn import Linear2D
import torch.nn as nn


class MLP_2D(nn.Module):

    def __init__(self):
        super().__init__()
        self.linear_1 = Linear2D(in_features=1024, out_features=16384)
        self.linear_2 = Linear2D(in_features=16384, out_features=1024)

    def forward(self, x):
        x = self.linear_1(x)
        x = self.linear_2(x)
        return x
```

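For the `Linear2D` weights to actually be sharded over a 2D mesh, the launch configuration must request 2D tensor parallelism. A minimal sketch, assuming the config-file convention shown earlier, where 4 GPUs form a 2x2 device grid:

```python
# config.py -- a sketch of a 2D tensor-parallel setting; 4 GPUs are
# arranged as a 2x2 device mesh for the Linear2D layers
parallel = dict(
    tensor=dict(size=4, mode='2d'),
)
```
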
## Cite Us

```
@article{bian2021colossal,
    title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
    author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
    journal={arXiv preprint arXiv:2110.14883},
    year={2021}
}
```