From ad2cf58f503b47bc25e1005b58ee7ca8b25ddf8d Mon Sep 17 00:00:00 2001 From: binmakeswell Date: Fri, 19 May 2023 18:03:56 +0800 Subject: [PATCH] [chat] add performance and tutorial (#3786) --- README.md | 16 +++++++++++++--- applications/Chat/README.md | 22 ++++++++++++++++++---- applications/Chat/examples/README.md | 3 +++ docs/README-zh-Hans.md | 16 +++++++++++++--- examples/tutorial/README.md | 6 +++++- 5 files changed, 52 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index 79f733122..2e6dcaa1e 100644 --- a/README.md +++ b/README.md @@ -127,12 +127,22 @@ distributed training and inference in a few lines. ### ColossalChat
- - + +
-[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) [[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) [[demo]](https://chat.colossalai.org) +[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. +[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) +[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) +[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0) +[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg) + +

+ +

+ +- Up to 10 times faster for RLHF PPO Stage3 Training

diff --git a/applications/Chat/README.md b/applications/Chat/README.md index 9ba831973..bc8481d96 100644 --- a/applications/Chat/README.md +++ b/applications/Chat/README.md @@ -67,13 +67,24 @@ More details can be found in the latest news. * [2023/02] [Open Source Solution Replicates ChatGPT Training Process! Ready to go with only 1.6GB GPU Memory](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt) ## Online demo -You can experience the performance of Coati7B on this page. +

+ + + +
-[chat.colossalai.org](https://chat.colossalai.org/) +[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. +[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) +[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) +[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0) +[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg) -Due to resource constraints, we will only provide this service from 29th Mar 2023 to 5 April 2023. However, we have provided the inference code in the [inference](./inference/) folder. The WebUI will be open-sourced soon as well. +

+ +

+ +> DeepSpeedChat performance figures are taken from its blog post of April 12, 2023; ColossalChat performance can be reproduced on an AWS p4d.24xlarge node with 8 A100-40G GPUs using the following command: `torchrun --standalone --nproc_per_node 8 benchmark_opt_lora_dummy.py --max_timesteps 1 --update_timesteps 1 --use_kernels --strategy colossalai_zero2 --experience_batch_size 64 --train_batch_size 32` -> Warning: Due to model and dataset size limitations, Coati is just a baby model, Coati7B may output incorrect information and lack the ability for multi-turn dialogue. There is still significant room for improvement. ## Install ### Install the environment @@ -112,12 +123,14 @@ Here is how we collected the data Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model. You can run the `examples/train_sft.sh` to start a supervised instructs fine-tuning. +[[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg) ### RLHF Training Stage2 - Training reward model Stage2 trains a reward model, which obtains corresponding scores by manually ranking different outputs for the same prompt and supervises the training of the reward model You can run the `examples/train_rm.sh` to start a reward model training. +[[Stage2 tutorial video]](https://www.youtube.com/watch?v=gMx2CApKhuo) ### RLHF Training Stage3 - Training model with reinforcement learning by human feedback @@ -128,6 +141,7 @@ Stage3 uses reinforcement learning algorithm, which is the most complex part of

You can run the `examples/train_prompts.sh` to start training PPO with human feedback. +[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g) For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples). diff --git a/applications/Chat/examples/README.md b/applications/Chat/examples/README.md index 60f876eda..72810738d 100644 --- a/applications/Chat/examples/README.md +++ b/applications/Chat/examples/README.md @@ -48,6 +48,7 @@ The following pic shows how we collected the data. ## Stage1 - Supervised instructs tuning Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model. +[[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg) You can run the `examples/train_sft.sh` to start a supervised instructs fine-tuning. @@ -83,6 +84,7 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \ ## Stage2 - Training reward model We train a reward model in stage 2, which obtains corresponding scores by manually ranking different outputs for the same prompt and supervises the training of the reward model. +[[Stage2 tutorial video]](https://www.youtube.com/watch?v=gMx2CApKhuo) You can run the `examples/train_rm.sh` to start a reward model training. @@ -141,6 +143,7 @@ Stage3 uses reinforcement learning algorithm, which is the most complex part of You can run the `examples/train_prompts.sh` to start PPO training. You can also use the cmd following to start PPO training. +[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g) ``` torchrun --standalone --nproc_per_node=4 train_prompts.py \ diff --git a/docs/README-zh-Hans.md b/docs/README-zh-Hans.md index 9d5bcfe3f..c3deca7e9 100644 --- a/docs/README-zh-Hans.md +++ b/docs/README-zh-Hans.md @@ -121,12 +121,22 @@ Colossal-AI 为您提供了一系列并行组件。我们的目标是让您的 ### ColossalChat
- - + +
-[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): 完整RLHF流程0门槛克隆 [ChatGPT](https://openai.com/blog/chatgpt/) [[代码]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) [[博客]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) [[在线样例]](https://chat.colossalai.org) +[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): 完整RLHF流程0门槛克隆 [ChatGPT](https://openai.com/blog/chatgpt/) +[[代码]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) +[[博客]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) +[[在线样例]](https://www.youtube.com/watch?v=HcTiHzApHm0) +[[教程]](https://www.youtube.com/watch?v=-qFBZFmOJfg) + +

+ +

+ +- 最高可提升RLHF PPO阶段3训练速度10倍

diff --git a/examples/tutorial/README.md b/examples/tutorial/README.md index f4843331f..933026166 100644 --- a/examples/tutorial/README.md +++ b/examples/tutorial/README.md @@ -29,7 +29,11 @@ quickly deploy large AI model training and inference, reducing large AI model tr - Fine-tuning and Inference for OPT [[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/tutorial/opt) [[video]](https://www.youtube.com/watch?v=jbEFNVzl67Y) - Optimized AlphaFold [[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/tutorial/fastfold) [[video]](https://www.youtube.com/watch?v=-zP13LfJP7w) - Optimized Stable Diffusion [[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion) [[video]](https://www.youtube.com/watch?v=8KHeUjjc-XQ) - + - ColossalChat: Cloning ChatGPT with a Complete RLHF Pipeline +[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) +[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) +[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0) +[[video]](https://www.youtube.com/watch?v=-qFBZFmOJfg) ## Discussion
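The tutorial entry above points at the complete RLHF pipeline; the Stage 2 reward model this patch describes (ranking different outputs for the same prompt) is commonly trained with a pairwise ranking loss. A minimal sketch — illustrative only, not ColossalChat code, with made-up function names:

```python
# Minimal sketch of the pairwise ranking loss commonly used for RLHF Stage 2
# reward models (the "manually ranking different outputs for the same prompt"
# step described in this patch). Illustrative only; not ColossalChat code.
import math

def pairwise_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): small when the human-preferred
    response already outscores the rejected one, large otherwise."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

low = pairwise_ranking_loss(r_chosen=2.0, r_rejected=-1.0)   # correct ranking
high = pairwise_ranking_loss(r_chosen=-1.0, r_rejected=2.0)  # inverted ranking
print(low < high)  # True
```

Minimizing this loss pushes the reward model to score the human-preferred response above the rejected one, which is what lets it supervise the PPO stage.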