[chat] polish tutorial doc (#3551)

* [chat] clean up duplicate tutorial

pull/3563/head
binmakeswell 2023-04-13 18:11:48 +08:00 committed by GitHub
parent 77efdfe1dd
commit 535b896435
1 changed file with 39 additions and 97 deletions

- [Install the Transformers](#install-the-transformers)
- [How to use?](#how-to-use)
- [Supervised datasets collection](#supervised-datasets-collection)
- [RLHF Training Stage1 - Supervised instructs tuning](#rlhf-training-stage1---supervised-instructs-tuning)
- [RLHF Training Stage2 - Training reward model](#rlhf-training-stage2---training-reward-model)
- [RLHF Training Stage3 - Training model with reinforcement learning by human feedback](#rlhf-training-stage3---training-model-with-reinforcement-learning-by-human-feedback)
- [Inference Quantization and Serving - After Training](#inference-quantization-and-serving---after-training)
  - [8-bit setup](#8-bit-setup)
  - [4-bit setup](#4-bit-setup)
- [Coati7B examples](#coati7b-examples)
  - [Generation](#generation)
  - [Open QA](#open-qa)
- [Limitation for LLaMA-finetuned models](#limitation)
- [Limitation of dataset](#limitation)
- [FAQ](#faq)
  - [How to save/load checkpoint](#faq)
  - [How to train with limited resources](#faq)
- [The Plan](#the-plan)
  - [Real-time progress](#real-time-progress)
- [Invitation to open-source contribution](#invitation-to-open-source-contribution)
Here is how we collected the data:
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/data-collect.png" width=500/>
</p>
### RLHF Training Stage1 - Supervised instructs tuning

Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model.

You can run `examples/train_sft.sh` to start supervised instructs fine-tuning:
```
torchrun --standalone --nproc_per_node=4 train_sft.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2 \
--log_interval 10 \
--save_path /path/to/Coati-7B \
--dataset /path/to/data.json \
--batch_size 4 \
--accimulation_steps 8 \
--lr 2e-5 \
--max_datasets_size 512 \
--max_epochs 1 \
```
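
The `--dataset` file is a JSON file of instruction-following samples. As a rough sketch, assuming the common Alpaca-style `instruction`/`input`/`output` schema (an assumption for illustration; the exact field names may differ, so check the dataset documentation), it could be prepared like this:

```python
import json

# Hypothetical records in an Alpaca-style schema (illustrative only).
samples = [
    {
        "instruction": "Summarize the following paragraph.",
        "input": "ColossalChat provides a complete RLHF training pipeline...",
        "output": "ColossalChat is an open-source RLHF pipeline for chat models.",
    },
    {
        "instruction": "Translate 'good morning' into French.",
        "input": "",
        "output": "Bonjour.",
    },
]

# Write the dataset to the path passed via --dataset.
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```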
### RLHF Training Stage2 - Training reward model
Stage2 trains a reward model: different outputs for the same prompt are manually ranked to obtain corresponding scores, which supervise the training of the reward model.

You can run `examples/train_rm.sh` to start reward model training:
```
torchrun --standalone --nproc_per_node=4 train_reward_model.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2 \
--loss_fn 'log_exp' \
--save_path 'rmstatic.pt' \
```
### RLHF Training Stage3 - Training model with reinforcement learning by human feedback
Stage3 uses a reinforcement learning algorithm, which is the most complex part of the training process:
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/stage-3.jpeg" width=800/>
</p>
You can run `examples/train_prompts.sh` to start training PPO with human feedback:
```
torchrun --standalone --nproc_per_node=4 train_prompts.py \
--pretrain "/path/to/LLaMa-7B/" \
--model 'llama' \
--strategy colossalai_zero2 \
--prompt_path /path/to/your/prompt_dataset \
--pretrain_dataset /path/to/your/pretrain_dataset \
--rm_pretrain /your/pretrain/rm/definition \
--rm_path /your/rm/model/path
```
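
As a rough sketch of what the PPO stage optimizes (simplified relative to the actual trainer, which also uses a value head and typically subtracts a KL penalty against the SFT model from the reward): the actor is updated with the standard clipped surrogate objective.

```python
import torch

def ppo_actor_loss(log_probs: torch.Tensor,      # log-probs of sampled tokens under the new policy
                   old_log_probs: torch.Tensor,  # log-probs under the policy that generated them
                   advantages: torch.Tensor,     # advantage estimates from reward model + critic
                   clip_eps: float = 0.2) -> torch.Tensor:
    # Standard PPO clipped surrogate objective, negated for minimization:
    # large policy updates are clipped to keep training stable.
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```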
For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples).
### Inference Quantization and Serving - After Training

We provide an online inference server and a benchmark. We aim to run inference on a single GPU, so quantization is essential when using large models. We support 8-bit quantization (RTN), 4-bit quantization (GPTQ), and FP16 inference. The online inference server scripts can help you deploy your own services.

#### 8-bit setup

8-bit quantization is natively supported by the latest [transformers](https://github.com/huggingface/transformers). Please install it from source, and ensure you have downloaded the HF-format model weights of the LLaMA models.

Usage:
```python
import torch
from transformers import LlamaForCausalLM

USE_8BIT = True  # use 8-bit quantization; otherwise, use fp16

model = LlamaForCausalLM.from_pretrained(
    "pretrained/path",
    load_in_8bit=USE_8BIT,
    torch_dtype=torch.float16,
    device_map="auto",
)
if not USE_8BIT:
    model.half()  # use fp16
model.eval()
```
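
Continuing from the snippet above, generation goes through the usual `transformers` API. A minimal usage sketch (the prompt format and sampling settings here are illustrative assumptions, not the served configuration):

```python
import torch
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("pretrained/path")

prompt = "Instruction: What is the capital of France?\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```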
**Troubleshooting**: if you get errors indicating your CUDA-related libraries are not found when loading the 8-bit model, you can check whether your `LD_LIBRARY_PATH` is correct.
E.g. you can set `export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH`.
#### 4-bit setup
Please ensure you have downloaded the HF-format model weights of LLaMA models first.
Then you can follow [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). This lib provides efficient CUDA kernels and weight conversion scripts.
After installing this lib, you can convert the original HF-format LLaMA model weights to a 4-bit version.
```shell
CUDA_VISIBLE_DEVICES=0 python llama.py /path/to/pretrained/llama-7b c4 --wbits 4 --groupsize 128 --save llama7b-4bit-128g.pt
```
Run this command in your cloned `GPTQ-for-LLaMa` directory, and you will get the 4-bit weight file `llama7b-4bit-128g.pt`.
**Troubleshooting**: if you get errors about `position_ids`, you can check out commit `50287c3b9ae4a3b66f6b5127c643ec39b769b155` of the `GPTQ-for-LLaMa` repo.
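
For intuition about what `--wbits 4 --groupsize 128` means: weights are quantized in groups of 128 values, each group with its own scale and zero point. The sketch below uses plain round-to-nearest (RTN) group quantization; GPTQ itself solves a more careful per-layer reconstruction problem, so this is only an illustration of the storage format, not the GPTQ algorithm:

```python
import torch

def rtn_groupwise_quantize(w: torch.Tensor, wbits: int = 4, groupsize: int = 128):
    # Asymmetric round-to-nearest quantization per group (illustrative only).
    qmax = 2 ** wbits - 1
    groups = w.reshape(-1, groupsize)
    wmin = groups.min(dim=1, keepdim=True).values
    wmax = groups.max(dim=1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    q = torch.round((groups - wmin) / scale).clamp(0, qmax)
    w_hat = (q * scale + wmin).reshape(w.shape)  # dequantized approximation
    return q.to(torch.uint8), scale, wmin, w_hat

w = torch.randn(512, 512)
q, scale, zero, w_hat = rtn_groupwise_quantize(w)
print(f"mean abs error: {(w - w_hat).abs().mean():.5f}")
```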
For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference).
You can find more examples in this [repo](https://github.com/XueFuzhao/InstructionWild/blob/main/comparison.md).
### Limitation

<details><summary><b>Limitation for LLaMA-finetuned models</b></summary>
- Both Alpaca and ColossalChat are based on LLaMA. It is hard to compensate for the missing knowledge in the pre-training stage.
- Lack of counting ability: cannot count the number of items in a list.
- Lack of logic (reasoning and calculation).
- Tends to repeat the last sentence (fails to produce the end token).
- Poor multilingual results: LLaMA is mainly trained on English datasets (generation performs better than QA).
</details>
<details><summary><b>Limitation of dataset</b></summary>
- Lack of summarization ability: no such instructions in the fine-tune datasets.
- Lack of multi-turn chat: no such instructions in the fine-tune datasets.
- Lack of self-recognition: no such instructions in the fine-tune datasets.
- Lack of safety:
  - When the input contains fake facts, the model makes up false facts and explanations.
  - Cannot abide by OpenAI's policy: when prompts are generated via the OpenAI API, the API always abides by OpenAI's policy, so no violation cases are in the datasets.
</details>
## FAQ
<details><summary><b>How to save/load checkpoint</b></summary>
We have integrated the Transformers save and load pipeline, allowing users to freely call Hugging Face's language models and save them in the HF format.
```
trainer.fit()
trainer.save_model(path=args.save_path, only_rank0=True, tokenizer=tokenizer)
```
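
Since checkpoints are saved in the HF format, loading a trained model back for inference or further fine-tuning is the standard `from_pretrained` call. A minimal sketch (the path is a placeholder for whatever you passed as `--save_path`):

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# Placeholder path: wherever trainer.save_model wrote the checkpoint.
model = LlamaForCausalLM.from_pretrained("/path/to/Coati-7B")
tokenizer = LlamaTokenizer.from_pretrained("/path/to/Coati-7B")
```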
</details>
<details><summary><b>How to train with limited resources</b></summary>
Here are some examples that allow you to train a 7B model on a single consumer-grade GPU or on multiple ones.
```
torchrun --standalone --nproc_per_node=1 train_sft.py \
...
--lr 2e-5 \
--max_datasets_size 512 \
--max_epochs 1 \
```
If you have 4x32 GB GPUs, you can even train the whole 7B model using our `colossalai_zero2_cpu` strategy! The script is given as follows.
```
torchrun --standalone --nproc_per_node=4 train_sft.py \
...
--max_datasets_size 512 \
--max_epochs 1 \
```
</details>
## The Plan
Thanks so much to all of our amazing contributors!
## Quick Preview
<div align="center">
<a href="https://chat.colossalai.org/">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Chat-demo.png" width="700" />
</a>
</div>
- An open-source, low-cost solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[demo]](https://chat.colossalai.org)
<p id="ChatGPT_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width=800/>
</p>