<h1 align="center">
  <img width="auto" height="100px" src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/logo_coati.png"/>
  <br/>
  <span>ColossalChat</span>
</h1>

## Table of Contents

- [Table of Contents](#table-of-contents)
- [What is ColossalChat and Coati?](#what-is-colossalchat-and-coati-)
- [Online demo](#online-demo)
- [Install](#install)
  - [Install the environment](#install-the-environment)
  - [Install the Transformers](#install-the-transformers)
- [How to use?](#how-to-use)
  - [Supervised datasets collection](#supervised-datasets-collection)
  - [RLHF Training Stage1 - Supervised instructs tuning](#rlhf-training-stage1---supervised-instructs-tuning)
  - [RLHF Training Stage2 - Training reward model](#rlhf-training-stage2---training-reward-model)
  - [RLHF Training Stage3 - Training model with reinforcement learning by human feedback](#rlhf-training-stage3---training-model-with-reinforcement-learning-by-human-feedback)
  - [Inference Quantization and Serving - After Training](#inference-quantization-and-serving---after-training)
- [Coati7B examples](#coati7b-examples)
  - [Generation](#generation)
  - [Open QA](#open-qa)
  - [Limitation for LLaMA-finetuned models](#limitation)
  - [Limitation of dataset](#limitation)
- [FAQ](#faq)
  - [How to save/load checkpoint](#faq)
  - [How to train with limited resources](#faq)
- [The Plan](#the-plan)
  - [Real-time progress](#real-time-progress)
- [Invitation to open-source contribution](#invitation-to-open-source-contribution)
- [Quick Preview](#quick-preview)
- [Authors](#authors)
- [Citations](#citations)
- [Licenses](#licenses)

---
## What is ColossalChat and Coati ?

[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) is a project that implements an LLM with RLHF, powered by the [Colossal-AI](https://github.com/hpcaitech/ColossalAI) project.

Coati stands for `ColossalAI Talking Intelligence`. It is the name of the module implemented in this project and also the name of the large language model developed by the ColossalChat project.

The Coati package provides a unified large language model framework that implements the following functions:

- Comprehensive large-model training acceleration from ColossalAI, with no knowledge of complex distributed training algorithms required
- Supervised dataset collection
- Supervised instruction fine-tuning
- Reward model training
- Reinforcement learning with human feedback
- Quantized inference
- Fast model deployment
- Seamless integration with the Hugging Face ecosystem and a high degree of model customization

<div align="center">
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/chatgpt.png" width="700"/>
</p>

Image source: https://openai.com/blog/chatgpt
</div>

**As Colossal-AI is undergoing some major updates, this project will be actively maintained to stay in line with the Colossal-AI project.**

More details can be found in the latest news:
* [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
* [2023/02] [Open Source Solution Replicates ChatGPT Training Process! Ready to go with only 1.6GB GPU Memory](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt)

## Online demo
You can try Coati7B on this page:

[chat.colossalai.org](https://chat.colossalai.org/)

Due to resource constraints, we will only provide this service from 29 March 2023 to 5 April 2023. However, we have provided the inference code in the [inference](./inference/) folder. The WebUI will be open-sourced soon as well.

> Warning: Due to model and dataset size limitations, Coati is just a baby model. Coati7B may output incorrect information and lacks the ability for multi-turn dialogue. There is still significant room for improvement.

## Install
### Install the environment
```shell
conda create -n coati
conda activate coati
pip install .
```
### Install the Transformers
Because Hugging Face has not yet officially supported the LLaMA models, we maintain a fork of Transformers that is compatible with our code:
```shell
git clone https://github.com/hpcaitech/transformers
cd transformers
pip install .
```
## How to use?
### Supervised datasets collection
We collected a bilingual (Chinese and English) dataset of 104K samples. You can find it in this repo:

[InstructionWild](https://github.com/XueFuzhao/InstructionWild)

Here is how we collected the data:

<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/data-collect.png" width="500"/>
</p>

### RLHF Training Stage1 - Supervised instructs tuning

Stage1 is supervised instruction fine-tuning, which uses the datasets mentioned above to fine-tune the model.

You can run `examples/train_sft.sh` to start supervised instruction fine-tuning.

### RLHF Training Stage2 - Training reward model

Stage2 trains a reward model. Annotators manually rank different outputs for the same prompt, and the resulting scores supervise the training of the reward model.

You can run `examples/train_rm.sh` to start reward model training.
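
One common way to turn rankings into a training signal is a pairwise loss: for each prompt, the reward of the preferred response is pushed above the reward of the rejected one. The sketch below is a minimal, framework-free illustration of that idea in plain Python. The function name `pairwise_ranking_loss` and the sample scores are hypothetical, and this is not necessarily the exact loss used by `examples/train_rm.sh`.

```python
import math


def pairwise_ranking_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over all ranked pairs.

    Hypothetical helper for illustration only. Both inputs are plain lists of
    scalar rewards that a reward model assigned to ranked response pairs.
    """
    if not chosen_rewards or len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("expected two non-empty lists of equal length")
    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(margin)) equals log(1 + exp(-margin)); this form stays
        # numerically stable for large positive or negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


# Made-up scores: the loss is small when preferred answers already score higher.
good = pairwise_ranking_loss([2.0, 1.5], [0.0, -0.5])
bad = pairwise_ranking_loss([0.0, -0.5], [2.0, 1.5])
```

With these made-up scores, `good` is smaller than `bad`, which is the behavior a reward model is trained towards.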

### RLHF Training Stage3 - Training model with reinforcement learning by human feedback

Stage3 uses a reinforcement learning algorithm and is the most complex part of the training process:

<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/stage-3.jpeg" width="800"/>
</p>

You can run `examples/train_prompts.sh` to start PPO training with human feedback.

For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples).
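
At the heart of PPO is a clipped surrogate objective that limits how far the updated policy can move away from the policy that generated the samples. The following is a minimal plain-Python sketch of that objective for a single sampled action. The name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative assumptions, not the Coati trainer's API, and the real Stage3 pipeline involves more components than this.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped PPO surrogate for one sampled action (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the action under the current
    policy and under the policy that generated the sample.
    advantage: estimated advantage of the action.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum removes any incentive to push the probability ratio
    # outside the interval [1 - clip_eps, 1 + clip_eps].
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical values: an unchanged policy, then one that moved too far.
unchanged = ppo_clip_objective(-1.0, -1.0, advantage=2.0)
capped = ppo_clip_objective(0.0, -1.0, advantage=2.0)
```

In the second call the probability ratio exceeds `1 + clip_eps`, so the objective is capped and a larger policy change earns no extra credit.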

### Inference Quantization and Serving - After Training

We provide an online inference server and a benchmark. We aim to run inference on a single GPU, so quantization is essential when using large models.

We support 8-bit quantization (RTN), 4-bit quantization (GPTQ), and FP16 inference. You can use the online inference server scripts to deploy your own service.

For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference).
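
To give a feel for what round-to-nearest (RTN) quantization does, the sketch below quantizes a small list of weights to signed 8-bit integers with a single scale and then dequantizes them. It is a generic, simplified illustration in plain Python, with hypothetical helper names `quantize_rtn` and `dequantize`. It is not the code in `inference/`, which may differ, for example by using per-channel or per-group scales.

```python
def quantize_rtn(weights, bits=8):
    """Symmetric round-to-nearest quantization with one scale for all weights."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the valid range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate floating-point weights."""
    return [v * scale for v in q]


# Made-up weights for illustration.
weights = [0.02, -0.5, 0.31, 1.27]
q, scale = quantize_rtn(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The reconstruction error per weight is bounded by half a quantization step (`scale / 2`), which is why fewer bits save memory at the cost of precision.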
## Coati7B examples
### Generation

<details><summary><b>E-mail</b></summary>

![phd](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/Phd.png)

</details>

<details><summary><b>Coding</b></summary>

![sort](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/quick_sort.png)

</details>

<details><summary><b>Regex</b></summary>

![regex](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/regex.png)

</details>

<details><summary><b>TeX</b></summary>

![tex](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/tex.png)

</details>

<details><summary><b>Writing</b></summary>

![writing](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/writing.png)

</details>

<details><summary><b>Table</b></summary>

![Table](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/table.png)

</details>

### Open QA

<details><summary><b>Game</b></summary>

![Game](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/game.png)

</details>

<details><summary><b>Travel</b></summary>

![Travel](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/travel.png)

</details>

<details><summary><b>Physical</b></summary>

![Physical](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/physical.png)

</details>

<details><summary><b>Chemical</b></summary>

![Chemical](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/chemical.png)

</details>

<details><summary><b>Economy</b></summary>

![Economy](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/economy.png)

</details>

You can find more examples in this [repo](https://github.com/XueFuzhao/InstructionWild/blob/main/comparison.md).

### Limitation

<details><summary><b>Limitation for LLaMA-finetuned models</b></summary>

- Both Alpaca and ColossalChat are based on LLaMA. It is hard to compensate for knowledge that is missing from the pre-training stage.
- Lack of counting ability: cannot count the number of items in a list.
- Lack of logic (reasoning and calculation).
- Tendency to repeat the last sentence (failure to produce the end token).
- Poor multilingual results: LLaMA is mainly trained on English datasets (generation performs better than QA).

</details>

<details><summary><b>Limitation of dataset</b></summary>

- Lack of summarization ability: no such instructions in the fine-tuning datasets.
- Lack of multi-turn chat: no such instructions in the fine-tuning datasets.
- Lack of self-recognition: no such instructions in the fine-tuning datasets.
- Lack of safety:
  - When the input contains fake facts, the model makes up false facts and explanations.
  - Cannot abide by OpenAI's policy: the prompts were generated with the OpenAI API, which always abides by its policy, so the datasets contain no violation cases.

</details>

## FAQ

<details><summary><b>How to save/load checkpoint</b></summary>

We have integrated the Transformers save and load pipeline, allowing users to freely call Hugging Face's language models and save them in the HF format.

```python
from transformers import AutoTokenizer

from coati.models.llama import LlamaLM
from coati.trainer import SFTTrainer

# `args`, `strategy`, `optim`, and the dataloaders are assumed to be created
# earlier in the training script.
model = LlamaLM(pretrained=args.pretrain)
tokenizer = AutoTokenizer.from_pretrained(args.pretrain)

trainer = SFTTrainer(model=model,
                     strategy=strategy,
                     optim=optim,
                     train_dataloader=train_dataloader,
                     eval_dataloader=eval_dataloader,
                     batch_size=args.batch_size,
                     max_epochs=args.max_epochs,
                     accimulation_steps=args.accimulation_steps)

trainer.fit()
# Save the model together with the tokenizer, from rank 0 only.
trainer.save_model(path=args.save_path, only_rank0=True, tokenizer=tokenizer)
```

</details>

<details><summary><b>How to train with limited resources</b></summary>

The following examples let you train a 7B model on one or more consumer-grade GPUs.

If you only have a single 24 GB GPU, you can use the following script. `batch_size` and `lora_rank` are the most important parameters for training the model successfully.

```shell
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy naive \
    --log_interval 10 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 1 \
    --accimulation_steps 8 \
    --lr 2e-5 \
    --max_datasets_size 512 \
    --max_epochs 1 \
    --lora_rank 16
```

The `colossalai_gemini` strategy lets a single 24 GB GPU train the whole model without LoRA if you have sufficient CPU memory. You can use the following script.

```shell
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_gemini \
    --log_interval 10 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 1 \
    --accimulation_steps 8 \
    --lr 2e-5 \
    --max_datasets_size 512 \
    --max_epochs 1
```

If you have 4x32 GB GPUs, you can even train the whole 7B model using our `colossalai_zero2_cpu` strategy! The script is given as follows.

```shell
torchrun --standalone --nproc_per_node=4 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2_cpu \
    --log_interval 10 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 1 \
    --accimulation_steps 8 \
    --lr 2e-5 \
    --max_datasets_size 512 \
    --max_epochs 1
```

</details>
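
To see why `lora_rank` matters so much for memory in the single-GPU script above, recall that LoRA freezes a weight matrix of shape `d_out x d_in` and trains two low-rank factors of shapes `d_out x r` and `r x d_in` instead. The sketch below compares the trainable parameter counts. The helper name is hypothetical, and the 4096 x 4096 shape is only an assumed example of a large projection matrix, not a value taken from this project.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in = 4096, 4096  # assumed example shape
full = d_out * d_in       # parameters updated by full fine-tuning of this matrix
lora = lora_trainable_params(d_out, d_in, rank=16)
print(f"full: {full:,}  LoRA (r=16): {lora:,}  fraction: {lora / full:.4f}")
```

Doubling the rank doubles the number of trainable parameters per adapted matrix, so a smaller `lora_rank` reduces the memory needed for gradients and optimizer states.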
## The Plan
- [x] implement PPO fine-tuning
- [x] implement training reward model
- [x] support LoRA
- [x] support inference
- [x] support LLaMA from [Facebook](https://github.com/facebookresearch/llama)
- [x] implement PPO-ptx fine-tuning
- [ ] integrate with Ray
- [ ] support more RL paradigms, like Implicit Language Q-Learning (ILQL)
- [ ] support chain-of-thought by [langchain](https://github.com/hwchase17/langchain)

### Real-time progress
You can follow our progress on the GitHub project board:

[Coati](https://github.com/orgs/hpcaitech/projects/17/views/1)

## Invitation to open-source contribution
Following the successful examples of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), we welcome all developers and partners with computing power, datasets, or models to join and build the Colossal-AI community, working towards the era of big AI models, starting with replicating ChatGPT!

You may contact us or participate in the following ways:
1. [Leaving a Star ⭐](https://github.com/hpcaitech/ColossalAI/stargazers) to show your support. Thanks!
2. Posting an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose), or submitting a PR on GitHub following the guidelines in [Contributing](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md).
3. Joining the Colossal-AI community on [Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w) and [WeChat(微信)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your ideas.
4. Sending your official proposal by email to contact@hpcaitech.com.

Thanks so much to all of our amazing contributors!

## Quick Preview

<div align="center">
<a href="https://chat.colossalai.org/">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Chat-demo.png" width="700"/>
</a>
</div>

- An open-source, low-cost solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[demo]](https://chat.colossalai.org)

<p id="ChatGPT_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width="800"/>
</p>

- Up to 7.73 times faster single-server training and 1.42 times faster single-GPU inference

<p id="ChatGPT-1GPU" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT-1GPU.jpg" width="450"/>
</p>

- Up to 10.3x growth in model capacity on one GPU
- A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)

<p id="inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/LoRA%20data.jpg" width="600"/>
</p>

- Increases the capacity of the fine-tuning model by up to 3.7 times on a single GPU
- Maintains a sufficiently high running speed

## Authors

Coati is developed by the ColossalAI Team:
- [Fazzie](https://fazzie-key.cool/about/index.html)
- [FrankLeeeee](https://github.com/FrankLeeeee)
- [BlueRum](https://github.com/ht-zhou)
- [ver217](https://github.com/ver217)
- [ofey404](https://github.com/ofey404)

PhD students from the [HPC-AI Lab](https://ai.comp.nus.edu.sg/) also contributed a lot to this project:
- [Zangwei Zheng](https://github.com/zhengzangw)
- [Xue Fuzhao](https://github.com/XueFuzhao)

## Citations
```bibtex
@article{Hu2021LoRALA,
  title   = {LoRA: Low-Rank Adaptation of Large Language Models},
  author  = {Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Weizhu Chen},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2106.09685}
}

@article{ouyang2022training,
  title   = {Training language models to follow instructions with human feedback},
  author  = {Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others},
  journal = {arXiv preprint arXiv:2203.02155},
  year    = {2022}
}

@article{touvron2023llama,
  title   = {LLaMA: Open and Efficient Foundation Language Models},
  author  = {Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal = {arXiv preprint arXiv:2302.13971},
  year    = {2023}
}

@misc{alpaca,
  author       = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title        = {Stanford Alpaca: An Instruction-following LLaMA model},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

@misc{instructionwild,
  author       = {Fuzhao Xue and Zangwei Zheng and Yang You},
  title        = {Instruction in the Wild: A User-based Instruction Dataset},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/XueFuzhao/InstructionWild}},
}
```
## Licenses
Coati is licensed under the [Apache 2.0 License](LICENSE).