Image source: https://openai.com/blog/chatgpt
> DeepSpeedChat performance comes from its blog on 2023 April 12, ColossalChat performance can be reproduced on an AWS p4d.24xlarge node with 8 A100-40G GPUs with the following command: `torchrun --standalone --nproc_per_node 8 benchmark_opt_lora_dummy.py --num_collect_steps 1 --use_kernels --strategy colossalai_zero2 --experience_batch_size 64 --train_batch_size 32` ## Install ### Install the environment ```bash conda create -n coati conda activate coati git clone https://github.com/hpcaitech/ColossalAI.git cd ColossalAI/applications/Chat pip install . ``` ### Install the Transformers ```bash pip install transformers==4.30.2 ``` ## How to use? ### Supervised datasets collection We collected 104K bilingual datasets of Chinese and English, and you can find the datasets in this repo [InstructionWild](https://github.com/XueFuzhao/InstructionWild) and in this [file](https://github.com/XueFuzhao/InstructionWild/blob/main/data/README.md). Here is how we collected the data
### RLHF Training Stage1 - Supervised instructs tuning Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model. You can run the `examples/train_sft.sh` to start a supervised instructs fine-tuning. [[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg) **Note**: the supervised dataset follows the following format, ```json [ { "instruction": "Provide a list of the top 10 most popular mobile games in Asia", "input": "", "output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved", "id": 0 }, ... ] ``` ### RLHF Training Stage2 - Training reward model Stage2 trains a reward model, which obtains corresponding scores by manually ranking different outputs for the same prompt and supervises the training of the reward model You can run the `examples/train_rm.sh` to start a reward model training. [[Stage2 tutorial video]](https://www.youtube.com/watch?v=gMx2CApKhuo) ### RLHF Training Stage3 - Training model with reinforcement learning by human feedback Stage3 uses reinforcement learning algorithm, which is the most complex part of the training process:
You can run the `examples/train_prompts.sh` to start training PPO with human feedback. [[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g) **Note**: the required datasets follow the following format, - `pretrain dataset` ```json [ { "instruction": "Provide a list of the top 10 most popular mobile games in Asia", "input": "", "output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved", "id": 0 }, ... ] ``` - `prompt dataset` ```json [ { "instruction": "Edit this paragraph to make it more concise: \"Yesterday, I went to the store and bought some things. Then, I came home and put them away. After that, I went for a walk and met some friends.\"", "id": 0 }, { "instruction": "Write a descriptive paragraph about a memorable vacation you went on", "id": 1 }, ... ] ``` For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples). ### Inference Quantization and Serving - After Training We provide an online inference server and a benchmark. We aim to run inference on single GPU, so quantization is essential when using large models. We support 8-bit quantization (RTN), 4-bit quantization (GPTQ), and FP16 inference. Online inference server scripts can help you deploy your own services. For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference). ## Coati7B examples ### Generation
- Up to 7.73 times faster for single server training and 1.42 times faster for single-GPU inference
- Up to 10.3x growth in model capacity on one GPU - A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)
- Increase the capacity of the fine-tuning model by up to 3.7 times on a single GPU - Keep in a sufficiently high running speed | Model Pair | Alpaca-7B ⚔ Coati-7B | Coati-7B ⚔ Alpaca-7B | | :-----------: | :------------------: | :------------------: | | Better Cases | 38 ⚔ **41** | **45** ⚔ 33 | | Win Rate | 48% ⚔ **52%** | **58%** ⚔ 42% | | Average Score | 7.06 ⚔ **7.13** | **7.31** ⚔ 6.82 | - Our Coati-7B model performs better than Alpaca-7B when using GPT-4 to evaluate model performance. The Coati-7B model we evaluate is an old version we trained a few weeks ago and the new version is around the corner. ## Authors Coati is developed by ColossalAI Team: - [Fazzie](https://fazzie-key.cool/about/index.html) - [FrankLeeeee](https://github.com/FrankLeeeee) - [BlueRum](https://github.com/ht-zhou) - [ver217](https://github.com/ver217) - [ofey404](https://github.com/ofey404) - [Wenhao Chen](https://github.com/CWHer) The PhD student from [(HPC-AI) Lab](https://ai.comp.nus.edu.sg/) also contributed a lot to this project. - [Zangwei Zheng](https://github.com/zhengzangw) - [Xue Fuzhao](https://github.com/XueFuzhao) ## Citations ```bibtex @article{Hu2021LoRALA, title = {LoRA: Low-Rank Adaptation of Large Language Models}, author = {Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Weizhu Chen}, journal = {ArXiv}, year = {2021}, volume = {abs/2106.09685} } @article{ouyang2022training, title={Training language models to follow instructions with human feedback}, author={Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others}, journal={arXiv preprint arXiv:2203.02155}, year={2022} } @article{touvron2023llama, title={LLaMA: Open and Efficient Foundation Language Models}, author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume}, journal={arXiv preprint arXiv:2302.13971}, year={2023} } @misc{alpaca, author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto }, title = {Stanford Alpaca: An Instruction-following LLaMA model}, year = {2023}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}}, } @misc{instructionwild, author = {Fuzhao Xue and Zangwei Zheng and Yang You }, title = {Instruction in the Wild: A User-based Instruction Dataset}, year = {2023}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/XueFuzhao/InstructionWild}}, } ``` ## Licenses Coati is licensed under the [Apache 2.0 License](LICENSE).