Image source: https://openai.com/blog/chatgpt
### Stage1 - Supervised instructs tuning Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model you can run the `examples/train_sft.sh` to start a supervised instructs fine-tuning ``` torchrun --standalone --nproc_per_node=4 train_sft.py \ --pretrain "/path/to/LLaMa-7B/" \ --model 'llama' \ --strategy colossalai_zero2 \ --log_interval 10 \ --save_path /path/to/Coati-7B \ --dataset /path/to/data.json \ --batch_size 4 \ --accimulation_steps 8 \ --lr 2e-5 \ --max_datasets_size 512 \ --max_epochs 1 \ ``` ### Stage2 - Training reward model Stage2 trains a reward model, which obtains corresponding scores by manually ranking different outputs for the same prompt and supervises the training of the reward model you can run the `examples/train_rm.sh` to start a reward model training ``` torchrun --standalone --nproc_per_node=4 train_reward_model.py --pretrain "/path/to/LLaMa-7B/" \ --model 'llama' \ --strategy colossalai_zero2 \ --loss_fn 'log_exp'\ --save_path 'rmstatic.pt' \ ``` ### Stage3 - Training model with reinforcement learning by human feedback Stage3 uses reinforcement learning algorithm, which is the most complex part of the training process:
you can run the `examples/train_prompts.sh` to start training PPO with human feedback ``` torchrun --standalone --nproc_per_node=4 train_prompts.py \ --pretrain "/path/to/LLaMa-7B/" \ --model 'llama' \ --strategy colossalai_zero2 \ --prompt_path /path/to/your/prompt_dataset \ --pretrain_dataset /path/to/your/pretrain_dataset \ --rm_pretrain /your/pretrain/rm/defination \ --rm_path /your/rm/model/path ``` For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples). ### Inference - After Training #### 8-bit setup 8-bit quantization is originally supported by the latest [transformers](https://github.com/huggingface/transformers). Please install it from source. Please ensure you have downloaded HF-format model weights of LLaMA models. Usage: ```python from transformers import LlamaForCausalLM USE_8BIT = True # use 8-bit quantization; otherwise, use fp16 model = LlamaForCausalLM.from_pretrained( "pretrained/path", load_in_8bit=USE_8BIT, torch_dtype=torch.float16, device_map="auto", ) if not USE_8BIT: model.half() # use fp16 model.eval() ``` **Troubleshooting**: if you get error indicating your CUDA-related libraries not found when loading 8-bit model, you can check whether your `LD_LIBRARY_PATH` is correct. E.g. you can set `export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH`. #### 4-bit setup Please ensure you have downloaded HF-format model weights of LLaMA models first. Then you can follow [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). This lib provides efficient CUDA kernels and weight convertion script. After installing this lib, we may convert the original HF-format LLaMA model weights to 4-bit version. ```shell CUDA_VISIBLE_DEVICES=0 python llama.py /path/to/pretrained/llama-7b c4 --wbits 4 --groupsize 128 --save llama7b-4bit.pt ``` Run this command in your cloned `GPTQ-for-LLaMa` directory, then you will get a 4-bit weight file `llama7b-4bit-128g.pt`. **Troubleshooting**: if you get error about `position_ids`, you can checkout to commit `50287c3b9ae4a3b66f6b5127c643ec39b769b155`(`GPTQ-for-LLaMa` repo). For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference). ## Coati7B examples ### Generation
- Up to 7.73 times faster for single server training and 1.42 times faster for single-GPU inference
- Up to 10.3x growth in model capacity on one GPU - A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)
- Increase the capacity of the fine-tuning model by up to 3.7 times on a single GPU - Keep in a sufficiently high running speed ## Authors Coati is developed by ColossalAI Team: [Fazzie](https://fazzie-key.cool/about/index.html), [FrankLeeeee](https://github.com/FrankLeeeee), [BlueRum](https://github.com/ht-zhou), [ver217](https://github.com/ver217) The Phd student [Zangwei Zheng](https://github.com/zhengzangw) and [Xue Fuzhao](https://github.com/XueFuzhao) also contributed a lot to this project. ## Citations ```bibtex @article{Hu2021LoRALA, title = {LoRA: Low-Rank Adaptation of Large Language Models}, author = {Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Weizhu Chen}, journal = {ArXiv}, year = {2021}, volume = {abs/2106.09685} } @article{ouyang2022training, title={Training language models to follow instructions with human feedback}, author={Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others}, journal={arXiv preprint arXiv:2203.02155}, year={2022} } @article{touvron2023llama, title={LLaMA: Open and Efficient Foundation Language Models}, author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume}, journal={arXiv preprint arXiv:2302.13971}, year={2023} } @misc{alpaca, author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto }, title = {Stanford Alpaca: An Instruction-following LLaMA model}, year = {2023}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}}, } @misc{instructionwild, author = {Fuzhao Xue and Zangwei Zheng and Yang You }, title = {Instruction in the Wild: A User-based Instruction Dataset}, year = {2023}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/XueFuzhao/InstructionWild}}, } ``` ## Licenses Coati is licensed under the [Apache 2.0 License](LICENSE).