Examples
Install requirements
pip install -r requirements.txt
Train the reward model (Stage 2)
We use rm-static as the dataset to train our reward model. It contains pairs of chosen and rejected responses to the same prompt.
The dataset is downloaded from Hugging Face automatically.
Use the following commands to train your reward model.
# Naive reward model training
python train_reward_model.py --pretrain <your model path> --model <your model type> --strategy naive
# train with the colossalai_zero2 strategy on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_reward_model.py --pretrain <your model path> --model <your model type> --strategy colossalai_zero2
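For reference, here is a minimal sketch of what the reward-model data looks like. It assumes the Dahoas/rm-static dataset on the Hugging Face Hub and the datasets library; field names are taken from that dataset card and may differ from what the training script expects internally.
# Inspect the chosen/rejected pairs (assumed dataset id: Dahoas/rm-static)
from datasets import load_dataset

dataset = load_dataset("Dahoas/rm-static", split="train")
sample = dataset[0]
print(sample["prompt"])    # shared prompt
print(sample["chosen"])    # preferred response
print(sample["rejected"])  # rejected response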
Train with dummy prompt data (Stage 3)
This script supports 3 strategies:
- naive
- ddp
- colossalai
It uses randomly generated prompt data.
The naive strategy only supports single-GPU training:
python train_dummy.py --strategy naive
# display cli help
python train_dummy.py -h
The DDP and ColossalAI strategies support multi-GPU training:
# run DDP on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_dummy.py --strategy ddp
# run ColossalAI on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_dummy.py --strategy colossalai_zero2
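For orientation, here is a rough sketch of how the --strategy flag could map to strategy objects, and what "dummy" prompt data means. The class names and constructor arguments are assumptions based on the chatgpt package in this repository; check train_dummy.py for the actual code.
# Sketch only: class names and arguments are assumptions, not the exact API.
import torch
from chatgpt.trainer.strategies import ColossalAIStrategy, DDPStrategy, NaiveStrategy

def build_strategy(name: str):
    if name == 'naive':
        return NaiveStrategy()              # single GPU, no parallelism
    if name == 'ddp':
        return DDPStrategy()                # PyTorch DistributedDataParallel
    if name == 'colossalai_zero2':
        return ColossalAIStrategy(stage=2)  # ZeRO stage-2 sharding (assumed arg)
    raise ValueError(f'unsupported strategy: {name}')

# Dummy prompts are just random token ids, e.g. 1000 prompts of length 64.
random_prompts = torch.randint(50257, (1000, 64))  # 50257 = GPT-2 vocab size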
Train with real prompt data (Stage 3)
We use awesome-chatgpt-prompts as an example dataset. It is a small dataset with hundreds of prompts.
You should download prompts.csv first.
This script also supports 3 strategies.
# display cli help
python train_prompts.py -h
# run naive on 1 GPU
python train_prompts.py prompts.csv --strategy naive
# run DDP on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_prompts.py prompts.csv --strategy ddp
# run ColossalAI on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_prompts.py prompts.csv --strategy colossalai_zero2
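The prompt file is a plain CSV. A minimal sketch of how it can be read, assuming the upstream awesome-chatgpt-prompts format with "act" and "prompt" columns:
# Read the prompt texts from prompts.csv (column names assume the
# upstream awesome-chatgpt-prompts format).
import pandas as pd

prompts = pd.read_csv('prompts.csv')['prompt'].tolist()
print(len(prompts), 'prompts loaded')
print(prompts[0])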
Inference example (after Stage 3)
We provide a naive inference demo after training.
# run inference, using the pretrain path to configure the model
python inference.py --model_path <your actor model path> --model <your model type> --pretrain <your pretrain model name/path>
# example
python inference.py --model_path ./actor_checkpoint_prompts.pt --pretrain bigscience/bloom-560m --model bloom
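Under the hood, the demo roughly amounts to the following sketch. It uses plain transformers calls for illustration; the checkpoint key layout depends on how the actor wraps the base model, so the real loading logic lives in inference.py.
# Illustrative sketch only; see inference.py for the actual implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bigscience/bloom-560m')
model = AutoModelForCausalLM.from_pretrained('bigscience/bloom-560m')

# Apply the Stage 3 actor checkpoint; strict=False because the actor may
# store weights under slightly different key names than the base model.
state_dict = torch.load('./actor_checkpoint_prompts.pt', map_location='cpu')
model.load_state_dict(state_dict, strict=False)

inputs = tokenizer('How are you?', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))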
Data
- rm-static
- hh-rlhf
- openai/summarize_from_feedback
- openai/webgpt_comparisons
- Dahoas/instruct-synthetic-prompt-responses
Supported Models
GPT
- GPT2-S (s)
- GPT2-M (m)
- GPT2-L (l)
- GPT2-XL (xl)
- GPT2-4B (4b)
- GPT2-6B (6b)
- GPT2-8B (8b)
- GPT2-10B (10b)
- GPT2-12B (12b)
- GPT2-15B (15b)
- GPT2-18B (18b)
- GPT2-20B (20b)
- GPT2-24B (24b)
- GPT2-28B (28b)
- GPT2-32B (32b)
- GPT2-36B (36b)
- GPT2-40B (40b)
- GPT3 (175b)
BLOOM
- BLOOM-560m
- BLOOM-1b1
- BLOOM-3b
- BLOOM-7b
- BLOOM-175b