Examples
Install requirements
pip install -r requirements.txt
Train the reward model (Stage 2)
We use rm-static as the dataset to train our reward model. It contains pairs of chosen and rejected responses to the same prompt.
The dataset is downloaded from Hugging Face automatically.
Use the following commands to train your reward model.
# Naive reward model training
python train_reward_model.py --pretrain <your model path> --model <your model type> --strategy naive
# train with the colossalai_zero2 strategy on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_reward_model.py --pretrain <your model path> --model <your model type> --strategy colossalai_zero2
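For reference, here is a minimal sketch of what the reward-model data looks like. It assumes the Dahoas/rm-static dataset on the Hugging Face Hub and the datasets library; field names are taken from that dataset card and may differ from what the training script expects internally.
# Inspect the chosen/rejected pairs (assumed dataset id: Dahoas/rm-static)
from datasets import load_dataset

dataset = load_dataset("Dahoas/rm-static", split="train")
sample = dataset[0]
print(sample["prompt"])    # shared prompt
print(sample["chosen"])    # preferred response
print(sample["rejected"])  # rejected response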
Train with dummy prompt data (Stage 3)
This script supports 3 strategies:
- naive
- ddp
- colossalai
It uses randomly generated prompt data.
The naive strategy only supports single-GPU training:
python train_dummy.py --strategy naive
# display cli help
python train_dummy.py -h
The DDP and ColossalAI strategies support multi-GPU training:
# run DDP on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_dummy.py --strategy ddp
# run ColossalAI on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_dummy.py --strategy colossalai_zero2
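For orientation, here is a rough sketch of how the --strategy flag could map to strategy objects, and what "dummy" prompt data means. The class names and constructor arguments are assumptions based on the chatgpt package in this repository; check train_dummy.py for the actual code.
# Sketch only: class names and arguments are assumptions, not the exact API.
import torch
from chatgpt.trainer.strategies import ColossalAIStrategy, DDPStrategy, NaiveStrategy

def build_strategy(name: str):
    if name == 'naive':
        return NaiveStrategy()              # single GPU, no parallelism
    if name == 'ddp':
        return DDPStrategy()                # PyTorch DistributedDataParallel
    if name == 'colossalai_zero2':
        return ColossalAIStrategy(stage=2)  # ZeRO stage-2 sharding (assumed arg)
    raise ValueError(f'unsupported strategy: {name}')

# Dummy prompts are just random token ids, e.g. 1000 prompts of length 64.
random_prompts = torch.randint(50257, (1000, 64))  # 50257 = GPT-2 vocab size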
Train with real prompt data (Stage 3)
We use awesome-chatgpt-prompts as an example dataset. It is a small dataset with hundreds of prompts.
You should download prompts.csv first.
This script also supports 3 strategies.
# display cli help
python train_prompts.py -h
# run naive on 1 GPU
python train_prompts.py prompts.csv --strategy naive
# run DDP on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_prompts.py prompts.csv --strategy ddp
# run ColossalAI on 2 GPUs
torchrun --standalone --nproc_per_node=2 train_prompts.py prompts.csv --strategy colossalai_zero2
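The prompt file is a plain CSV. A minimal sketch of how it can be read, assuming the upstream awesome-chatgpt-prompts format with "act" and "prompt" columns:
# Read the prompt texts from prompts.csv (column names assume the
# upstream awesome-chatgpt-prompts format).
import pandas as pd

prompts = pd.read_csv('prompts.csv')['prompt'].tolist()
print(len(prompts), 'prompts loaded')
print(prompts[0])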
Inference example (after Stage 3)
We provide a naive inference demo after training.
# run inference, using the pretrain path to configure the model
python inference.py --model_path <your actor model path> --model <your model type> --pretrain <your pretrain model name/path>
# example
python inference.py --model_path ./actor_checkpoint_prompts.pt --pretrain bigscience/bloom-560m --model bloom
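Under the hood, the demo roughly amounts to the following sketch. It uses plain transformers calls for illustration; the checkpoint key layout depends on how the actor wraps the base model, so the real loading logic lives in inference.py.
# Illustrative sketch only; see inference.py for the actual implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bigscience/bloom-560m')
model = AutoModelForCausalLM.from_pretrained('bigscience/bloom-560m')

# Apply the Stage 3 actor checkpoint; strict=False because the actor may
# store weights under slightly different key names than the base model.
state_dict = torch.load('./actor_checkpoint_prompts.pt', map_location='cpu')
model.load_state_dict(state_dict, strict=False)

inputs = tokenizer('How are you?', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))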
Data
- rm-static
- hh-rlhf
- openai/summarize_from_feedback
- openai/webgpt_comparisons
- Dahoas/instruct-synthetic-prompt-responses
Supported Models
GPT
- GPT2-S (s)
- GPT2-M (m)
- GPT2-L (l)
- GPT2-XL (xl)
- GPT2-4B (4b)
- GPT2-6B (6b)
- GPT2-8B (8b)
- GPT2-10B (10b)
- GPT2-12B (12b)
- GPT2-15B (15b)
- GPT2-18B (18b)
- GPT2-20B (20b)
- GPT2-24B (24b)
- GPT2-28B (28b)
- GPT2-32B (32b)
- GPT2-36B (36b)
- GPT2-40B (40b)
- GPT3 (175b)
BLOOM
- BLOOM-560m
- BLOOM-1b1
- BLOOM-3b
- BLOOM-7b
- BLOOM-175b