mirror of https://github.com/hpcaitech/ColossalAI

[ChatGPT] fix README (#2966)

Co-authored-by: fastalgo <youyang@cs.berkeley.edu>
Co-authored-by: BlueRum <70618399+ht-zhou@users.noreply.github.com>
parent: b0a8766381, commit: bbf9c827c3
# RLHF - Colossal-AI

## Table of Contents

- [What is RLHF - Colossal-AI?](#intro)
- [How to Install?](#install)
- [The Plan](#the-plan)
- [How can you participate in open source?](#invitation-to-open-source-contribution)

---

## Intro

Implementation of RLHF (Reinforcement Learning with Human Feedback) powered by Colossal-AI. It supports distributed training and offloading, which can fit extremely large models. More details can be found in the [blog](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt).

<p align="center">

```
pip install .
```


## Usage

The main entrypoint is `Trainer`. Only the PPO trainer is supported for now, and several training strategies are available:
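The strategy choice referenced above can be sketched as follows. This is an illustrative sketch of the strategy-selection pattern only; the class and method names here are hypothetical stand-ins, not the package's actual API.

```python
# Hypothetical sketch of swappable training strategies; NOT the real API.
from dataclasses import dataclass


@dataclass
class NaiveStrategy:
    """Single-process training, no distribution."""
    name: str = "naive"

    def setup(self, model):
        # A real strategy would wrap the model (e.g. DDP, ZeRO offload).
        return model


@dataclass
class ColossalAIStrategy:
    """Distributed training with offloading for very large models."""
    name: str = "colossalai"
    stage: int = 3  # assumed ZeRO stage, for illustration only

    def setup(self, model):
        # Placeholder: real code would shard and offload parameters here.
        return model


def pick_strategy(kind: str):
    """Map a CLI-style string to a strategy object."""
    strategies = {
        "naive": NaiveStrategy(),
        "colossalai": ColossalAIStrategy(),
    }
    return strategies[kind]
```

The point of the pattern is that the trainer only sees the `setup` interface, so distributed details stay out of the training loop.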

To load an optimizer checkpoint:

```
strategy.load_optimizer(actor_optim, 'actor_optim_checkpoint.pt')
```

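The checkpoint round-trip above amounts to serializing and restoring an optimizer's state dict. A minimal self-contained sketch of that idea, with a dummy optimizer standing in for a real torch optimizer and `pickle` standing in for the strategy's save/load machinery (all names here are illustrative):

```python
# Sketch of a state_dict checkpoint round-trip; the DummyOptimizer and the
# pickle-based save/load helpers are illustrative stand-ins, not the real code.
import os
import pickle
import tempfile


class DummyOptimizer:
    """Stand-in exposing the state_dict/load_state_dict protocol torch uses."""

    def __init__(self):
        self.state = {"step": 0, "lr": 1e-4}

    def state_dict(self):
        return dict(self.state)

    def load_state_dict(self, state):
        self.state = dict(state)


def save_optimizer(optim, path):
    # Persist only the state dict, not the optimizer object itself.
    with open(path, "wb") as f:
        pickle.dump(optim.state_dict(), f)


def load_optimizer(optim, path):
    # Restore saved state into an already-constructed optimizer.
    with open(path, "rb") as f:
        optim.load_state_dict(pickle.load(f))
```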
## The Plan

- [x] implement PPO fine-tuning
- [x] implement training reward model
- [x] support LoRA
- [x] support inference
- [ ] open source the reward model weight
- [ ] support llama from [facebook](https://github.com/facebookresearch/llama)
- [ ] support BoN (best of N sample)
- [ ] implement PPO-ptx fine-tuning
- [ ] integrate with Ray
- [ ] support more RL paradigms, like Implicit Language Q-Learning (ILQL)
- [ ] support chain of thought by [langchain](https://github.com/hwchase17/langchain)

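The "PPO fine-tuning" item above refers to Proximal Policy Optimization. Its core, the clipped surrogate objective, can be written in a few lines; this is a textbook-style sketch with plain floats, not the repository's implementation:

```python
# Sketch of PPO's clipped surrogate objective (per-token, scalar form).
import math


def ppo_clip_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """min(r*A, clip(r, 1-eps, 1+eps)*A), where r = pi_new / pi_old."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Taking the min makes the objective pessimistic: large policy moves
    # are never rewarded beyond the clipped value.
    return min(ratio * advantage, clipped * advantage)
```

With equal log-probs the ratio is 1 and the objective is just the advantage; when the new policy doubles a token's probability, a positive advantage is clipped at `1 + eps` while a negative advantage stays unclipped.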

### Real-time progress

You can follow our progress on the GitHub project board:

[Open ChatGPT](https://github.com/orgs/hpcaitech/projects/17/views/1)

## Invitation to open-source contribution

Following the successful examples of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), any and all developers and partners with computing power, datasets, or models are welcome to join and build an ecosystem with Colossal-AI, making efforts towards the era of big AI models from the starting point of replicating ChatGPT!


We support a naive inference demo after training.

```
python inference.py --pretrain <your actor model path> --model <your model type>
```

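An entrypoint taking the flags shown above might parse them roughly as follows. This is a sketch only: the `--pretrain` and `--model` names come from the command above, while the choices list and the `--max_new_tokens` flag are hypothetical additions for illustration.

```python
# Hypothetical argument parsing for an inference entrypoint like the one above.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Naive inference demo")
    parser.add_argument("--pretrain", required=True,
                        help="path to the trained actor model")
    parser.add_argument("--model", required=True,
                        choices=["gpt2", "bloom", "opt"],  # assumed set
                        help="model architecture type")
    parser.add_argument("--max_new_tokens", type=int, default=128,
                        help="generation length cap (hypothetical flag)")
    return parser
```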

#### data

- [x] [rm-static](https://huggingface.co/datasets/Dahoas/rm-static)
- [x] [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [ ] [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
- [ ] [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
- [ ] [Dahoas/instruct-synthetic-prompt-responses](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses)

## Support Model

### GPT

- [x] GPT2-S (s)
- [x] GPT2-M (m)
- [x] GPT2-L (l)
- [ ] GPT2-XL (xl)
- [x] GPT2-4B (4b)
- [ ] GPT2-6B (6b)
- [ ] GPT2-8B (8b)
- [ ] GPT2-10B (10b)

- [x] [BLOOM-560m](https://huggingface.co/bigscience/bloom-560m)
- [x] [BLOOM-1b1](https://huggingface.co/bigscience/bloom-1b1)
- [x] [BLOOM-3b](https://huggingface.co/bigscience/bloom-3b)
- [x] [BLOOM-7b](https://huggingface.co/bigscience/bloom-7b1)
- [ ] BLOOM-175b

### OPT

Requirements:

```
datasets
loralib
colossalai>=0.2.4
torch
langchain
```