diff --git a/applications/ChatGPT/README.md b/applications/ChatGPT/README.md
index dbd5eb770..d26206144 100644
--- a/applications/ChatGPT/README.md
+++ b/applications/ChatGPT/README.md
@@ -1,5 +1,13 @@
 # RLHF - Colossal-AI
+## Table of Contents
+
+- [What is RLHF - Colossal-AI?](#intro)
+- [How to Install?](#install)
+- [The Plan](#the-plan)
+- [How can you participate in open source?](#invitation-to-open-source-contribution)
+---
+## Intro
 Implementation of RLHF (Reinforcement Learning with Human Feedback) powered by Colossal-AI. It supports distributed training and offloading, which can fit extremly large models. More details can be found in the [blog](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt).

@@ -20,7 +28,6 @@ Implementation of RLHF (Reinforcement Learning with Human Feedback) powered by C
 pip install .
 ```
 
-
 ## Usage
 
 The main entrypoint is `Trainer`. We only support PPO trainer now. We support many training strategies:
@@ -128,14 +135,24 @@ To load optimizer checkpoint:
 strategy.load_optimizer(actor_optim, 'actor_optim_checkpoint.pt')
 ```
 
-## Todo
+## The Plan
 
 - [x] implement PPO fine-tuning
 - [x] implement training reward model
 - [x] support LoRA
+- [x] support inference
+- [ ] open-source the reward model weights
+- [ ] support LLaMA from [Facebook Research](https://github.com/facebookresearch/llama)
+- [ ] support BoN (best-of-N sampling)
 - [ ] implement PPO-ptx fine-tuning
 - [ ] integrate with Ray
 - [ ] support more RL paradigms, like Implicit Language Q-Learning (ILQL)
+- [ ] support chain of thought via [langchain](https://github.com/hwchase17/langchain)
+
+### Real-time progress
+You can track our progress on the GitHub project board:
+
+[Open ChatGPT](https://github.com/orgs/hpcaitech/projects/17/views/1)
 
 ## Invitation to open-source contribution
 Referring to the successful attempts of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), any and all developers and partners with computing powers, datasets, models are welcome to join and build an ecosystem with Colossal-AI, making efforts towards the era of big AI models from the starting point of replicating ChatGPT!
diff --git a/applications/ChatGPT/examples/README.md b/applications/ChatGPT/examples/README.md
index 0a5e504a0..c411c880b 100644
--- a/applications/ChatGPT/examples/README.md
+++ b/applications/ChatGPT/examples/README.md
@@ -73,14 +73,21 @@ We support naive inference demo after training.
 python inference.py --pretrain --model
 ```
 
+#### Data
+- [x] [rm-static](https://huggingface.co/datasets/Dahoas/rm-static)
+- [x] [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
+- [ ] [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
+- [ ] [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
+- [ ] [Dahoas/instruct-synthetic-prompt-responses](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses)
+
 ## Support Model
 
 ### GPT
-- [ ] GPT2-S (s)
-- [ ] GPT2-M (m)
-- [ ] GPT2-L (l)
+- [x] GPT2-S (s)
+- [x] GPT2-M (m)
+- [x] GPT2-L (l)
 - [ ] GPT2-XL (xl)
-- [ ] GPT2-4B (4b)
+- [x] GPT2-4B (4b)
 - [ ] GPT2-6B (6b)
 - [ ] GPT2-8B (8b)
 - [ ] GPT2-10B (10b)
@@ -99,7 +106,7 @@ python inference.py --pretrain --model
 - [x] [BLOOM-560m](https://huggingface.co/bigscience/bloom-560m)
 - [x] [BLOOM-1b1](https://huggingface.co/bigscience/bloom-1b1)
 - [x] [BLOOM-3b](https://huggingface.co/bigscience/bloom-3b)
-- [x] [BLOOM-7b](https://huggingface.co/bigscience/bloomz-7b1)
+- [x] [BLOOM-7b](https://huggingface.co/bigscience/bloom-7b1)
 - [ ] BLOOM-175b
 
 ### OPT
diff --git a/applications/ChatGPT/requirements.txt b/applications/ChatGPT/requirements.txt
index 87f6a52cc..15a960c2c 100644
--- a/applications/ChatGPT/requirements.txt
+++ b/applications/ChatGPT/requirements.txt
@@ -4,3 +4,4 @@ datasets
 loralib
 colossalai>=0.2.4
 torch
+langchain
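
For readers of the README hunks above, here is a minimal sketch of the optimizer-checkpoint call they quote, `strategy.load_optimizer(actor_optim, 'actor_optim_checkpoint.pt')`. The `NaiveStrategy` import path, the stand-in linear model, and the learning rate are assumptions for illustration, not the package's confirmed API surface; only the `load_optimizer` call itself appears in the diff.

```python
import torch
import torch.nn as nn

# Assumed import path: the README shows only a ready-made `strategy` object,
# so the location of the strategy classes is a guess for this sketch.
from chatgpt.trainer.strategies import NaiveStrategy

strategy = NaiveStrategy()

# Stand-in for the actor model; in the real examples this would be a
# GPT-2/BLOOM/OPT actor being PPO fine-tuned.
actor = nn.Linear(16, 16)
actor_optim = torch.optim.Adam(actor.parameters(), lr=5e-6)

# The call quoted in the README: restore optimizer state that an earlier
# run saved to 'actor_optim_checkpoint.pt'.
strategy.load_optimizer(actor_optim, 'actor_optim_checkpoint.pt')
```

Routing checkpoint I/O through the strategy object, rather than calling `torch.load` directly, presumably lets distributed strategies restore offloaded or sharded optimizer state correctly; the README itself documents only the call shown above.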