From bbf9c827c38c56b6f921d248544ad62ceb133d85 Mon Sep 17 00:00:00 2001
From: Fazzie-Maqianli <55798671+Fazziekey@users.noreply.github.com>
Date: Thu, 2 Mar 2023 15:00:05 +0800
Subject: [PATCH] [ChatGPT] fix README (#2966)

* Update README.md

* fix README

* Update README.md

* Update README.md

---------

Co-authored-by: fastalgo <youyang@cs.berkeley.edu>
Co-authored-by: BlueRum <70618399+ht-zhou@users.noreply.github.com>
---
 applications/ChatGPT/README.md          | 23 ++++++++++++++++++++---
 applications/ChatGPT/examples/README.md | 17 ++++++++++++-----
 applications/ChatGPT/requirements.txt   |  1 +
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/applications/ChatGPT/README.md b/applications/ChatGPT/README.md
index dbd5eb770..d26206144 100644
--- a/applications/ChatGPT/README.md
+++ b/applications/ChatGPT/README.md
@@ -1,5 +1,13 @@
 # RLHF - Colossal-AI
 
+## Table of Contents
+
+- [What is RLHF - Colossal-AI?](#intro)
+- [How to Install?](#install)
+- [The Plan](#the-plan)
+- [How can you participate in open source?](#invitation-to-open-source-contribution)
+---
+## Intro
 Implementation of RLHF (Reinforcement Learning from Human Feedback) powered by Colossal-AI. It supports distributed training and offloading, which can fit extremely large models. More details can be found in the [blog](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt).
 
 <p align="center">
@@ -20,7 +28,6 @@ Implementation of RLHF (Reinforcement Learning with Human Feedback) powered by C
 pip install .
 ```
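+
+As a minimal sketch, assuming you are installing from a fresh clone of the ColossalAI repository (the clone URL and directory below simply reflect this application's path in that repo):
+
+```shell
+# clone the main repository and install the ChatGPT application from source
+git clone https://github.com/hpcaitech/ColossalAI.git
+cd ColossalAI/applications/ChatGPT
+pip install .
+```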
 
-
 ## Usage
 
 The main entry point is `Trainer`. Only the PPO trainer is supported for now. Several training strategies are available:
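+
+As a rough sketch of how a strategy and the PPO trainer fit together (the module paths, class names, and call signatures below are assumptions modelled on the scripts under `examples/`, not a documented stable API):
+
+```python
+from copy import deepcopy
+
+import torch
+from torch.optim import Adam
+
+# hypothetical imports; the actual package layout may differ
+from chatgpt.nn import GPTActor, GPTCritic, RewardModel
+from chatgpt.trainer import PPOTrainer
+from chatgpt.trainer.strategies import NaiveStrategy
+
+strategy = NaiveStrategy()  # single-process; DDP and ColossalAI strategies follow the same pattern
+
+with strategy.model_init_context():
+    actor = GPTActor().cuda()
+    critic = GPTCritic().cuda()
+    initial_model = deepcopy(actor)                           # frozen reference policy for the KL penalty
+    reward_model = RewardModel(deepcopy(critic.model)).cuda()
+
+actor_optim = Adam(actor.parameters(), lr=5e-6)
+critic_optim = Adam(critic.parameters(), lr=5e-6)
+
+trainer = PPOTrainer(strategy, actor, critic, reward_model, initial_model,
+                     actor_optim, critic_optim, train_batch_size=8)
+
+# placeholder random token prompts; real runs would tokenize a prompt dataset instead
+prompts = torch.randint(50257, (64, 64), device=torch.cuda.current_device())
+trainer.fit(prompts, num_episodes=1, max_timesteps=1, update_timesteps=1)
+```
+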
@@ -128,14 +135,24 @@ To load optimizer checkpoint:
 strategy.load_optimizer(actor_optim, 'actor_optim_checkpoint.pt')
 ```
 
-## Todo
+## The Plan
 
 - [x] implement PPO fine-tuning
 - [x] implement training reward model
 - [x] support LoRA
+- [x] support inference
+- [ ] open-source the reward model weights
+- [ ] support LLaMA from [Facebook Research](https://github.com/facebookresearch/llama)
+- [ ] support BoN (best-of-N sampling)
 - [ ] implement PPO-ptx fine-tuning
 - [ ] integrate with Ray
-- [ ] support more RL paradigms, like Implicit Language Q-Learning (ILQL)
+- [ ] support more RL paradigms, like Implicit Language Q-Learning (ILQL)
+- [ ] support chain of thought via [LangChain](https://github.com/hwchase17/langchain)
+
+### Real-time progress
+You can follow our progress on the GitHub project board:
+
+[Open ChatGPT](https://github.com/orgs/hpcaitech/projects/17/views/1)
 
 ## Invitation to open-source contribution
 Referring to the successful attempts of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), any and all developers and partners with computing power, datasets, or models are welcome to join and build an ecosystem with Colossal-AI, working towards the era of big AI models, starting from replicating ChatGPT!
diff --git a/applications/ChatGPT/examples/README.md b/applications/ChatGPT/examples/README.md
index 0a5e504a0..c411c880b 100644
--- a/applications/ChatGPT/examples/README.md
+++ b/applications/ChatGPT/examples/README.md
@@ -73,14 +73,21 @@ We support naive inference demo after training.
 python inference.py --pretrain <your actor model path> --model <your model type>
 ```
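+
+For example, a hypothetical run with a GPT-2 actor (the checkpoint path and model flag value below are placeholders):
+
+```shell
+python inference.py --pretrain path/to/your_actor_model --model gpt2
+```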
 
+#### Data
+- [x] [rm-static](https://huggingface.co/datasets/Dahoas/rm-static)
+- [x] [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
+- [ ] [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
+- [ ] [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
+- [ ] [Dahoas/instruct-synthetic-prompt-responses](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses)
+
 ## Support Model
 
 ### GPT
-- [ ]  GPT2-S (s)
-- [ ]  GPT2-M (m)
-- [ ]  GPT2-L (l)
+- [x]  GPT2-S (s)
+- [x]  GPT2-M (m)
+- [x]  GPT2-L (l)
 - [ ]  GPT2-XL (xl)
-- [ ]  GPT2-4B (4b)
+- [x]  GPT2-4B (4b)
 - [ ]  GPT2-6B (6b)
 - [ ]  GPT2-8B (8b)
 - [ ]  GPT2-10B (10b)
@@ -99,7 +106,7 @@ python inference.py --pretrain <your actor model path> --model <your model type>
 - [x] [BLOOM-560m](https://huggingface.co/bigscience/bloom-560m)
 - [x] [BLOOM-1b1](https://huggingface.co/bigscience/bloom-1b1)
 - [x] [BLOOM-3b](https://huggingface.co/bigscience/bloom-3b)
-- [x] [BLOOM-7b](https://huggingface.co/bigscience/bloomz-7b1)
+- [x] [BLOOM-7b](https://huggingface.co/bigscience/bloom-7b1)
 - [ ] BLOOM-175b
 
 ### OPT
diff --git a/applications/ChatGPT/requirements.txt b/applications/ChatGPT/requirements.txt
index 87f6a52cc..15a960c2c 100644
--- a/applications/ChatGPT/requirements.txt
+++ b/applications/ChatGPT/requirements.txt
@@ -4,3 +4,4 @@ datasets
 loralib
 colossalai>=0.2.4
 torch
+langchain