ColossalAI/applications/ChatGPT/chatgpt/trainer
BlueRum 7548ca5a54
[chatgpt]Reward Model Training Process update (#3133)
* add normalize function to value_head in bloom rm

* add normalization to value_function in gpt_rm

* add normalization to value_head of opt_rm

* add Anthropic/hh-rlhf dataset

* Update __init__.py

* Add LogExpLoss in RM training

* Update __init__.py

* update rm trainer to use acc as target

* update example/train_rm

* Update train_rm.sh

* code style

* Update README.md

* Update README.md

* add rm test to ci

* fix tokenizer

* fix typo

* change batchsize to avoid oom in ci

* Update test_ci.sh
2023-03-20 09:59:06 +08:00
callbacks [chatgpt] Add saving ckpt callback for PPO (#2880) 2023-03-07 10:13:25 +08:00
strategies [chatgpt] fix lora save bug (#3099) 2023-03-10 17:58:10 +08:00
__init__.py [app] add chatgpt application (#2698) 2023-02-14 22:17:25 +08:00
base.py [chatgpt] making experience support dp (#2971) 2023-03-03 15:51:19 +08:00
ppo.py [chatgpt] fix trainer generate kwargs (#3166) 2023-03-17 17:31:22 +08:00
rm.py [chatgpt]Reward Model Training Process update (#3133) 2023-03-20 09:59:06 +08:00
utils.py [app] add chatgpt application (#2698) 2023-02-14 22:17:25 +08:00