ColossalAI/applications/ChatGPT/chatgpt
BlueRum 7548ca5a54
[chatgpt]Reward Model Training Process update (#3133)
* add normalize function to value_head in bloom rm

* add normalization to value_function in gpt_rm

* add normalization to value_head of opt_rm

* add Anthropic/hh-rlhf dataset

* Update __init__.py

* Add LogExpLoss in RM training

* Update __init__.py

* update rm trainer to use acc as target

* update example/train_rm

* Update train_rm.sh

* code style

* Update README.md

* Update README.md

* add rm test to ci

* fix tokenizer

* fix typo

* change batch size to avoid OOM in CI

* Update test_ci.sh
2023-03-20 09:59:06 +08:00
dataset [chatgpt]Reward Model Training Process update (#3133) 2023-03-20 09:59:06 +08:00
experience_maker change nn to models (#3032) 2023-03-07 16:34:22 +08:00
models [chatgpt]Reward Model Training Process update (#3133) 2023-03-20 09:59:06 +08:00
replay_buffer [app] add chatgpt application (#2698) 2023-02-14 22:17:25 +08:00
trainer [chatgpt]Reward Model Training Process update (#3133) 2023-03-20 09:59:06 +08:00
__init__.py [app] add chatgpt application (#2698) 2023-02-14 22:17:25 +08:00
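
The commit message above mentions adding a LogExpLoss for reward model (RM) training and switching the RM trainer to use accuracy ("acc") as its target. The listing does not show the implementation, so the following is only a minimal pure-Python sketch of what a pairwise log-exp loss and a pairwise accuracy typically look like. The function names `log_exp_loss` and `pairwise_accuracy`, the plain-list inputs, and the exact formula log(1 + exp(rejected - chosen)) are assumptions for illustration, not the repository's code.

```python
import math


def log_exp_loss(chosen_rewards, rejected_rewards):
    """Mean of log(1 + exp(rejected - chosen)) over reward pairs.

    Hypothetical helper: assumes each chosen reward is paired with the
    rejected reward at the same index.
    """
    if not chosen_rewards or len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for chosen, rejected in zip(chosen_rewards, rejected_rewards):
        diff = rejected - chosen
        # Numerically stable form of log(1 + exp(diff)).
        total += max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(chosen_rewards)


def pairwise_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response scores strictly higher."""
    if not chosen_rewards or len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(
        1 for chosen, rejected in zip(chosen_rewards, rejected_rewards)
        if chosen > rejected
    )
    return correct / len(chosen_rewards)


if __name__ == "__main__":
    chosen = [1.5, 0.2, -0.3]
    rejected = [0.5, 0.4, -1.0]
    print("loss:", log_exp_loss(chosen, rejected))
    print("acc:", pairwise_accuracy(chosen, rejected))
```

Under these assumptions the loss shrinks as the chosen response's reward rises above the rejected one's, and the accuracy simply counts how often the ranking is correct, which is one plausible reading of using "acc" as the selection target.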