ColossalAI/applications/ChatGPT/chatgpt/models/opt
BlueRum 7548ca5a54
[chatgpt]Reward Model Training Process update (#3133)
* add normalize function to value_head in bloom rm

* add normalization to value_function in gpt_rm

* add normalization to value_head of opt_rm

* add Anthropic/hh-rlhf dataset

* Update __init__.py

* Add LogExpLoss in RM training

* Update __init__.py

* update rm trainer to use acc as target

* update example/train_rm

* Update train_rm.sh

* code style

* Update README.md

* Update README.md

* add rm test to ci

* fix tokenizer

* fix typo

* change batchsize to avoid oom in ci

* Update test_ci.sh
2023-03-20 09:59:06 +08:00
__init__.py    change nn to models (#3032)                            2023-03-07 16:34:22 +08:00
opt_actor.py   change nn to models (#3032)                            2023-03-07 16:34:22 +08:00
opt_critic.py  [chatgpt] fix lora support for gpt (#3113)             2023-03-13 10:37:41 +08:00
opt_rm.py      [chatgpt]Reward Model Training Process update (#3133)  2023-03-20 09:59:06 +08:00