ColossalAI

Commit Graph

Author	SHA1	Message	Date
BlueRum	7548ca5a54	[chatgpt]Reward Model Training Process update (#3133 ) * add normalize function to value_head in bloom rm * add normalization to value_function in gpt_rm * add normalization to value_head of opt_rm * add Anthropic/hh-rlhf dataset * Update __init__.py * Add LogExpLoss in RM training * Update __init__.py * update rm trainer to use acc as target * update example/train_rm * Update train_rm.sh * code style * Update README.md * Update README.md * add rm test to ci * fix tokenier * fix typo * change batchsize to avoid oom in ci * Update test_ci.sh	2023-03-20 09:59:06 +08:00
Fazzie-Maqianli	c21b11edce	change nn to models (#3032 )	2023-03-07 16:34:22 +08:00

2 Commits (7548ca5a54ed117f03247dcb43ec1dd962ae04e0)