ColossalAI/applications/ChatGPT/chatgpt/models
BlueRum 7548ca5a54
[chatgpt]Reward Model Training Process update (#3133)
* add normalize function to value_head in bloom rm
* add normalization to value_function in gpt_rm
* add normalization to value_head of opt_rm
* add Anthropic/hh-rlhf dataset
* Update __init__.py
* Add LogExpLoss in RM training
* Update __init__.py
* update rm trainer to use acc as target
* update example/train_rm
* Update train_rm.sh
* code style
* Update README.md
* Update README.md
* add rm test to ci
* fix tokenizer
* fix typo
* change batch size to avoid OOM in CI
* Update test_ci.sh
2023-03-20 09:59:06 +08:00
base                 [chatgpt]add flag of action mask in critic (#3086)        2023-03-10 14:40:14 +08:00
bloom                [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
gpt                  [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
opt                  [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
__init__.py          [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
generation.py        [chatgpt] fix ppo training hanging problem with gemini (#3162)  2023-03-17 15:41:47 +08:00
generation_utils.py  change nn to models (#3032)                               2023-03-07 16:34:22 +08:00
lora.py              [chatgpt] fix lora save bug (#3099)                       2023-03-10 17:58:10 +08:00
loss.py              [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
utils.py             change nn to models (#3032)                               2023-03-07 16:34:22 +08:00