ColossalAI/applications/ChatGPT/chatgpt/models
BlueRum 7548ca5a54
[chatgpt]Reward Model Training Process update (#3133)
* add normalize function to value_head in bloom rm
* add normalization to value_function in gpt_rm
* add normalization to value_head of opt_rm
* add Anthropic/hh-rlhf dataset
* Update __init__.py
* Add LogExpLoss in RM training
* Update __init__.py
* update rm trainer to use acc as target
* update example/train_rm
* Update train_rm.sh
* code style
* Update README.md
* Update README.md
* add rm test to ci
* fix tokenizer
* fix typo
* change batch size to avoid OOM in CI
* Update test_ci.sh
2023-03-20 09:59:06 +08:00
base                 [chatgpt]add flag of action mask in critic (#3086)        2023-03-10 14:40:14 +08:00
bloom                [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
gpt                  [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
opt                  [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
__init__.py          [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
generation.py        [chatgpt] fix ppo training hanging problem with gemini (#3162)  2023-03-17 15:41:47 +08:00
generation_utils.py  change nn to models (#3032)                               2023-03-07 16:34:22 +08:00
lora.py              [chatgpt] fix lora save bug (#3099)                       2023-03-10 17:58:10 +08:00
loss.py              [chatgpt]Reward Model Training Process update (#3133)     2023-03-20 09:59:06 +08:00
utils.py             change nn to models (#3032)                               2023-03-07 16:34:22 +08:00