Wenhao Chen
|
b03d64d010
|
[chat] refactor trainer class (#4080)
* to: add SLTrainer
* refactor: refactor RMTrainer and SFTTrainer
* fix: fix init file
* feat: remove on_learn_epoch fn as not used
* fix: align with modified gemini arguments
* to: add OnPolicyTrainer
* revert: add _on_learn_epoch fn
* refactor: refactor PPOTrainer
* style: rename PPOTrainer argument
* fix: align with modified PPO arguments
* test: align with modified train_prompts arguments
* chore: modify train_prompts
* docs: align with modified arguments
* fix: remove unnecessary output
* fix: move dataloader to fit fn of SLTrainer
* fix: move dataloader to fit fn of OnPolicyTrainer
* fix: modify usage of prompt and pretrain dataloader
|
2023-06-29 10:48:09 +08:00 |