Wenhao Chen
b03d64d010
[chat] refactor trainer class (#4080)
* to: add SLTrainer
* refactor: refactor RMTrainer and SFTTrainer
* fix: fix init file
* feat: remove on_learn_epoch fn as not used
* fix: align with modified gemini arguments
* to: add OnPolicyTrainer
* revert: add _on_learn_epoch fn
* refactor: refactor PPOTrainer
* style: rename PPOTrainer argument
* fix: align with modified PPO arguments
* test: align with modified train_prompts arguments
* chore: modify train_prompts
* docs: align with modified arguments
* fix: remove unnecessary output
* fix: move dataloader to fit fn of SLTrainer
* fix: move dataloader to fit fn of OnPolicyTrainer
* fix: modify usage of prompt and pretrain dataloader
2023-06-29 10:48:09 +08:00

Hongxin Liu
2a951955ad
[chat] refactor trainer (#3648)
* [chat] ppo trainer remove useless args
* [chat] update examples
* [chat] update benchmark
* [chat] update examples
* [chat] fix sft training with wandb
* [chat] polish docstr
2023-04-26 18:11:49 +08:00

Yuanchen
1ec0d386a9
reconstruct chat trainer and fix training script (#3588)
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-04-18 16:44:03 +08:00

Fazzie-Maqianli
b0ce5a1032
[Coati] first commit (#3283)
2023-03-28 20:25:36 +08:00