ColossalAI

csric e355144375 [chatgpt] Detached PPO Training (#3195)	2023-04-17 14:46:50 +08:00

* run the base
* working on dist ppo
* sync
* detached trainer
* update detached trainer. no maker update function
* facing init problem
* 1 maker 1 trainer detached run. but no model update
* facing cuda problem
* fix save functions
* verified maker update
* nothing
* add ignore
* analyze loss issue
* remove some debug codes
* facing 2m1t stuck issue
* 2m1t verified
* do not use torchrun
* working on 2m2t
* working on 2m2t
* initialize strategy in ray actor env
* facing actor's init order issue
* facing ddp model update issue (need unwrap ddp)
* unwrap ddp actor
* checking 1m2t stuck problem
* nothing
* set timeout for trainer choosing. It solves the stuck problem!
* delete some debug output
* rename to sync with upstream
* rename to sync with upstream
* coati rename
* nothing
* I am going to detach the replaybuffer from trainer and make it a Ray Actor. Two benefits: 1. support TP trainer. 2. asynchronous buffer operations
* experience_maker_holder performs target-revolving _send_experience() instead of length comparison.
* move code to ray subfolder
* working on pipeline inference
* apply comments

---------

Co-authored-by: csric <richcsr256@gmail.com>
example	[chatgpt] Detached PPO Training (#3195)	2023-04-17 14:46:50 +08:00
src	[chatgpt] Detached PPO Training (#3195)	2023-04-17 14:46:50 +08:00
__init__.py	[chatgpt] Detached PPO Training (#3195)	2023-04-17 14:46:50 +08:00