InternLM

Commit Graph

Author	SHA1	Message	Date
Wenwen Qu	582ee000bd	feat(moe):support zero for expert local dp (#404 ) * support zero for expert local dp * fix above codes: treat optim.zero_world_size and optim.zero_local_rank as list in model_checkpoint.py and test_model_checkpoint.py add overlap and zero check for moe in args_sanity_check(.)	2023-10-09 17:45:26 +08:00
Wenwen Qu	375240e039	feat(moe): add local data parallel support for experts (#376 ) * add local data parallel support for experts * fix model checkpoint for local dp mode of expert * do not set ep size from config	2023-09-28 13:38:02 +08:00
Sun Peng	b7a8af8133	Feat/sync grad use async op (#277 ) * fix/brocast should not in commu stream * fix/brocast should not in commu stream * feat: support allreduce grad using async op * fix bug of async op * use reduceop.avg * use torch flat * delete unused stream * delete unused stream * feat: overap allreduce with memcapy --------- Co-authored-by: yingtongxiong <974106207@qq.com>	2023-09-07 21:51:30 +08:00
huangting4201	db13bc46bc	fix(ci): fix ci train error (#199 )	2023-08-15 20:09:54 +08:00
Sun Peng	ef851d16c6	Feat/optimizer (#194 ) * feat(optimier.py): reduce memory footprint and avoid _check_overflow call * feat(optimier.py): reduce memory footprint and avoid _check_overflow call * feat(optimizer.py): overlap compute norm with allreduce * update var and function name * update function compute norm (#197) Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu> * feat(optimizer/hybrid_zero_optim.py): overlap gradients last bucket allreduce and compute norm (#196) * support gradients allreduce and compute norm overlap * fix para set error * remove timer cal_norm for testing * feat(optimizer/hybrid_zero_optim.py): support group global norm * format(lint): fix lint error * feat(optimizer/store.py): update code based on comment --------- Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu> Co-authored-by: huangting4201 <1538303371@qq.com>	2023-08-15 18:55:10 +08:00
Sun Peng	fa7337b37b	initial commit	2023-07-06 12:55:23 +08:00

6 Commits (cd53d90db9f26a2e0abd6a4fec255bbedc3ddb53)