Commit Graph

4 Commits (0b1c6c67040238cf5d1649df3c717c4481b320c1)

Author SHA1 Message Date
mwiacx 0b1c6c6704 add pipeline memory balance 2023-09-13 16:55:42 +08:00
huangting4201 db13bc46bc fix(ci): fix ci train error (#199) 2023-08-15 20:09:54 +08:00
Sun Peng ef851d16c6 Feat/optimizer (#194)
* feat(optimier.py): reduce memory footprint and avoid _check_overflow call

* feat(optimier.py): reduce memory footprint and avoid _check_overflow call

* feat(optimizer.py): overlap compute norm with allreduce

* update var and function name

* update function compute norm (#197)

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>

* feat(optimizer/hybrid_zero_optim.py): overlap gradients last bucket allreduce and compute norm (#196)

* support gradients allreduce and compute norm overlap

* fix para set error

* remove timer cal_norm for testing

* feat(optimizer/hybrid_zero_optim.py): support group global norm

* format(lint): fix lint error

* feat(optimizer/store.py): update code based on comment

---------

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>
Co-authored-by: huangting4201 <1538303371@qq.com>
2023-08-15 18:55:10 +08:00
Sun Peng fa7337b37b initial commit 2023-07-06 12:55:23 +08:00