InternLM/internlm/model
Sun Peng ef851d16c6
Feat/optimizer (#194)
* feat(optimizer.py): reduce memory footprint and avoid _check_overflow call

* feat(optimizer.py): reduce memory footprint and avoid _check_overflow call

* feat(optimizer.py): overlap compute norm with allreduce

* update var and function name

* update function compute norm (#197)

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>

* feat(optimizer/hybrid_zero_optim.py): overlap gradients last bucket allreduce and compute norm (#196)

* support gradients allreduce and compute norm overlap

* fix para set error

* remove timer cal_norm for testing

* feat(optimizer/hybrid_zero_optim.py): support group global norm

* format(lint): fix lint error

* feat(optimizer/store.py): update code based on comment

---------

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>
Co-authored-by: huangting4201 <1538303371@qq.com>
2023-08-15 18:55:10 +08:00
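The commit above mentions supporting a "group global norm" and avoiding a separate `_check_overflow` call. The code below is a hedged illustration of one common way to do both, not InternLM's actual code in `hybrid_zero_optim.py`. It computes an L2 norm for each parameter group. It then combines them into a global norm, which is the square root of the sum of the squared group norms. A non-finite global norm is treated as the overflow signal, so the gradients do not need a second scan. That last step is one plausible reading of "avoid _check_overflow call".

The names `grad_norms` and `has_overflow` are hypothetical. Gradients are modeled as nested lists of floats rather than tensors, and the cross-rank all-reduce is left out.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: parameter-group name -> list of gradient tensors,
# each tensor flattened to a list of floats.
Grads = Dict[str, List[List[float]]]


def group_sq_norm(tensors: List[List[float]]) -> float:
    """Sum of squared gradient entries for one parameter group."""
    return sum(g * g for t in tensors for g in t)


def grad_norms(groups: Grads) -> Tuple[Dict[str, float], float]:
    """Return per-group L2 norms and the combined global L2 norm.

    In a data-parallel run, each rank would all-reduce its partial squared
    sums before taking the square root; this single-process sketch omits that.
    """
    sq = {name: group_sq_norm(ts) for name, ts in groups.items()}
    per_group = {name: math.sqrt(s) for name, s in sq.items()}
    # sqrt(sum of squared group norms) equals the L2 norm over all gradients.
    total = math.sqrt(sum(sq.values()))
    return per_group, total


def has_overflow(norm: float) -> bool:
    """An inf or NaN in any gradient propagates into the norm, so checking
    the norm replaces a separate pass over every gradient."""
    return not math.isfinite(norm)
```

For example, groups `{"default": [[3.0, 4.0]], "no_decay": [[0.0], [12.0]]}` give per-group norms of 5.0 and 12.0 and a global norm of 13.0. Reusing the norm as the overflow test means a step with an inf or NaN gradient can be skipped without scanning the gradients a second time.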
__init__.py feat(model/metrics.py): support calculating accuracy and perplexity m… (#91) 2023-07-26 16:22:10 +08:00
embedding.py feat(monitor): support monitor and alert (#175) 2023-08-08 11:18:15 +08:00
linear.py feat(monitor): support monitor and alert (#175) 2023-08-08 11:18:15 +08:00
loss.py initial commit 2023-07-06 12:55:23 +08:00
metrics.py feat(*): support not-flash-attn for pp and no-pp (#145) 2023-07-28 16:13:04 +08:00
modeling_internlm.py feat(monitor): support monitor and alert (#175) 2023-08-08 11:18:15 +08:00
multi_head_attention.py feat(*): support sequence_parallel (#180) 2023-08-07 16:42:52 +08:00
norm.py Feat/optimizer (#194) 2023-08-15 18:55:10 +08:00
utils.py Feat/optimizer (#194) 2023-08-15 18:55:10 +08:00