Commit Graph

6 Commits (ff0fa7659f148bb45e3086e4e3b1abecdfb3048a)

Author SHA1 Message Date
huangting4201 ff0fa7659f
feat(monitor): support monitor and alert (#175)
* feat(monitor): support monitor and alert

* feat(monitor.py): fix demo error

* feat(monitor.py): move cmd monitor args to config file

* feat(hybrid_zero_optim.py): if overflow occurs send alert msg

* feat(monitor.py): remove alert msg filter

* feat(monitor.py): optimize class MonitorTracker

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(train.py): update print to log

* style(ci): fix lint error

* fix(utils/evaluation.py): remove useless code

* fix(model/modeling_internlm.py): fix lint error

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>
2023-08-08 11:18:15 +08:00
ytxiong c219065348
feat(*): support sequence_parallel (#180)
* support sequence_parallel for no pipeline

* sequence_parallel does not support no-flash-attn

* support sequence parallel for pipeline

* add memory profiler

* Update 13B.py

* add memory profiler

* fix evaluation bug

* remove some unnecessary code

* remove some unnecessary code

* Update parallel_context.py

* modify the config

* remove memory profiler

* modify the config

* support selective dropout
2023-08-07 16:42:52 +08:00
ytxiong 853becfb6e
feat(*): support fp32 training (#155)
* support float32 training

* fix lint

* add adaptation in model/utils.py

* remove some unnecessary code

* fix lint

* feat(optim): add support for fp32 zero

* Revert "Merge pull request #2 from SolenoidWGT/fp32_zero"

This reverts commit 53fc50b0e5, reversing
changes made to 40f24d0a73.

revert commit

* merge develop

* Update utils.py

* support fp32 in zero optimizer

* modify the dtype

---------

Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>
2023-08-04 16:05:30 +08:00
ytxiong d67be17f96
refactor(*): refactor the code with no-apex (#170)
* support no-apex

* add default for use_apex

* fix lint

* modify the RMSNormTorch

* remove some comments

* remove use_apex parameter

* remove some unnecessary code

* optimize the code including import

* remove the import RMSNorm

* remove warnings
2023-08-03 11:24:12 +08:00
ytxiong 1c397f523f
feat(*): support no apex (#166)
* support no-apex

* add default for use_apex

* fix lint

* modify the RMSNormTorch

* remove some comments

* remove use_apex parameter

* remove some unnecessary code
2023-08-02 20:32:38 +08:00
Sun Peng fa7337b37b
initial commit
2023-07-06 12:55:23 +08:00