Commit Graph

6 Commits (5a0b3d5d9a7934ec96d82186db78f4cda6ad6829)

Author SHA1 Message Date
Qu Wenwen f76fd41325 refactor code 2023-09-19 10:57:20 +08:00
Qu Wenwen 98329da327 add fused precision support for norm 2023-09-18 19:02:07 +08:00
huangting4201 53648dc0e9
feat(train.py): support torch profiler (#201)
* feat(train.py): support torch profiling

* feat(train.py): optimize initialize_llm_profile

* feat(train.py): profiling with tp0 and dp0

* move sequence parallel context manager to evalation func

* fix lint

* move the process for type_ids to load_new_batch

* fix lint

---------

Co-authored-by: yingtongxiong <974106207@qq.com>
2023-08-21 15:23:38 +08:00
huangting4201 ff0fa7659f
feat(monitor): support monitor and alert (#175)
* feat(monitor): support monitor and alert

* feat(monitor.py): fix demo error

* feat(monitor.py): move cmd monitor args to config file

* feat(hybrid_zero_optim.py): if overflow occurs send alert msg

* feat(monitor.py): remove alert msg filter

* feat(monitor.py): optimize class MonitorTracker

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(train.py): update print to log

* style(ci): fix lint error

* fix(utils/evaluation.py): remove useless code

* fix(model/modeling_internlm.py): fix lint error

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>
2023-08-08 11:18:15 +08:00
huangting4201 762ab297ee
feat(core/scheduler): support pipeline parallel (#98)
* feat(utils/writer.py): support tensorboard writer

* feat(utils/writer.py): add class comment

* feat(core): support pipeline parallel

* fix(core): fix demo running error

* feat(solver/optimizer): add pp zero optimizer

* fix(solver/optimizer): fix word spelling error

* feat(core/scheduler): add new dir scheduler in core/

* fix(core): fix ci lint error

* feat(solver/optimizer): merge pp and nopp optimizer

* doc(usage.md): update usage doc

* feat(core/scheduler): support post func

* feat(core/scheduler): add dtype para in pp sche and update func get_tensor_shape

* feat(core/scheduler): add _load_micro_batch in base scheduler

* feat(core/scheduler): support optimizer overlap communication in pp scheduler

* feat(core/scheduler): delete data process func code

* feat(core/trainer): schedule pre processing for all schedule

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>
2023-07-24 20:52:09 +08:00
Sun Peng fa7337b37b initial commit 2023-07-06 12:55:23 +08:00