Commit Graph

4 Commits (a45a91bb843cf0b10b8b014a6ef35e695871f91b)

Author SHA1 Message Date
Guoteng a45a91bb84
feat(ckpt): add auto ckpt load and signal quit (#189)
Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>
2023-08-11 17:08:01 +08:00
Guoteng 29d27a6227
feat(ckpt): add async upload and ckpt snapshot (#161)
* use fp16 in instruction (#80)

* delete torch_dtype of README's example code (#100)

* feat(ckpt): support async ckpt upload and ckpt snapshot

---------

Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com>
Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com>
Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>
2023-08-08 13:08:36 +08:00
huangting4201 762ab297ee
feat(core/scheduler): support pipeline parallel (#98)
* feat(utils/writer.py): support tensorboard writer

* feat(utils/writer.py): add class comment

* feat(core): support pipeline parallel

* fix(core): fix demo running error

* feat(solver/optimizer): add pp zero optimizer

* fix(solver/optimizer): fix word spelling error

* feat(core/scheduler): add new dir scheduler in core/

* fix(core): fix ci lint error

* feat(solver/optimizer): merge pp and nopp optimizer

* doc(usage.md): update usage doc

* feat(core/scheduler): support post func

* feat(core/scheduler): add dtype para in pp sche and update func get_tensor_shape

* feat(core/scheduler): add _load_micro_batch in base scheduler

* feat(core/scheduler): support optimizer overlap communication in pp scheduler

* feat(core/scheduler): delete data process func code

* feat(core/trainer): schedule pre processing for all schedule

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>
2023-07-24 20:52:09 +08:00
Sun Peng fa7337b37b
initial commit
2023-07-06 12:55:23 +08:00