Commit Graph

5 Commits (1094e0f0d344c04262ee60bef8f2a9bfb660efc4)

Author SHA1 Message Date
アマデウス 297b8baae2 [model checkpoint] add gloo groups for cpu tensor communication (#589) 2022-04-01 10:15:52 +08:00
Liang Bowen ec5086c49c Refactored docstring to google style 2022-03-29 17:17:47 +08:00
Maruyama_Aya e83970e3dc fix format ColossalAI\colossalai\context\process_group_initializer 2022-03-11 15:50:28 +08:00
HELSON 0f8c7f9804 Fixed docstring in colossalai (#171) 2022-01-21 10:44:30 +08:00
ver217 96780e6ee4 Optimize pipeline schedule (#94) 2021-12-30 15:56:46 +08:00
* add pipeline shared module wrapper and update load batch

* added model parallel process group for amp and clip grad (#86)

* added model parallel process group for amp and clip grad

* update amp and clip with model parallel process group

* remove pipeline_prev/next group (#88)

* micro batch offload

* optimize pipeline gpu memory usage

* pipeline can receive tensor shape (#93)

* optimize pipeline gpu memory usage

* fix grad accumulation step counter

* rename classes and functions

Co-authored-by: Frank Lee <somerlee.9@gmail.com>