Commit Graph

11 Commits (55811708902df7d1c3e2557bb7c1117fe1ba07e5)

Author SHA1 Message Date
Jiarui Fang ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. (#1203) 2022-07-06 16:15:16 +08:00
Jiarui Fang 060b917daf
[refactor] remove gpc dependency in colotensor's _ops (#1189) 2022-07-04 18:54:37 +08:00
Jiarui Fang 372f791444
[refactor] move chunk and chunkmgr to directory gemini (#1182) 2022-06-29 13:31:02 +08:00
Jiarui Fang 4b9bba8116
[ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168) 2022-06-24 13:08:54 +08:00
Jiarui Fang f4ef224358
[Tensor] remove ParallelAction, use ComputeSpec instread (#1166) 2022-06-23 17:34:59 +08:00
ver217 8106d7b8c7
[ddp] refactor ColoDDP and ZeroDDP (#1146)
* ColoDDP supports overwriting default process group

* rename ColoDDPV2 to ZeroDDP

* add docstr for ZeroDDP

* polish docstr
2022-06-21 16:35:23 +08:00
ver217 789cad301b
[hotfix] fix param op hook (#1131)
* fix param op hook

* update zero tp test

* fix bugs
2022-06-17 16:12:05 +08:00
ver217 7d14b473f0
[gemini] gemini mgr supports "cpu" placement policy (#1118)
* update gemini mgr

* update chunk

* add docstr

* polish placement policy

* update test chunk

* update test zero

* polish unit test

* remove useless unit test
2022-06-15 15:05:19 +08:00
ver217 1f894e033f
[gemini] zero supports gemini (#1093)
* add placement policy

* add gemini mgr

* update mem stats collector

* update zero

* update zero optim

* fix bugs

* zero optim monitor os

* polish unit test

* polish unit test

* add assert
2022-06-10 14:48:28 +08:00
Jiarui Fang 49832b2344
[refactory] add nn.parallel module (#1068) 2022-06-06 15:34:41 +08:00
ver217 51b9a49655
[zero] add zero optimizer for ColoTensor (#1046)
* add zero optimizer

* torch ok

* unit test ok

* polish code

* fix bugs

* polish unit test

* polish zero optim

* polish colo ddp v2

* refactor folder structure

* add comment

* polish unit test

* polish zero optim

* polish unit test
2022-06-02 12:13:15 +08:00