ver217
|
0c51ff2c13
|
[hotfix] ZeroDDP use new process group (#1333)
* process group supports getting ranks in group
* chunk mgr receives a process group
* update unit test
* fix unit tests
|
2022-07-18 14:14:52 +08:00 |
HELSON
|
d49708ae43
|
[hotfix] fix ddp for unit test test_gpt2 (#1326)
|
2022-07-15 18:19:52 +08:00 |
Jiarui Fang
|
9f10524313
|
[Optimizer] polish the init method of ColoOptimizer (#1310)
|
2022-07-14 16:37:33 +08:00 |
Jiarui Fang
|
1aad903c15
|
[tensor] redistribute among different process groups (#1247)
* make it faster
* [tensor] rename convert_to_dist -> redistribute
* [tensor] ShardSpec and ReplicaSpec
* [tensor] redistribute among different process groups
* polish code
|
2022-07-12 10:24:05 +08:00 |
Jiarui Fang
|
20da6e48c8
|
[checkpoint] save sharded optimizer states (#1237)
|
2022-07-08 16:33:13 +08:00 |
HELSON
|
f071b500b6
|
[polish] polish __repr__ for ColoTensor, DistSpec, ProcessGroup (#1235)
|
2022-07-08 13:25:57 +08:00 |
Jiarui Fang
|
a98319f023
|
[tensor] torch function return colotensor (#1229)
|
2022-07-07 18:09:18 +08:00 |
HELSON
|
280a81243d
|
[tensor] improve robustness of class 'ProcessGroup' (#1223)
|
2022-07-07 13:55:24 +08:00 |
Jiarui Fang
|
15d988f954
|
[tensor] sharded global process group (#1219)
|
2022-07-07 13:38:48 +08:00 |
Jiarui Fang
|
ae7d3f4927
|
[refactor] move process group from _DistSpec to ColoTensor. (#1203)
|
2022-07-06 16:15:16 +08:00 |
Jiarui Fang
|
b5f25eb32a
|
[Tensor] add cpu group to ddp (#1200)
|
2022-07-05 14:58:28 +08:00 |
Jiarui Fang
|
060b917daf
|
[refactor] remove gpc dependency in colotensor's _ops (#1189)
|
2022-07-04 18:54:37 +08:00 |
Jiarui Fang
|
c463f8adf9
|
[tensor] remove gpc in tensor tests (#1186)
|
2022-06-29 14:08:40 +08:00 |
Jiarui Fang
|
7487215b95
|
[ColoTensor] add independent process group (#1179)
|
2022-06-29 10:03:09 +08:00 |