Commit Graph

19 Commits (1c1fe44305a3d48ee2419389b8a7185bc5f204cf)

Author       SHA1        Date         Message
Jiarui Fang  1b491ad7de  2 years ago  [doc] update docstring in ProcessGroup (#1468)
Jiarui Fang  a1476ea882  2 years ago  [NFC] polish doc style for ColoTensor (#1457)
HELSON       c7221cb2d4  2 years ago  [hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388)
ver217       828b9e5e0d  2 years ago  [hotfix] fix zero optim save/load state dict (#1381)
HELSON       f92c100ddd  2 years ago  [checkpoint] use gather_tensor in checkpoint and update its unit test (#1339)
ver217       0c51ff2c13  2 years ago  [hotfix] ZeroDDP use new process group (#1333)
HELSON       d49708ae43  2 years ago  [hotfix] fix ddp for unit test test_gpt2 (#1326)
Jiarui Fang  9f10524313  2 years ago  [Optimizer] polish the init method of ColoOptimizer (#1310)
Jiarui Fang  1aad903c15  2 years ago  [tensor] redistribute among different process groups (#1247)
Jiarui Fang  20da6e48c8  2 years ago  [checkpoint] save sharded optimizer states (#1237)
HELSON       f071b500b6  2 years ago  [polish] polish __repr__ for ColoTensor, DistSpec, ProcessGroup (#1235)
Jiarui Fang  a98319f023  2 years ago  [tensor] torch function return colotensor (#1229)
HELSON       280a81243d  2 years ago  [tensor] improve robustness of class 'ProcessGroup' (#1223)
Jiarui Fang  15d988f954  2 years ago  [tensor] sharded global process group (#1219)
Jiarui Fang  ae7d3f4927  2 years ago  [refactor] move process group from _DistSpec to ColoTensor. (#1203)
Jiarui Fang  b5f25eb32a  2 years ago  [Tensor] add cpu group to ddp (#1200)
Jiarui Fang  060b917daf  2 years ago  [refactor] remove gpc dependency in colotensor's _ops (#1189)
Jiarui Fang  c463f8adf9  2 years ago  [tensor] remove gpc in tensor tests (#1186)
Jiarui Fang  7487215b95  2 years ago  [ColoTensor] add independent process group (#1179)