19 Commits (06dccdde449e433d83dc42d7898a2ceed654053c)

Author SHA1 Message Date
Jiarui Fang 1b491ad7de
[doc] update docstring in ProcessGroup (#1468) 2022-08-19 13:41:57 +08:00
Jiarui Fang a1476ea882
[NFC] polish doc style for ColoTensor (#1457) 2022-08-16 09:21:05 +08:00
HELSON c7221cb2d4
[hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) 2022-07-29 19:33:24 +08:00
ver217 828b9e5e0d
[hotfix] fix zero optim save/load state dict (#1381) 2022-07-28 17:19:39 +08:00
HELSON f92c100ddd
[checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) 2022-07-19 14:15:28 +08:00
ver217 0c51ff2c13
[hotfix] ZeroDDP use new process group (#1333) 2022-07-18 14:14:52 +08:00
* process group supports getting ranks in group
* chunk mgr receives a process group
* update unit test
* fix unit tests
HELSON d49708ae43
[hotfix] fix ddp for unit test test_gpt2 (#1326) 2022-07-15 18:19:52 +08:00
Jiarui Fang 9f10524313
[Optimizer] polish the init method of ColoOptimizer (#1310) 2022-07-14 16:37:33 +08:00
Jiarui Fang 1aad903c15
[tensor] redistribute among different process groups (#1247) 2022-07-12 10:24:05 +08:00
* make it faster
* [tensor] rename convert_to_dist -> redistribute
* [tensor] ShardSpec and ReplicaSpec
* [tensor] redistribute among diff pgs
* polish code
Jiarui Fang 20da6e48c8
[checkpoint] save sharded optimizer states (#1237) 2022-07-08 16:33:13 +08:00
HELSON f071b500b6
[polish] polish __repr__ for ColoTensor, DistSpec, ProcessGroup (#1235) 2022-07-08 13:25:57 +08:00
Jiarui Fang a98319f023
[tensor] torch function return colotensor (#1229) 2022-07-07 18:09:18 +08:00
HELSON 280a81243d
[tensor] improve robustness of class 'ProcessGroup' (#1223) 2022-07-07 13:55:24 +08:00
Jiarui Fang 15d988f954
[tensor] sharded global process group (#1219) 2022-07-07 13:38:48 +08:00
Jiarui Fang ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. (#1203) 2022-07-06 16:15:16 +08:00
Jiarui Fang b5f25eb32a
[Tensor] add cpu group to ddp (#1200) 2022-07-05 14:58:28 +08:00
Jiarui Fang 060b917daf
[refactor] remove gpc dependency in colotensor's _ops (#1189) 2022-07-04 18:54:37 +08:00
Jiarui Fang c463f8adf9
[tensor] remove gpc in tensor tests (#1186) 2022-06-29 14:08:40 +08:00
Jiarui Fang 7487215b95
[ColoTensor] add independent process group (#1179) 2022-06-29 10:03:09 +08:00