Commit Graph

146 Commits (937f4042533c2e30313d79b8490f9f65a57d94df)

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Boyuan Yao | cfe2a9bd90 | [autoparallel] memory estimation for shape consistency (#2144) | 2 years ago |
| Jiarui Fang | 2827f41898 | [Gemini] GeminiDPP convert to PyTorch Module. (#2151) | 2 years ago |
| Jiarui Fang | e99edfcb51 | [NFC] polish comments for Chunk class (#2116) | 2 years ago |
| Jiarui Fang | b3b89865e2 | [Gemini] ParamOpHook -> ColoParamOpHook (#2080) | 2 years ago |
| YuliangLiu0306 | 81330b0352 | [autoparallel] add experimental permute handler (#2029) | 2 years ago |
| Genghan Zhang | d655eea515 | [autoparallel] mix gather (#1977) | 2 years ago |
| YuliangLiu0306 | 36c0f3ea5b | [autoparallel] remove redundancy comm node (#1893) | 2 years ago |
| YuliangLiu0306 | 49216d7ab1 | [autoparallel] fix bugs caused by negative dim key (#1808) | 2 years ago |
| Jiarui Fang | 218c75fd9d | [NFC] polish type hint for shape consistency (#1801) | 2 years ago |
| HELSON | c6a1a62636 | [hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786) | 2 years ago |
| Frank Lee | f3f19a5c47 | [autoparallel] added matmul handler (#1763) | 2 years ago |
| YuliangLiu0306 | b0f7c8bde8 | [autoparallel] update CommSpec to CommActions (#1768) | 2 years ago |
| YuliangLiu0306 | b4cc59b61e | [autoparallel] add numerical test for node strategies (#1760) | 2 years ago |
| YuliangLiu0306 | 980ed21723 | [autoparallel] shard param and buffer as expected (#1753) | 2 years ago |
| YuliangLiu0306 | a4ce180e85 | [autoparallel] add sequential order to communication actions (#1735) | 2 years ago |
| Frank Lee | 993b8875b6 | [autoparallel] handled illegal sharding strategy in shape consistency (#1744) | 2 years ago |
| Frank Lee | eee84908d4 | [autoparallel] handled illegal sharding strategy (#1728) | 2 years ago |
| YuliangLiu0306 | 51b89d2202 | [autoparallel] runtime_backward_apply (#1720) | 2 years ago |
| Frank Lee | 4973157ad7 | [autoparallel] added sharding spec conversion for linear handler (#1687) | 2 years ago |
| YuliangLiu0306 | 3f068d1409 | [autoparallel] update CommSpec (#1667) | 2 years ago |
| Frank Lee | 154d3ef432 | [fix] fixed the collective pattern name for consistency (#1649) | 2 years ago |
| YuliangLiu0306 | 702dbc5288 | [tensor] use communication autograd func (#1617) | 2 years ago |
| Frank Lee | 27fe8af60c | [autoparallel] refactored shape consistency to remove redundancy (#1591) | 2 years ago |
| YuliangLiu0306 | 44c866a3e3 | [autoparallel] change the merge node logic (#1533) | 2 years ago |
| YuliangLiu0306 | 4b03c25f85 | [tensor]add 1D device mesh (#1492) | 2 years ago |
| YuliangLiu0306 | 26a37b5cd5 | [autoparallel] Add conv handler to generate strategies and costs info for conv (#1467) | 2 years ago |
| Jiarui Fang | 1b491ad7de | [doc] update docstring in ProcessGroup (#1468) | 2 years ago |
| YuliangLiu0306 | b73fb7a077 | [tensor] support runtime ShardingSpec apply (#1453) | 2 years ago |
| Jiarui Fang | 36824a304c | [Doc] add more doc for ColoTensor. (#1458) | 2 years ago |
| Jiarui Fang | a1476ea882 | [NFC] polish doc style for ColoTensor (#1457) | 2 years ago |
| YuliangLiu0306 | 0f3042363c | [tensor] shape consistency generate transform path and communication cost (#1435) | 2 years ago |
| Frank Lee | ae1b58cd16 | [tensor] added linear implementation for the new sharding spec (#1416) | 2 years ago |
| YuliangLiu0306 | 33f0744d51 | [tensor] add shape consistency feature to support auto spec transform (#1418) | 2 years ago |
| YuliangLiu0306 | 7c96055c68 | [tensor]build sharding spec to replace distspec in future. (#1405) | 2 years ago |
| HELSON | c7221cb2d4 | [hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) | 2 years ago |
| ver217 | 828b9e5e0d | [hotfix] fix zero optim save/load state dict (#1381) | 2 years ago |
| HELSON | 943a96323e | [hotfix] fix no optimizer in save/load (#1363) | 2 years ago |
| ver217 | d068af81a3 | [doc] update rst and docstring (#1351) | 2 years ago |
| HELSON | 7a8702c06d | [colotensor] add Tensor.view op and its unit test (#1343) | 2 years ago |
| HELSON | f92c100ddd | [checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) | 2 years ago |
| ver217 | 0c51ff2c13 | [hotfix] ZeroDDP use new process group (#1333) | 2 years ago |
| HELSON | d49708ae43 | [hotfix] fix ddp for unit test test_gpt2 (#1326) | 2 years ago |
| HELSON | 1b41686461 | [hotfix] fix unit test test_module_spec (#1321) | 2 years ago |
| Jiarui Fang | 85f933b58b | [Optimizer] Remove useless ColoOptimizer (#1312) | 2 years ago |
| Jiarui Fang | 9f10524313 | [Optimizer] polish the init method of ColoOptimizer (#1310) | 2 years ago |
| HELSON | 260a55804a | [hotfix] fix shape error in backward when using ColoTensor (#1298) | 2 years ago |
| Jiarui Fang | 556b9b7e1a | [hotfix] Dist Mgr gather torch version (#1284) | 2 years ago |
| ver217 | 7aadcbd070 | hotfix colotensor _scan_for_pg_from_args (#1276) | 2 years ago |
| Jiarui Fang | c92f84fcdb | [tensor] distributed checkpointing for parameters (#1240) | 2 years ago |
| Jiarui Fang | 1aad903c15 | [tensor] redistribute among different process groups (#1247) | 2 years ago |