780 Commits (a52f62082de0f4b4544ba2d04e909f74123425ce)

Author SHA1 Message Date
YuliangLiu0306 2b7dca44b5
[fx]get communication size between partitions (#1224) 2 years ago
Frank Lee 84f2298a96
[fx] added patches for tracing swin transformer (#1228) 2 years ago
Frank Lee 37fcf96b7f
[fx] fixed timm tracing result misalignment (#1225) 2 years ago
Frank Lee b6cb5a47ad
[fx] added timm model tracing testing (#1221) 2 years ago
Jiarui Fang 15d988f954
[tensor] sharded global process group (#1219) 2 years ago
Frank Lee 11973d892d
[fx] added torchvision model tracing testing (#1216) 2 years ago
Jiarui Fang 52736205d9
[checkpoint] make unit test faster (#1217) 2 years ago
Jiarui Fang f38006ea83
[checkpoint] checkpoint for ColoTensor Model (#1196) 2 years ago
Jiarui Fang ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. (#1203) 2 years ago
Frank Lee 5da87ce35d
[fx] added testing for all albert variants (#1211) 2 years ago
Frank Lee 2d13a45a3b
[fx] added testing for all gpt variants (#1210) 2 years ago
YuliangLiu0306 189946c5c4
[fx]add uniform policy (#1208) 2 years ago
Frank Lee 426a279ce7
[fx] added testing for all bert variants (#1207) 2 years ago
Frank Lee f7878f465c
[fx] supported model tracing for huggingface bert (#1201) 2 years ago
Jiarui Fang 060b917daf
[refactor] remove gpc dependency in colotensor's _ops (#1189) 2 years ago
Frank Lee abf6a262dc
[fx] added module patch for pooling layers (#1197) 2 years ago
YuliangLiu0306 63d2a93878
[context]support arbitrary module materialization. (#1193) 2 years ago
YuliangLiu0306 2053e138a2
[context]use meta tensor to init model lazily. (#1187) 2 years ago
Frank Lee 2c8c05675d
[fx] patched conv and normalization (#1188) 2 years ago
Frank Lee 6d86f1bc91
[fx] supported data-dependent control flow in model tracing (#1185) 2 years ago
Jiarui Fang c463f8adf9
[tensor] remove gpc in tensor tests (#1186) 2 years ago
Jiarui Fang 372f791444
[refactor] move chunk and chunkmgr to directory gemini (#1182) 2 years ago
ver217 6b2f2ab9bb
[ddp] ColoDDP uses bucket all-reduce (#1177) 2 years ago
Jiarui Fang 7487215b95
[ColoTensor] add independent process group (#1179) 2 years ago
Jiarui Fang 1b657f9ce1
[tensor] revert local view back (#1178) 2 years ago
Jiarui Fang 0dd4e2bbfb
[Tensor] rename some APIs in TensorSpec and Polish view unittest (#1176) 2 years ago
Jiarui Fang aa7bef73d4
[Tensor] distributed view supports inter-process hybrid parallel (#1169) 2 years ago
ver217 9e1daa63d2
[zero] sharded optim supports loading local state dict (#1170) 2 years ago
ver217 561e90493f
[zero] zero optim supports loading local state dict (#1171) 2 years ago
Jiarui Fang 4b9bba8116
[ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168) 2 years ago
Jiarui Fang f4ef224358
[Tensor] remove ParallelAction, use ComputeSpec instead (#1166) 2 years ago
Jiarui Fang 177c374401
remove gather out in parallel action (#1163) 2 years ago
Jiarui Fang 07f9c781f9
[graph] improve the graph building. (#1157) 2 years ago
ver217 22717a856f
[tensor] add embedding bag op (#1156) 2 years ago
ver217 ae86151968
[tensor] add more element-wise ops (#1155) 2 years ago
ver217 ffa025e120
[tensor] dist spec s2s uses all-to-all (#1136) 2 years ago
Jiarui Fang ff644ee5e4
polish unit test with titans (#1152) 2 years ago
Jiarui Fang 8cdce0399c
[ColoTensor] improves init functions. (#1150) 2 years ago
ver217 8106d7b8c7
[ddp] refactor ColoDDP and ZeroDDP (#1146) 2 years ago
ver217 d26902645e
[ddp] add save/load state dict for ColoDDP (#1127) 2 years ago
ver217 789cad301b
[hotfix] fix param op hook (#1131) 2 years ago
ver217 f0a954f16d
[ddp] add set_params_to_ignore for ColoDDP (#1122) 2 years ago
YuliangLiu0306 fcf55777dd
[fx]add autoparallel passes (#1121) 2 years ago
Frank Lee 16302a5359
[fx] added unit test for coloproxy (#1119) 2 years ago
ver217 7d14b473f0
[gemini] gemini mgr supports "cpu" placement policy (#1118) 2 years ago
Frank Lee 53297330c0
[test] fixed hybrid parallel test case on 8 GPUs (#1106) 2 years ago
ver217 1f894e033f
[gemini] zero supports gemini (#1093) 2 years ago
Frank Lee 2b2dc1c86b
[pipeline] refactor the pipeline module (#1087) 2 years ago
Frank Lee bad5d4c0a1
[context] support lazy init of module (#1088) 2 years ago
ver217 be01db37c8
[tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077) 2 years ago