735 Commits (634eecb98eed79d885144d234654ea4091cabd60)
 

Author SHA1
Message Date
ver217 634eecb98e
mark sanity_check of dist_spec_mgr as staticmethod (#1161) 2 years ago
Ziyue Jiang 955ac912de
remove log (#1160) 2 years ago
ver217 4e67b2a890
fix chunk move device (#1158) 2 years ago
Jiarui Fang 07f9c781f9
[graph] improve the graph building. (#1157) 2 years ago
ver217 22717a856f
[tensor] add embedding bag op (#1156) 2 years ago
ver217 ae86151968
[tensor] add more element-wise ops (#1155) 2 years ago
github-actions[bot] e8c34eedfd
Automated submodule synchronization (#1129) 2 years ago
Frank Lee d415d73286
[workflow] fixed release post workflow (#1154) 2 years ago
ver217 54aabb8da4
[gemini] refactor gemini mgr (#1151) 2 years ago
Frank Lee f8eec98ff5
[tensor] fixed non-serializable colo parameter during model checkpointing (#1153) 2 years ago
ver217 ffa025e120
[tensor] dist spec s2s uses all-to-all (#1136) 2 years ago
Frank Lee c77da0dc81
[workflow] fixed format error in yaml file (#1145) 2 years ago
Jiarui Fang ff644ee5e4
polish unitest test with titans (#1152) 2 years ago
YuliangLiu0306 f1f51990b9
[hotfix]fix some bugs caused by refactored schedule. (#1148) 2 years ago
Jiarui Fang 8cdce0399c
[ColoTensor] improves init functions. (#1150) 2 years ago
ver217 8106d7b8c7
[ddp] refactor ColoDDP and ZeroDDP (#1146) 2 years ago
Frank Lee 0e4e62d30d
[tensor] added __repr__ to spec (#1147) 2 years ago
YuliangLiu0306 70dd88e2ee
[pipeline]add customized policy (#1139) 2 years ago
Frank Lee d1918304bb
[workflow] added workflow to auto draft the release post (#1144) 2 years ago
YuliangLiu0306 18091581c0
[pipeline]support more flexible pipeline (#1138) 2 years ago
ver217 ccf3c58c89
embedding op use gather_out (#1143) 2 years ago
Frank Lee e61dc31b05
[ci] added scripts to auto-generate release post text (#1142) 2 years ago
ver217 6690a61b4d
[hotfix] prevent nested ZeRO (#1140) 2 years ago
Frank Lee 15aab1476e
[zero] avoid zero hook spam by changing log to debug level (#1137) 2 years ago
Frank Lee 73ad05fc8c
[zero] added error message to handle on-the-fly import of torch Module class (#1135) 2 years ago
ver217 e4f555f29a
[optim] refactor fused sgd (#1134) 2 years ago
ver217 d26902645e
[ddp] add save/load state dict for ColoDDP (#1127) 2 years ago
YuliangLiu0306 946dbd629d
[hotfix]fix bugs caused by refactored pipeline (#1133) 2 years ago
ver217 789cad301b
[hotfix] fix param op hook (#1131) 2 years ago
ver217 a1a7899cae
[hotfix] fix zero init ctx numel (#1128) 2 years ago
ver217 f0a954f16d
[ddp] add set_params_to_ignore for ColoDDP (#1122) 2 years ago
YuliangLiu0306 3175bcb4d8
[pipeline]support List of Dict data (#1125) 2 years ago
Frank Lee 91a5999825
[ddp] supported customized torch ddp configuration (#1123) 2 years ago
YuliangLiu0306 fcf55777dd
[fx]add autoparallel passes (#1121) 2 years ago
ver217 e127b4375b
cast colo ddp v2 inputs/outputs (#1120) 2 years ago
Frank Lee 16302a5359
[fx] added unit test for coloproxy (#1119) 2 years ago
ver217 7d14b473f0
[gemini] gemini mgr supports "cpu" placement policy (#1118) 2 years ago
ver217 f99f56dff4
fix colo parameter torch function (#1117) 2 years ago
Frank Lee e1620ddac2
[fx] added coloproxy (#1115) 2 years ago
Frank Lee 6f82ac9bcb
[pipeline] supported more flexible dataflow control for pipeline parallel training (#1108) 2 years ago
Frank Lee 53297330c0
[test] fixed hybrid parallel test case on 8 GPUs (#1106) 2 years ago
github-actions[bot] 85b58093d2
Automated submodule synchronization (#1105) 2 years ago
Frank Lee 74948b095c
[release] update version.txt (#1103) 2 years ago
ver217 895c1c5ee7
[tensor] refactor param op hook (#1097) 2 years ago
YuliangLiu0306 1e9f9c227f
[hotfix]change to fit latest p2p (#1100) 2 years ago
Frank Lee 72bd7c696b
[amp] included dict for type casting of model output (#1102) 2 years ago
Frank Lee 5a9d8ef4d5
[workflow] fixed 8-gpu test workflow (#1101) 2 years ago
Frank Lee 03e52ecba3
[workflow] added regular 8 GPU testing (#1099) 2 years ago
Frank Lee 7f2d2b2b5b
[engine] fixed empty op hook check (#1096) 2 years ago
Frank Lee 14e5b11d7f
[zero] fixed api consistency (#1098) 2 years ago