15 Commits (develop)

Author | SHA1 | Message | Date
Frank Lee | 80eba05b0a | [test] refactor tests with spawn (#3452) | 2 years ago
ver217 | 26b7aac0be | [zero] reorganize zero/gemini folder structure (#3424) | 2 years ago
HELSON | 527758b2ae | [hotfix] fix a running error in test_colo_checkpoint.py (#1387) | 2 years ago
HELSON | f92c100ddd | [checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) | 2 years ago
Jiarui Fang | 9e4c6449b0 | [checkpoint] add ColoOptimizer checkpointing (#1316) | 2 years ago
Jiarui Fang | 85f933b58b | [Optimizer] Remove useless ColoOptimizer (#1312) | 2 years ago
Jiarui Fang | 9f10524313 | [Optimizer] polish the init method of ColoOptimizer (#1310) | 2 years ago
Jiarui Fang | 3ef3791a3b | [checkpoint] add test for bert and hotfix save bugs (#1297) | 2 years ago
Jiarui Fang | c92f84fcdb | [tensor] distributed checkpointing for parameters (#1240) | 2 years ago
Jiarui Fang | 9bcd2fd4af | [tensor] a shorter shard and replicate spec (#1245) | 2 years ago
Jiarui Fang | 20da6e48c8 | [checkpoint] save sharded optimizer states (#1237) | 2 years ago
Jiarui Fang | 3b500984b1 | [tensor] fix some unittests (#1234) | 2 years ago
Yi Zhao | 04537bf83e | [checkpoint]support generalized scheduler (#1222) | 2 years ago
Jiarui Fang | 52736205d9 | [checkpoint] make unitest faster (#1217) | 2 years ago
Jiarui Fang | f38006ea83 | [checkpoint] checkpoint for ColoTensor Model (#1196) | 2 years ago