Commit Graph

15 Commits (f447ca18111c2e37a2f14e7aecc98876dc7e3216)

Author SHA1 Message Date
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452)
* [test] added spawn decorator

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-04-06 14:51:35 +08:00
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424)
* [zero] refactor low-level zero folder structure

* [zero] fix legacy zero import path

* [zero] fix legacy zero import path

* [zero] remove useless import

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] fix test import path

* [zero] fix test

* [zero] fix circular import

* [zero] update import
2023-04-04 13:48:16 +08:00
HELSON 527758b2ae
[hotfix] fix a running error in test_colo_checkpoint.py (#1387) 2022-07-29 15:58:06 +08:00
HELSON f92c100ddd
[checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) 2022-07-19 14:15:28 +08:00
Jiarui Fang 9e4c6449b0
[checkpoint] add ColoOptimizer checkpointing (#1316) 2022-07-15 09:52:55 +08:00
Jiarui Fang 85f933b58b
[Optimizer] Remove useless ColoOptimizer (#1312) 2022-07-14 16:57:48 +08:00
Jiarui Fang 9f10524313
[Optimizer] polish the init method of ColoOptimizer (#1310) 2022-07-14 16:37:33 +08:00
Jiarui Fang 3ef3791a3b
[checkpoint] add test for bert and hotfix save bugs (#1297) 2022-07-14 15:38:18 +08:00
Jiarui Fang c92f84fcdb
[tensor] distributed checkpointing for parameters (#1240) 2022-07-12 15:51:06 +08:00
Jiarui Fang 9bcd2fd4af
[tensor] a shorter shard and replicate spec (#1245) 2022-07-11 15:51:48 +08:00
Jiarui Fang 20da6e48c8
[checkpoint] save sharded optimizer states (#1237) 2022-07-08 16:33:13 +08:00
Jiarui Fang 3b500984b1
[tensor] fix some unittests (#1234) 2022-07-08 14:18:30 +08:00
Yi Zhao 04537bf83e
[checkpoint]support generalized scheduler (#1222) 2022-07-07 18:16:38 +08:00
Jiarui Fang 52736205d9
[checkpoint] make unitest faster (#1217) 2022-07-06 17:39:46 +08:00
Jiarui Fang f38006ea83
[checkpoint] checkpoint for ColoTensor Model (#1196) 2022-07-06 17:22:03 +08:00