Baizhou Zhang
c6f6005990
[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin ( #4302 )
...
* sharded optimizer checkpoint for gemini plugin
* modify test to reduce testing time
* update doc
* fix bug when keep_gatherd is true under GeminiPlugin
2023-07-21 14:39:01 +08:00
Baizhou Zhang
58913441a1
Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin ( #4141 )
...
* [checkpointio] unsharded optimizer checkpoint for Gemini plugin
* [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather
2023-07-07 16:33:06 +08:00
Frank Lee
c4b1b65931
[test] fixed tests failed due to dtensor change ( #4082 )
...
* [test] fixed tests failed due to dtensor change
* polish code
2023-07-04 16:05:01 +08:00
Baizhou Zhang
0bb0b481b4
[gemini] fix argument naming during chunk configuration searching
2023-06-25 13:34:15 +08:00
Hongxin Liu
3c07a2846e
[plugin] a workaround for zero plugins' optimizer checkpoint ( #3780 )
...
* [test] refactor torch ddp checkpoint test
* [plugin] update low level zero optim checkpoint
* [plugin] update gemini optim checkpoint
2023-05-19 19:42:31 +08:00
jiangmingyan
20068ba188
[booster] add tests for ddp and low level zero's checkpointio ( #3715 )
...
* [booster] update tests for booster
* [booster] update tests for booster
* [booster] update tests for booster
* [booster] update tests for booster
* [booster] update tests for booster
* [booster] update booster tutorials#3717, fix recursive check
2023-05-10 12:17:02 +08:00