Hongxin Liu
a39a5c66fe
Merge branch 'main' into feature/shardformer
2023-09-04 23:43:13 +08:00
Baizhou Zhang
c9625dbb63
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin ( #4540 )
...
* implement sharded optimizer saving
* add more param info
* finish implementation of sharded optimizer saving
* fix bugs in optimizer sharded saving
* add pp+zero test
* param group loading
* greedy loading of optimizer
* fix bug when loading
* implement optimizer sharded saving
* add optimizer test & arrange checkpointIO utils
* fix gemini sharding state_dict
* add verbose option
* add loading of master params
* fix typehint
* fix master/working mapping in fp16 amp
2023-08-31 14:50:47 +08:00
Hongxin Liu
27061426f7
[gemini] improve compatibility and add static placement policy ( #4479 )
...
* [gemini] remove distributed-related part from colotensor (#4379 )
* [gemini] remove process group dependency
* [gemini] remove tp part from colo tensor
* [gemini] patch inplace op
* [gemini] fix param op hook and update tests
* [test] remove useless tests
* [test] remove useless tests
* [misc] fix requirements
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [misc] update requirements
* [gemini] refactor gemini optimizer and gemini ddp (#4398 )
* [gemini] update optimizer interface
* [gemini] renaming gemini optimizer
* [gemini] refactor gemini ddp class
* [example] update gemini related example
* [example] update gemini related example
* [plugin] fix gemini plugin args
* [test] update gemini ckpt tests
* [gemini] fix checkpoint io
* [example] fix opt example requirements
* [example] fix opt example
* [example] fix opt example
* [example] fix opt example
* [gemini] add static placement policy (#4443 )
* [gemini] add static placement policy
* [gemini] fix param offload
* [test] update gemini tests
* [plugin] update gemini plugin
* [plugin] update gemini plugin docstr
* [misc] fix flash attn requirement
* [test] fix gemini checkpoint io test
* [example] update resnet example result (#4457 )
* [example] update bert example result (#4458 )
* [doc] update gemini doc (#4468 )
* [example] update gemini related examples (#4473 )
* [example] update gpt example
* [example] update dreambooth example
* [example] update vit
* [example] update opt
* [example] update palm
* [example] update vit and opt benchmark
* [hotfix] fix bert in model zoo (#4480 )
* [hotfix] fix bert in model zoo
* [test] remove chatglm gemini test
* [test] remove sam gemini test
* [test] remove vit gemini test
* [hotfix] fix opt tutorial example (#4497 )
* [hotfix] fix opt tutorial example
* [hotfix] fix opt tutorial example
2023-08-24 09:29:25 +08:00
Baizhou Zhang
6ccecc0c69
[gemini] fix tensor storage cleaning in state dict collection ( #4396 )
2023-08-10 15:36:46 +08:00
Baizhou Zhang
c6f6005990
[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin ( #4302 )
...
* sharded optimizer checkpoint for gemini plugin
* modify test to reduce testing time
* update doc
* fix bug when keep_gatherd is true under GeminiPlugin
2023-07-21 14:39:01 +08:00
Baizhou Zhang
58913441a1
Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin ( #4141 )
...
* [checkpointio] unsharded optimizer checkpoint for Gemini plugin
* [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather
2023-07-07 16:33:06 +08:00
Hongxin Liu
ae02d4e4f7
[bf16] add bf16 support ( #3882 )
...
* [bf16] add bf16 support for fused adam (#3844 )
* [bf16] fused adam kernel support bf16
* [test] update fused adam kernel test
* [test] update fused adam test
* [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860 )
* [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869 )
* [bf16] add mixed precision mixin
* [bf16] low level zero optim support bf16
* [text] update low level zero test
* [text] fix low level zero grad acc test
* [bf16] add bf16 support for gemini (#3872 )
* [bf16] gemini support bf16
* [test] update gemini bf16 test
* [doc] update gemini docstring
* [bf16] add bf16 support for plugins (#3877 )
* [bf16] add bf16 support for legacy zero (#3879 )
* [zero] init context support bf16
* [zero] legacy zero support bf16
* [test] add zero bf16 test
* [doc] add bf16 related docstring for legacy zero
2023-06-05 15:58:31 +08:00
Hongxin Liu
173dad0562
[misc] add verbose arg for zero and op builder ( #3552 )
...
* [misc] add print verbose
* [gemini] add print verbose
* [zero] add print verbose for low level
* [misc] add print verbose for op builder
2023-04-17 11:25:35 +08:00
YH
bcf0cbcbe7
[doc] Add docs for clip args in zero optim ( #3504 )
2023-04-10 11:11:28 +08:00
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure ( #3424 )
...
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
2023-04-04 13:48:16 +08:00