Baizhou Zhang
d8ceeac14e
[hotfix] fix typo in hybrid parallel io (#4697)
2023-09-12 17:32:19 +08:00
Baizhou Zhang
38ccb8b1a3
[shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575)
* hybrid plugin support huggingface from_pretrained
* add huggingface compatibility tests
* add folder cleaning
* fix bugs
2023-09-01 17:40:01 +08:00
Baizhou Zhang
c9625dbb63
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
* implement sharded optimizer saving
* add more param info
* finish implementation of sharded optimizer saving
* fix bugs in optimizer sharded saving
* add pp+zero test
* param group loading
* greedy loading of optimizer
* fix bug when loading
* implement optimizer sharded saving
* add optimizer test & arrange checkpointIO utils
* fix gemini sharding state_dict
* add verbose option
* add loading of master params
* fix typehint
* fix master/working mapping in fp16 amp
2023-08-31 14:50:47 +08:00
Baizhou Zhang
44eab2b27f
[shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506)
* add APIs
* implement save_sharded_model
* add test for hybrid checkpointio
* implement naive loading for sharded model
* implement efficient sharded model loading
* open a new file for hybrid checkpoint_io
* small fix
* fix circular importing
* fix docstring
* arrange arguments and apis
* small fix
2023-08-25 22:04:57 +08:00