* [shardformer] support lazy init
* [shardformer] linear support lazy init
* [shardformer] embedding support lazy init
* [shardformer] norm support lazy init
* [shardformer] fused linear support lazy init
* [test] update shardformer test layer
* [test] shardformer with lazy init fit ddp
* [lazy] hotfix deepcopy of param
* [shardformer] fix bert policy and update test
* [shardformer] fix bloom policy and update test
* [shardformer] fix opt policy and update test
* [shardformer] fix t5 policy and update test
* [shardformer] fix gpt2 policy and update test
* [shardformer] fix llama policy and update test
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name confilt
* add bloom model and policy ,revise the base class of policy
* revise
* revision
* add bert_for_pretraining
* add bert_for_pretraining forward and policy
* fix typos
* cancel warning
* change the imediate output to default dict
* change the default output of get_shared_params
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name confilt
* add bloom model and policy ,revise the base class of policy
* revise
* revision
* add bert_for_pretraining
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name confilt
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name confilt
* refactor low level zero
* fix zero2 and support cpu offload
* avg gradient and modify unit test
* refactor grad store, support layer drop
* refactor bucket store, support grad accumulation
* fix and update unit test of zero and ddp
* compatible with tp, ga and unit test
* fix memory leak and polish
* add zero layer drop unittest
* polish code
* fix import err in unit test
* support diffenert comm dtype, modify docstring style
* polish code
* test padding and fix
* fix unit test of low level zero
* fix pad recording in bucket store
* support some models
* polish
* sharded optimizer checkpoint for gemini plugin
* modify test to reduce testing time
* update doc
* fix bug when keep_gatherd is true under GeminiPlugin
* first v of vit shardformer
* keep vit
* update
* vit shard add vitattention vitlayer
* update num head shard para
* finish test for vit
* add new_model_class & postprocess
* add vit readme
* delete old files & fix the conflict
* fix sth
* add layernorm to bert
* add layernorm test
* add layernorm test with load state dict
* add use_mixedfusedLN in shard config
* refactor policy to support fused_layernorm