Commit Graph

13 Commits (5f8c0a0ac3b52a71b664c3e36dd1a8cef40f428d)

| Author | SHA1 | Message | Date |
|---|---|---|---|
| flybird11111 | a0ad587c24 | [shardformer] refactor embedding resize (#5603) | 7 months ago |
| digger yu | 5e1c93d732 | [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) | 9 months ago |
| Hongxin Liu | eb4f2d90f9 | [llama] polish training script and fix optim ckpt (#5368) | 10 months ago |
| Hongxin Liu | ffffc32dc7 | [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) | 10 months ago |
| Elsa Granger | b2ad0d9e8f | [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) | 1 year ago |
| Baizhou Zhang | 64a08b2dc3 | [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774) | 1 year ago |
| Baizhou Zhang | c0a033700c | [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) | 1 year ago |
| Hongxin Liu | 079bf3cb26 | [misc] update pre-commit and run all files (#4752) | 1 year ago |
| flybird11111 | 4c4482f3ad | [example] llama2 add fine-tune example (#4673) | 1 year ago |
| Baizhou Zhang | d8ceeac14e | [hotfix] fix typo in hybrid parallel io (#4697) | 1 year ago |
| Baizhou Zhang | 38ccb8b1a3 | [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) | 1 year ago |
| Baizhou Zhang | c9625dbb63 | [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) | 1 year ago |
| Baizhou Zhang | 44eab2b27f | [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) | 1 year ago |