Commit Graph

112 Commits (725fbd2ed067f9c58ac04670377d3e6f2a96fe00)

Author SHA1 Message Date
Hongxin Liu 641b1ee71a
[devops] remove post commit ci (#5566)
8 months ago
Zhongkai Zhao 8e412a548e
[shardformer] Sequence Parallelism Optimization (#5533)
8 months ago
Wenhao Chen e614aa34f3
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508)
8 months ago
Wenhao Chen bb0a668fee
[hotfix] set return_outputs=False in examples and polish code (#5404)
8 months ago
flybird11111 5e16bf7980
[shardformer] fix gathering output when using tensor parallelism (#5431)
8 months ago
Hongxin Liu f2e8b9ef9f
[devops] fix compatibility (#5444)
9 months ago
digger yu 5e1c93d732
[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335)
9 months ago
flybird11111 29695cf70c
[example]add gpt2 benchmark example script. (#5295)
9 months ago
QinLuo bf34c6fef6
[fsdp] impl save/load shard model/optimizer (#5357)
9 months ago
Frank Lee efef43b53c
Merge pull request #5372 from hpcaitech/exp/mixtral
10 months ago
Hongxin Liu da39d21b71 [moe] support mixtral (#5309)
10 months ago
Xuanlei Zhao 7d8e0338a4 [moe] init mixtral impl
10 months ago
Hongxin Liu 6c0fa7b9a8
[llama] fix dataloader for hybrid parallel (#5358)
10 months ago
Wenhao Chen 1c790c0877
[fix] remove unnecessary dp_size assert (#5351)
10 months ago
ver217 148469348a Merge branch 'main' into sync/npu
10 months ago
flybird11111 46e091651b
[shardformer] hybridparallelplugin support gradients accumulation. (#5246)
10 months ago
flybird11111 e830ef917d
[ci] fix shardformer tests. (#5255)
11 months ago
Frank Lee 9102d655ab
[hotfix] removed unused flag (#5242)
11 months ago
Hongxin Liu d202cc28c0
[npu] change device to accelerator api (#5239)
11 months ago
flybird11111 365671be10
fix-test (#5210)
11 months ago
Wenhao Chen d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214)
11 months ago
Wenhao Chen 4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)
11 months ago
flybird11111 21aa5de00b
[gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150)
12 months ago
flybird11111 2a2ec49aa7
[plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135)
1 year ago
github-actions[bot] d10ee42f68
[format] applied code formatting on changed files in pull request 5088 (#5127)
1 year ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
1 year ago
Xuanlei Zhao 3acbf6d496
[npu] add npu support for hybrid plugin and llama (#5090)
1 year ago
github-actions[bot] 8921a73c90
[format] applied code formatting on changed files in pull request 5067 (#5072)
1 year ago
Hongxin Liu e5ce4c8ea6
[npu] add npu support for gemini and zero (#5067)
1 year ago
flybird11111 3e02154710
[gemini] gemini support extra-dp (#5043)
1 year ago
flybird11111 576a2f7b10
[gemini] gemini support tensor parallelism. (#4942)
1 year ago
Xuanlei Zhao f71e63b0f3
[moe] support optimizer checkpoint (#5015)
1 year ago
littsk 1a3315e336
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926)
1 year ago
Xuanlei Zhao dc003c304c
[moe] merge moe into main (#4978)
1 year ago
Baizhou Zhang c040d70aa0
[hotfix] fix the bug of repeatedly storing param group (#4951)
1 year ago
Baizhou Zhang 21ba89cab6
[gemini] support gradient accumulation (#4869)
1 year ago
Zhongkai Zhao a0684e7bd6
[feature] support no master weights option for low level zero plugin (#4816)
1 year ago
littsk 83b52c56cd
[feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837)
1 year ago
Hongxin Liu df63564184
[gemini] support amp o3 for gemini (#4872)
1 year ago
shaoyuw c97a3523db fix: typo in comment of low_level_zero plugin
1 year ago
Hongxin Liu 4965c0dabd
[lazy] support from_pretrained (#4801)
1 year ago
Baizhou Zhang a2db75546d
[doc] polish shardformer doc (#4779)
1 year ago
Baizhou Zhang c0a033700c
[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
Xuanlei Zhao ac2797996b
[shardformer] add custom policy in hybrid parallel plugin (#4718)
1 year ago
Baizhou Zhang f911d5b09d
[doc] Add user document for Shardformer (#4702)
1 year ago
Baizhou Zhang d8ceeac14e
[hotfix] fix typo in hybrid parallel io (#4697)
1 year ago
Baizhou Zhang 1d454733c4
[doc] Update booster user documents. (#4669)
1 year ago
Baizhou Zhang 660eed9124
[pipeline] set optimizer to optional in execute_pipeline (#4630)
1 year ago
Hongxin Liu fae6c92ead
Merge branch 'main' into feature/shardformer
1 year ago