121 Commits (bd38fe6b912379080673a43d77fd3bdf0e5c852e)

Author SHA1 Message Date
Edenzzzz 43995ee436 [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) 6 months ago
flybird11111 77ec773388 [zero] remove registered gradients hooks (#5687) 7 months ago
Hongxin Liu 7f8b16635b [misc] refactor launch API and tensor constructor (#5666) 7 months ago
linsj20 91fa553775 [Feature] qlora support (#5586) 7 months ago
flybird11111 8954a0c2e2 [LowLevelZero] low level zero support lora (#5153) 7 months ago
Baizhou Zhang 14b0d4c7e5 [lora] add lora APIs for booster, support lora for TorchDDP (#4981) 7 months ago
Hongxin Liu 1b387ca9fe [shardformer] refactor pipeline grad ckpt config (#5646) 7 months ago
Hongxin Liu 4de4e31818 [exampe] update llama example (#5626) 7 months ago
flybird11111 a0ad587c24 [shardformer] refactor embedding resize (#5603) 7 months ago
Hongxin Liu 641b1ee71a [devops] remove post commit ci (#5566) 8 months ago
Zhongkai Zhao 8e412a548e [shardformer] Sequence Parallelism Optimization (#5533) 8 months ago
Wenhao Chen e614aa34f3 [shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508) 8 months ago
Wenhao Chen bb0a668fee [hotfix] set return_outputs=False in examples and polish code (#5404) 8 months ago
flybird11111 5e16bf7980 [shardformer] fix gathering output when using tensor parallelism (#5431) 8 months ago
Hongxin Liu f2e8b9ef9f [devops] fix compatibility (#5444) 8 months ago
digger yu 5e1c93d732 [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) 9 months ago
flybird11111 29695cf70c [example] add gpt2 benchmark example script. (#5295) 9 months ago
QinLuo bf34c6fef6 [fsdp] impl save/load shard model/optimizer (#5357) 9 months ago
Hongxin Liu da39d21b71 [moe] support mixtral (#5309) 10 months ago
Xuanlei Zhao 7d8e0338a4 [moe] init mixtral impl 10 months ago
Hongxin Liu 6c0fa7b9a8 [llama] fix dataloader for hybrid parallel (#5358) 10 months ago
Wenhao Chen 1c790c0877 [fix] remove unnecessary dp_size assert (#5351) 10 months ago
flybird11111 46e091651b [shardformer] hybridparallelplugin support gradients accumulation. (#5246) 10 months ago
flybird11111 e830ef917d [ci] fix shardformer tests. (#5255) 11 months ago
Frank Lee 9102d655ab [hotfix] removed unused flag (#5242) 11 months ago
Hongxin Liu d202cc28c0 [npu] change device to accelerator api (#5239) 11 months ago
flybird11111 365671be10 fix-test (#5210) 11 months ago
Wenhao Chen d799a3088f [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) 11 months ago
Wenhao Chen 4fa689fca1 [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) 11 months ago
flybird11111 21aa5de00b [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) 12 months ago
flybird11111 2a2ec49aa7 [plugin] fix 3d checkpoint load when booster boost without optimizer. (#5135) 12 months ago
github-actions[bot] d10ee42f68 [format] applied code formatting on changed files in pull request 5088 (#5127) 12 months ago
Wenhao Chen 7172459e74 [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) 1 year ago
Xuanlei Zhao 3acbf6d496 [npu] add npu support for hybrid plugin and llama (#5090) 1 year ago
github-actions[bot] 8921a73c90 [format] applied code formatting on changed files in pull request 5067 (#5072) 1 year ago
Hongxin Liu e5ce4c8ea6 [npu] add npu support for gemini and zero (#5067) 1 year ago
flybird11111 3e02154710 [gemini] gemini support extra-dp (#5043) 1 year ago
flybird11111 576a2f7b10 [gemini] gemini support tensor parallelism. (#4942) 1 year ago
Xuanlei Zhao f71e63b0f3 [moe] support optimizer checkpoint (#5015) 1 year ago
littsk 1a3315e336 [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) 1 year ago
Xuanlei Zhao dc003c304c [moe] merge moe into main (#4978) 1 year ago
Baizhou Zhang c040d70aa0 [hotfix] fix the bug of repeatedly storing param group (#4951) 1 year ago
Baizhou Zhang 21ba89cab6 [gemini] support gradient accumulation (#4869) 1 year ago
Zhongkai Zhao a0684e7bd6 [feature] support no master weights option for low level zero plugin (#4816) 1 year ago
littsk 83b52c56cd [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) 1 year ago
Hongxin Liu df63564184 [gemini] support amp o3 for gemini (#4872) 1 year ago
shaoyuw c97a3523db fix: typo in comment of low_level_zero plugin 1 year ago
Hongxin Liu 4965c0dabd [lazy] support from_pretrained (#4801) 1 year ago
Baizhou Zhang a2db75546d [doc] polish shardformer doc (#4779) 1 year ago
Baizhou Zhang c0a033700c [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) 1 year ago