Commit Graph

1727 Commits (feat/moe)

Author SHA1 Message Date
ver217 63ee6fffe6 Merge branch 'main' into exp/mixtral · 11 months ago
ver217 ce1cff26bd Merge branch 'main' into exp/mixtral · 11 months ago
Elsa Granger d565df3821 [pipeline] A more general _communicate in p2p (#5062) · 11 months ago
Wenhao Chen 196b85368b [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) · 11 months ago
Wenhao Chen 931d0e0731 [pipeline]: support arbitrary batch size in forward_only mode (#5201) · 11 months ago
Wenhao Chen 1810b9100f [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) · 11 months ago
digger yu b0b53a171c [nfc] fix typo colossalai/shardformer/ (#5133) · 11 months ago
Xuanlei Zhao 6b69f3085b update · 11 months ago
flybird11111 451e9142b8 fix flash attn (#5209) · 11 months ago
flybird11111 365671be10 fix-test (#5210) · 11 months ago
Xuanlei Zhao 8ca8cf8ec3 update optim · 11 months ago
Wenhao Chen d799a3088f [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) · 11 months ago
Wenhao Chen 3c0d82b19b [pipeline]: support arbitrary batch size in forward_only mode (#5201) · 11 months ago
Xuanlei Zhao f037583bd2 update train · 11 months ago
flybird11111 02d2328a04 support linear accumulation fusion (#5199) · 11 months ago
Xuanlei Zhao c1c6af6368 update · 11 months ago
Xuanlei Zhao ccad7014c6 update optim · 11 months ago
Xuanlei Zhao 44014faa67 fix optim · 11 months ago
Xuanlei Zhao 0a3aae509b update utils and fwd bwd · 11 months ago
Wenhao Chen 4fa689fca1 [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) · 11 months ago
Xuanlei Zhao 7c5b1a585f update · 11 months ago
Xuanlei Zhao f66469e209 update · 12 months ago
flybird11111 79718fae04 [shardformer] llama support DistCrossEntropy (#5176) · 12 months ago
flybird11111 21aa5de00b [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) · 12 months ago
flybird11111 3dbbf83f1c fix (#5158) · 12 months ago
flybird11111 2a2ec49aa7 [plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) · 1 year ago
github-actions[bot] d10ee42f68 [format] applied code formatting on changed files in pull request 5088 (#5127) · 1 year ago
Wenhao Chen 7172459e74 [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) · 1 year ago
アマデウス 126cf180bc [hotfix] fixed memory usage of shardformer module replacement (#5122) · 1 year ago
Xuanlei Zhao 68fcaa2225 remove duplicate import (#5100) · 1 year ago
Xuanlei Zhao 3acbf6d496 [npu] add npu support for hybrid plugin and llama (#5090) · 1 year ago
flybird11111 aae496631c [shardformer]fix flash attention, when mask is casual, just don't unpad it (#5084) · 1 year ago
Zhongkai Zhao 75af66cd81 [Hotfix] Fix model policy matching strategy in ShardFormer (#5064) · 1 year ago
flybird11111 4ccb9ded7d [gemini]fix gemini optimzer, saving Shardformer in Gemini got list assignment index out of range (#5085) · 1 year ago
Jun Gao dce05da535 fix thrust-transform-reduce error (#5078) · 1 year ago
Hongxin Liu 1cd7efc520 [inference] refactor examples and fix schedule (#5077) · 1 year ago
Bin Jia 4e3959d316 [hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) · 1 year ago
github-actions[bot] 8921a73c90 [format] applied code formatting on changed files in pull request 5067 (#5072) · 1 year ago
Xu Kai fb103cfd6e [inference] update examples and engine (#5073) · 1 year ago
Bin Jia 0c7d8bebd5 [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) · 1 year ago
Hongxin Liu e5ce4c8ea6 [npu] add npu support for gemini and zero (#5067) · 1 year ago
Cuiqing Li (李崔卿) bce919708f [Kernels]added flash-decoidng of triton (#5063) · 1 year ago
Xu Kai fd6482ad8c [inference] Refactor inference architecture (#5057) · 1 year ago
Wenhao Chen 3c08f17348 [hotfix]: modify create_ep_hierarchical_group and add test (#5032) · 1 year ago
flybird11111 97cd0cd559 [shardformer] fix llama error when transformers upgraded. (#5055) · 1 year ago
flybird11111 3e02154710 [gemini] gemini support extra-dp (#5043) · 1 year ago
Elsa Granger b2ad0d9e8f [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) · 1 year ago
Cuiqing Li (李崔卿) 28052a71fb [Kernels]Update triton kernels into 2.1.0 (#5046) · 1 year ago
Zhongkai Zhao 70885d707d [hotfix] Suport extra_kwargs in ShardConfig (#5031) · 1 year ago
flybird11111 576a2f7b10 [gemini] gemini support tensor parallelism. (#4942) · 1 year ago