Commit Graph

1745 Commits (4b8312c08e8d05a5f41453d63c8671aab601ed1c)

Author SHA1 Message Date
flybird11111 0a25e16e46
[shardformer]gather llama logits (#5398)
9 months ago
QinLuo bf34c6fef6
[fsdp] impl save/load shard model/optimizer (#5357)
9 months ago
Stephan Kölker 5d380a1a21
[hotfix] Fix wrong import in meta_registry (#5392)
9 months ago
Hongxin Liu 7303801854
[llama] fix training and inference scripts (#5384)
9 months ago
Frank Lee efef43b53c
Merge pull request #5372 from hpcaitech/exp/mixtral
10 months ago
Frank Lee 4c03347fc7
Merge pull request #5377 from hpcaitech/example/llama-npu
10 months ago
ver217 06db94fbc9
[moe] fix tests
10 months ago
Hongxin Liu da39d21b71
[moe] support mixtral (#5309)
10 months ago
Hongxin Liu c904d2ae99
[moe] update capacity computing (#5253)
10 months ago
Xuanlei Zhao 7d8e0338a4
[moe] init mixtral impl
10 months ago
Hongxin Liu c53ddda88f
[lr-scheduler] fix load state dict and add test (#5369)
10 months ago
Hongxin Liu eb4f2d90f9
[llama] polish training script and fix optim ckpt (#5368)
10 months ago
Hongxin Liu 6c0fa7b9a8
[llama] fix dataloader for hybrid parallel (#5358)
10 months ago
Hongxin Liu 2dd01e3a14
[gemini] fix param op hook when output is tuple (#5355)
10 months ago
Wenhao Chen 1c790c0877
[fix] remove unnecessary dp_size assert (#5351)
10 months ago
Hongxin Liu ffffc32dc7
[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347)
10 months ago
digger yu 71321a07cf
fix typo change dosen't to doesn't (#5308)
10 months ago
flybird11111 388179f966
[tests] fix t5 test. (#5322)
10 months ago
FrankLeeeee 087d0cb1fc
[accelerator] fixed npu api
10 months ago
Frank Lee 8823cc4831
Merge pull request #5310 from hpcaitech/feature/npu
10 months ago
Frank Lee 7cfed5f076
[feat] refactored extension module (#5298)
10 months ago
digger yu bce9499ed3
fix some typo (#5307)
10 months ago
ver217 148469348a
Merge branch 'main' into sync/npu
11 months ago
flybird11111 46e091651b
[shardformer] hybridparallelplugin support gradients accumulation. (#5246)
11 months ago
Wenhao Chen ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg (#5268)
11 months ago
binmakeswell c174c4fc5f
[doc] fix doc typo (#5256)
11 months ago
flybird11111 e830ef917d
[ci] fix shardformer tests. (#5255)
11 months ago
Frank Lee 9102d655ab
[hotfix] removed unused flag (#5242)
11 months ago
Hongxin Liu d202cc28c0
[npu] change device to accelerator api (#5239)
11 months ago
Elsa Granger d565df3821
[pipeline] A more general _communicate in p2p (#5062)
11 months ago
Xuanlei Zhao dd2c28a323
[npu] use extension for op builder (#5172)
11 months ago
digger yu b0b53a171c
[nfc] fix typo colossalai/shardformer/ (#5133)
11 months ago
flybird11111 451e9142b8
fix flash attn (#5209)
11 months ago
flybird11111 365671be10
fix-test (#5210)
11 months ago
Wenhao Chen d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214)
11 months ago
Wenhao Chen 3c0d82b19b
[pipeline]: support arbitrary batch size in forward_only mode (#5201)
11 months ago
flybird11111 02d2328a04
support linear accumulation fusion (#5199)
11 months ago
Wenhao Chen 4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)
11 months ago
flybird11111 79718fae04
[shardformer] llama support DistCrossEntropy (#5176)
12 months ago
flybird11111 21aa5de00b
[gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150)
12 months ago
flybird11111 3dbbf83f1c
fix (#5158)
12 months ago
flybird11111 2a2ec49aa7
[plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135)
1 year ago
Xuanlei Zhao d6df19bae7
[npu] support triangle attention for llama (#5130)
1 year ago
Frank Lee f4e72c9992
[accelerator] init the accelerator module (#5129)
1 year ago
github-actions[bot] d10ee42f68
[format] applied code formatting on changed files in pull request 5088 (#5127)
1 year ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
1 year ago
アマデウス 126cf180bc
[hotfix] fixed memory usage of shardformer module replacement (#5122)
1 year ago
Xuanlei Zhao 68fcaa2225
remove duplicate import (#5100)
1 year ago
Xuanlei Zhao 3acbf6d496
[npu] add npu support for hybrid plugin and llama (#5090)
1 year ago
flybird11111 aae496631c
[shardformer]fix flash attention, when mask is casual, just don't unpad it (#5084)
1 year ago