1746 Commits (822241a99cca799e1fca250ff2fb7f54ea0f8dcd)

Author SHA1 Message Date
flybird11111 29695cf70c
[example]add gpt2 benchmark example script. (#5295) 9 months ago
flybird11111 0a25e16e46
[shardformer]gather llama logits (#5398) 9 months ago
QinLuo bf34c6fef6
[fsdp] impl save/load shard model/optimizer (#5357) 9 months ago
Stephan Kölker 5d380a1a21
[hotfix] Fix wrong import in meta_registry (#5392) 9 months ago
Hongxin Liu 7303801854
[llama] fix training and inference scripts (#5384) 9 months ago
ver217 06db94fbc9
[moe] fix tests 10 months ago
Hongxin Liu da39d21b71
[moe] support mixtral (#5309) 10 months ago
Hongxin Liu c904d2ae99
[moe] update capacity computing (#5253) 10 months ago
Xuanlei Zhao 7d8e0338a4
[moe] init mixtral impl 10 months ago
Hongxin Liu c53ddda88f
[lr-scheduler] fix load state dict and add test (#5369) 10 months ago
Hongxin Liu eb4f2d90f9
[llama] polish training script and fix optim ckpt (#5368) 10 months ago
Hongxin Liu 6c0fa7b9a8
[llama] fix dataloader for hybrid parallel (#5358) 10 months ago
Hongxin Liu 2dd01e3a14
[gemini] fix param op hook when output is tuple (#5355) 10 months ago
Wenhao Chen 1c790c0877
[fix] remove unnecessary dp_size assert (#5351) 10 months ago
Hongxin Liu ffffc32dc7
[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) 10 months ago
digger yu 71321a07cf
fix typo change dosen't to doesn't (#5308) 10 months ago
flybird11111 388179f966
[tests] fix t5 test. (#5322) 10 months ago
FrankLeeeee 087d0cb1fc
[accelerator] fixed npu api 10 months ago
Frank Lee 7cfed5f076
[feat] refactored extension module (#5298) 10 months ago
digger yu bce9499ed3
fix some typo (#5307) 10 months ago
flybird11111 46e091651b
[shardformer] hybridparallelplugin support gradients accumulation. (#5246) 10 months ago
Wenhao Chen ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg (#5268) 10 months ago
binmakeswell c174c4fc5f
[doc] fix doc typo (#5256) 11 months ago
flybird11111 e830ef917d
[ci] fix shardformer tests. (#5255) 11 months ago
Frank Lee 9102d655ab
[hotfix] removed unused flag (#5242) 11 months ago
Hongxin Liu d202cc28c0
[npu] change device to accelerator api (#5239) 11 months ago
Elsa Granger d565df3821
[pipeline] A more general _communicate in p2p (#5062) 11 months ago
Xuanlei Zhao dd2c28a323
[npu] use extension for op builder (#5172) 11 months ago
digger yu b0b53a171c
[nfc] fix typo colossalai/shardformer/ (#5133) 11 months ago
flybird11111 451e9142b8
fix flash attn (#5209) 11 months ago
flybird11111 365671be10
fix-test (#5210) 11 months ago
Wenhao Chen d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) 11 months ago
Wenhao Chen 3c0d82b19b
[pipeline]: support arbitrary batch size in forward_only mode (#5201) 11 months ago
flybird11111 02d2328a04
support linear accumulation fusion (#5199) 11 months ago
Wenhao Chen 4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) 11 months ago
flybird11111 79718fae04
[shardformer] llama support DistCrossEntropy (#5176) 12 months ago
flybird11111 21aa5de00b
[gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) 12 months ago
flybird11111 3dbbf83f1c
fix (#5158) 12 months ago
flybird11111 2a2ec49aa7
[plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) 12 months ago
Xuanlei Zhao d6df19bae7
[npu] support triangle attention for llama (#5130) 12 months ago
Frank Lee f4e72c9992
[accelerator] init the accelerator module (#5129) 12 months ago
github-actions[bot] d10ee42f68
[format] applied code formatting on changed files in pull request 5088 (#5127) 12 months ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) 1 year ago
アマデウス 126cf180bc
[hotfix] fixed memory usage of shardformer module replacement (#5122) 1 year ago
Xuanlei Zhao 68fcaa2225
remove duplicate import (#5100) 1 year ago
Xuanlei Zhao 3acbf6d496
[npu] add npu support for hybrid plugin and llama (#5090) 1 year ago
flybird11111 aae496631c
[shardformer]fix flash attention, when mask is casual, just don't unpad it (#5084) 1 year ago
Zhongkai Zhao 75af66cd81
[Hotfix] Fix model policy matching strategy in ShardFormer (#5064) 1 year ago
flybird11111 4ccb9ded7d
[gemini]fix gemini optimzer, saving Shardformer in Gemini got list assignment index out of range (#5085) 1 year ago
Jun Gao dce05da535
fix thrust-transform-reduce error (#5078) 1 year ago