2042 Commits (8241c0c054b38a109ed3ce7be1052a1e600b8471)

Author SHA1 Message Date
Runyu Lu aabc9fb6aa [feat] add use_cuda_kernel option 8 months ago
flybird11111 5e16bf7980 [shardformer] fix gathering output when using tensor parallelism (#5431) 8 months ago
Runyu Lu 6e30248683 [fix] tmp for test 8 months ago
Runyu Lu ae24b4f025 diverse tests 8 months ago
Runyu Lu 1821a6dab0 [fix] pytest and fix dyn grid bug 8 months ago
yuehuayingxueluo f366a5ea1f [Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418) 8 months ago
Hongxin Liu f2e8b9ef9f [devops] fix compatibility (#5444) 8 months ago
digger yu 385e85afd4 [hotfix] fix typo s/keywrods/keywords etc. (#5429) 9 months ago
Runyu Lu 633e95b301 [doc] add doc 9 months ago
Runyu Lu 9dec66fad6 [fix] multi graphs capture error 9 months ago
Runyu Lu b2c0d9ff2b [fix] multi graphs capture error 9 months ago
Steve Luo f7aecc0c6b feat rmsnorm cuda kernel and add unittest, benchmark script (#5417) 9 months ago
Runyu Lu cefaeb5fdd [feat] cuda graph support and refactor non-functional api 9 months ago
digger yu 5e1c93d732 [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) 9 months ago
digger yu 049121d19d [hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) 9 months ago
digger yu 16c96d4d8c [hotfix] fix typo change _descrption to _description (#5331) 9 months ago
Hongxin Liu 070df689e6 [devops] fix extention building (#5427) 9 months ago
flybird11111 29695cf70c [example]add gpt2 benchmark example script. (#5295) 9 months ago
yuehuayingxueluo 600881a8ea [Inference]Add CUDA KVCache Kernel (#5406) 9 months ago
flybird11111 0a25e16e46 [shardformer]gather llama logits (#5398) 9 months ago
QinLuo bf34c6fef6 [fsdp] impl save/load shard model/optimizer (#5357) 9 months ago
yuehuayingxueluo bc1da87366 [Fix/Inference] Fix format of input prompts and input model in inference engine (#5395) 9 months ago
yuehuayingxueluo 2a718c8be8 Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390) 9 months ago
Jianghai 730103819d [Inference]Fused kv copy into rotary calculation (#5383) 9 months ago
Stephan Kölker 5d380a1a21 [hotfix] Fix wrong import in meta_registry (#5392) 9 months ago
Yuanheng Zhao b21aac5bae [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) 9 months ago
Hongxin Liu 7303801854 [llama] fix training and inference scripts (#5384) 9 months ago
yuehuayingxueluo 8c69debdc7 [Inference]Support vllm testing in benchmark scripts (#5379) 10 months ago
Frank Lee 9afa52061f [inference] refactored config (#5376) 10 months ago
ver217 06db94fbc9 [moe] fix tests 10 months ago
Hongxin Liu da39d21b71 [moe] support mixtral (#5309) 10 months ago
Hongxin Liu c904d2ae99 [moe] update capacity computing (#5253) 10 months ago
Xuanlei Zhao 7d8e0338a4 [moe] init mixtral impl 10 months ago
Jianghai 1f8c7e7046 [Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) 10 months ago
yuehuayingxueluo 6fb4bcbb24 [Inference/opt] Fused KVCahce Memcopy (#5374) 10 months ago
Frank Lee 58740b5f68 [inference] added inference template (#5375) 10 months ago
Frank Lee 8106ede07f Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373) 10 months ago
Jianghai 9f4ab2eb92 [Inference] Adapt to Fused rotary (#5348) 10 months ago
yuehuayingxueluo 35382a7fbf [Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365) 10 months ago
Yuanheng Zhao 1dedb57747 [Fix/Infer] Remove unused deps and revise requirements (#5341) 10 months ago
Hongxin Liu c53ddda88f [lr-scheduler] fix load state dict and add test (#5369) 10 months ago
Hongxin Liu eb4f2d90f9 [llama] polish training script and fix optim ckpt (#5368) 10 months ago
Hongxin Liu 6c0fa7b9a8 [llama] fix dataloader for hybrid parallel (#5358) 10 months ago
Hongxin Liu 2dd01e3a14 [gemini] fix param op hook when output is tuple (#5355) 10 months ago
yuehuayingxueluo 631862f339 [Inference]Optimize generation process of inference engine (#5356) 10 months ago
yuehuayingxueluo 21ad4a27f9 [Inference/opt]Optimize the mid tensor of RMS Norm (#5350) 10 months ago
Wenhao Chen 1c790c0877 [fix] remove unnecessary dp_size assert (#5351) 10 months ago
Frank Lee 027aa1043f [doc] updated inference readme (#5343) 10 months ago
Frank Lee db1a763307 [inference] removed redundancy init_batch (#5353) 10 months ago
Hongxin Liu ffffc32dc7 [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) 10 months ago