1835 Commits (4bb5d8923a6e85a0f89a483f15933698635a9f9c)

Author SHA1 Message Date
Yuanheng Zhao 4bb5d8923a
[Fix/Inference] Remove unused and non-functional functions (#5543) 8 months ago
yuehuayingxueluo 04aca9e55b
[Inference/Kernel]Add get_cos_and_sin Kernel (#5528) 8 months ago
傅剑寒 e6496dd371
[Inference] Optimize request handler of llama (#5512) 8 months ago
Runyu Lu 6251d68dc9
[fix] PR #5354 (#5501) 8 months ago
yuehuayingxueluo 87079cffe8
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461) 8 months ago
Runyu Lu ff4998c6f3 [fix] remove unused comment 8 months ago
Runyu Lu 5b017d6324 [fix] 8 months ago
Runyu Lu 4eafe0c814 [fix] unused option 8 months ago
Runyu Lu aabc9fb6aa [feat] add use_cuda_kernel option 8 months ago
Runyu Lu 6e30248683 [fix] tmp for test 8 months ago
Runyu Lu ae24b4f025 diverse tests 8 months ago
Runyu Lu 1821a6dab0 [fix] pytest and fix dyn grid bug 8 months ago
yuehuayingxueluo f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418) 8 months ago
Runyu Lu 633e95b301 [doc] add doc 9 months ago
Runyu Lu 9dec66fad6 [fix] multi graphs capture error 9 months ago
Runyu Lu b2c0d9ff2b [fix] multi graphs capture error 9 months ago
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417) 9 months ago
Runyu Lu cefaeb5fdd [feat] cuda graph support and refactor non-functional api 9 months ago
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406) 9 months ago
flybird11111 0a25e16e46
[shardformer]gather llama logits (#5398) 9 months ago
QinLuo bf34c6fef6
[fsdp] impl save/load shard model/optimizer (#5357) 9 months ago
yuehuayingxueluo bc1da87366
[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395) 9 months ago
yuehuayingxueluo 2a718c8be8
Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390) 9 months ago
Jianghai 730103819d
[Inference]Fused kv copy into rotary calculation (#5383) 9 months ago
Stephan Kölker 5d380a1a21
[hotfix] Fix wrong import in meta_registry (#5392) 9 months ago
Yuanheng Zhao b21aac5bae
[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) 9 months ago
Hongxin Liu 7303801854
[llama] fix training and inference scripts (#5384) 9 months ago
yuehuayingxueluo 8c69debdc7
[Inference]Support vllm testing in benchmark scripts (#5379) 10 months ago
Frank Lee 9afa52061f
[inference] refactored config (#5376) 10 months ago
ver217 06db94fbc9 [moe] fix tests 10 months ago
Hongxin Liu da39d21b71 [moe] support mixtral (#5309) 10 months ago
Hongxin Liu c904d2ae99 [moe] update capacity computing (#5253) 10 months ago
Xuanlei Zhao 7d8e0338a4 [moe] init mixtral impl 10 months ago
Jianghai 1f8c7e7046
[Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) 10 months ago
yuehuayingxueluo 6fb4bcbb24
[Inference/opt] Fused KVCahce Memcopy (#5374) 10 months ago
Frank Lee 58740b5f68
[inference] added inference template (#5375) 10 months ago
Frank Lee 8106ede07f
Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373) 10 months ago
Jianghai 9f4ab2eb92
[Inference] Adapt to Fused rotary (#5348) 10 months ago
yuehuayingxueluo 35382a7fbf
[Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365) 10 months ago
Yuanheng Zhao 1dedb57747
[Fix/Infer] Remove unused deps and revise requirements (#5341) 10 months ago
Hongxin Liu c53ddda88f
[lr-scheduler] fix load state dict and add test (#5369) 10 months ago
Hongxin Liu eb4f2d90f9
[llama] polish training script and fix optim ckpt (#5368) 10 months ago
Hongxin Liu 6c0fa7b9a8
[llama] fix dataloader for hybrid parallel (#5358) 10 months ago
Hongxin Liu 2dd01e3a14
[gemini] fix param op hook when output is tuple (#5355) 10 months ago
yuehuayingxueluo 631862f339
[Inference]Optimize generation process of inference engine (#5356) 10 months ago
yuehuayingxueluo 21ad4a27f9
[Inference/opt]Optimize the mid tensor of RMS Norm (#5350) 10 months ago
Wenhao Chen 1c790c0877
[fix] remove unnecessary dp_size assert (#5351) 10 months ago
Frank Lee 027aa1043f
[doc] updated inference readme (#5343) 10 months ago
Frank Lee db1a763307
[inference] removed redundancy init_batch (#5353) 10 months ago
Hongxin Liu ffffc32dc7
[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) 10 months ago