21 Commits (04863a9b144fc7dd46a57d2c7b0cf2f4b351ffb6)

Author SHA1 Message Date
yuehuayingxueluo 12f10d5b0b
[Fix/Inference]Fix CUDA Rotary Embedding GQA (#5623)
7 months ago
Yuanheng Zhao 5d4c1fe8f5
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
7 months ago
Steve Luo ccf72797e3
feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611)
7 months ago
Steve Luo be396ad6cc
[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)
7 months ago
Yuanheng Zhao e60d430cf5
[Fix] resolve conflicts of rebasing feat/speculative-decoding (#5557)
8 months ago
Yuanheng Zhao a37f82629d
[Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
8 months ago
Yuanheng Zhao d63c469f45
[Infer] Revise and Adapt Triton Kernels for Spec-Dec (#5401)
8 months ago
yuehuayingxueluo 04aca9e55b
[Inference/Kernel]Add get_cos_and_sin Kernel (#5528)
8 months ago
yuehuayingxueluo 87079cffe8
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461)
8 months ago
yuehuayingxueluo f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)
9 months ago
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417)
9 months ago
xs_courtesy 95c21498d4
add silu_and_mul for infer
9 months ago
yuehuayingxueluo 0aa27f1961
[Inference]Move benchmark-related code to the example directory. (#5408)
9 months ago
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406)
9 months ago
Yuanheng Zhao 19061188c3
[Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399)
9 months ago
yuehuayingxueluo 2a718c8be8
Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)
9 months ago
Jianghai 730103819d
[Inference]Fused kv copy into rotary calculation (#5383)
9 months ago
yuehuayingxueluo 6fb4bcbb24
[Inference/opt] Fused KVCache Memcopy (#5374)
10 months ago
Frank Lee 8106ede07f
Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373)
10 months ago
Jianghai 9f4ab2eb92
[Inference] Adapt to Fused rotary (#5348)
10 months ago
Frank Lee e76acbb076
[inference] moved ops tests to test_infer (#5354)
10 months ago