13 Commits (a8d459f99a1d415fc843327e4dafce19ecee1f3e)

Author            SHA1        Message                                                                                              Date
Steve Luo         7806842f2d  add paged-attetionv2: support seq length split across thread block (#5707)                          6 months ago
Yuanheng Zhao     55cc7f3df7  [Fix] Fix Inference Example, Tests, and Requirements (#5688)                                        7 months ago
Yuanheng Zhao     8754abae24  [Fix] Fix & Update Inference Tests (compatibility w/ main)                                          7 months ago
Yuanheng Zhao     537a3cbc4d  [kernel] Support New KCache Layout - Triton Kernel (#5677)                                          7 months ago
Steve Luo         5cd75ce4c7  [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)   7 months ago
Yuanheng Zhao     5be590b99e  [kernel] Support new KCache Layout - Context Attention Triton Kernel (#5658)                        7 months ago
Steve Luo         a8fd3b0342  [Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643)                      7 months ago
Steve Luo         ccf72797e3  feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611)                                     7 months ago
Steve Luo         be396ad6cc  [Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)   7 months ago
yuehuayingxueluo  f366a5ea1f  [Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)                8 months ago
Steve Luo         f7aecc0c6b  feat rmsnorm cuda kernel and add unittest, benchmark script (#5417)                                 9 months ago
yuehuayingxueluo  0aa27f1961  [Inference]Move benchmark-related code to the example directory. (#5408)                            9 months ago
yuehuayingxueluo  600881a8ea  [Inference]Add CUDA KVCache Kernel (#5406)                                                          9 months ago