Commit Graph

6 Commits (90cd5227a348dfe506e95b2e49f2a8dcd34fdbca)

Author SHA1 Message Date
Steve Luo ccf72797e3
feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611)
7 months ago
Steve Luo be396ad6cc
[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)
8 months ago
yuehuayingxueluo f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)
9 months ago
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417)
9 months ago
yuehuayingxueluo 0aa27f1961
[Inference]Move benchmark-related code to the example directory. (#5408)
9 months ago
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406)
9 months ago