5 Commits (main)

Author  SHA1  Message  Date

Steve Luo  7806842f2d  add paged-attention v2: support seq length split across thread blocks (#5707)  6 months ago
傅剑寒  50104ab340  [Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706)  7 months ago
Steve Luo  5cd75ce4c7  [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)  7 months ago
Steve Luo  a8fd3b0342  [Inference/Kernel] Optimize paged attention: refactor key cache layout (#5643)  7 months ago
傅剑寒  279300dc5f  [Inference/Refactor] Refactor compilation mechanism and unify multi-hardware support (#5613)  7 months ago
Steve Luo  be396ad6cc  [Inference/Kernel] Add paged decoding kernel, sequence split within the same thread block (#5531)  7 months ago
pre-commit-ci[bot]  d78817539e  [pre-commit.ci] auto fixes from pre-commit.com hooks  8 months ago
yuehuayingxueluo  04aca9e55b  [Inference/Kernel] Add get_cos_and_sin kernel (#5528)  8 months ago
yuehuayingxueluo  87079cffe8  [Inference] Support FP16/BF16 Flash Attention 2 and add high_precision flag to rotary embedding (#5461)  8 months ago
yuehuayingxueluo  f366a5ea1f  [Inference/Kernel] Add fused rotary embedding and KVCache memcpy CUDA kernel (#5418)  8 months ago
xs_courtesy  095c070a6e  refactor code  9 months ago
Steve Luo  f7aecc0c6b  feat: rmsnorm CUDA kernel, with unit test and benchmark script (#5417)  9 months ago
xs_courtesy  95c21498d4  add silu_and_mul for infer  9 months ago
yuehuayingxueluo  600881a8ea  [Inference] Add CUDA KVCache kernel (#5406)  9 months ago
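Several commits above add fused CUDA inference kernels (silu_and_mul, rmsnorm, rotary embedding). As a hedged illustration of what one of them computes, here is a minimal NumPy reference for a SiLU-and-mul (SwiGLU-style) activation; this is a sketch of the common convention where the last dimension packs the gate and up projections side by side, not the repository's actual kernel:

```python
import numpy as np

def silu_and_mul(x: np.ndarray) -> np.ndarray:
    """Reference for a fused SiLU-and-mul activation.

    Assumption (not taken from the commit log): the last dimension
    of x packs [gate | up] halves, as in LLaMA-style MLP blocks.
    Returns SiLU(gate) * up, halving the last dimension.
    """
    gate, up = np.split(x, 2, axis=-1)
    # SiLU(v) = v * sigmoid(v), computed elementwise on the gate half.
    return gate / (1.0 + np.exp(-gate)) * up

# Tiny example: gate = [0, 1], up = [2, 3].
x = np.array([[0.0, 1.0, 2.0, 3.0]])
y = silu_and_mul(x)  # SiLU(0)*2 = 0, SiLU(1)*3 ≈ 2.1932
```

A production kernel would fuse the split, sigmoid, and multiply into one pass over the tensor to avoid intermediate memory traffic, which is the point of shipping it as a dedicated CUDA op.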