Commit Graph

12 Commits (e0c68ab6d3d64f401208d6ec66815995cee233c3)

Author SHA1 Message Date
pre-commit-ci[bot] 7c2f79fa98
[pre-commit.ci] pre-commit autoupdate (#5572)
5 months ago
傅剑寒 121d7ad629
[Inference] Delete duplicated copy_vector (#5716)
6 months ago
Steve Luo 7806842f2d
add paged-attetionv2: support seq length split across thread block (#5707)
6 months ago
傅剑寒 50104ab340
[Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706)
7 months ago
傅剑寒 1ace1065e6
[Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686)
7 months ago
Steve Luo 725fbd2ed0
[Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679)
7 months ago
傅剑寒 ef8e4ffe31
[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680)
7 months ago
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
7 months ago
傅剑寒 808ee6e4ad
[Inference/Feat] Feat quant kvcache step2 (#5674)
7 months ago
傅剑寒 8ccb6714e7
[Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656)
7 months ago
Steve Luo a8fd3b0342
[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643)
7 months ago
傅剑寒 279300dc5f
[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613)
7 months ago