Commit Graph

6 Commits (375e356a1632bb242efd3dd51bcdcb57de0ea293)

Author | SHA1 | Message | Date
Steve Luo | 725fbd2ed0 | [Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679) | 7 months ago
傅剑寒 | 9df016fc45 | [Inference] Fix quant bits order (#5681) | 7 months ago
傅剑寒 | ef8e4ffe31 | [Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680) | 7 months ago
傅剑寒 | 808ee6e4ad | [Inference/Feat] Feat quant kvcache step2 (#5674) | 7 months ago
傅剑寒 | 8ccb6714e7 | [Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656) | 7 months ago
傅剑寒 | 279300dc5f | [Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613) | 7 months ago