傅剑寒
ef8e4ffe31
[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680)
2024-04-30 18:33:53 +08:00
Steve Luo
5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
* refactor kvcache manager and rotary_embedding and kvcache_memcpy operator
* refactor decode_kv_cache_memcpy
* enable alibi in pagedattention
2024-04-30 15:52:23 +08:00
傅剑寒
808ee6e4ad
[Inference/Feat] Feat quant kvcache step2 (#5674)
2024-04-30 11:26:36 +08:00
傅剑寒
8ccb6714e7
[Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656)
2024-04-26 19:40:37 +08:00
傅剑寒
279300dc5f
[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613)
* refactor compilation mechanism and unified multi hw
* fix file path bug
* add `__init__.py` to make pybind a module to avoid relative path error caused by softlink
* delete duplicated macros
* fix macros bug in gcc
2024-04-24 14:17:54 +08:00