ColossalAI/examples/inference
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
* refactor kvcache manager and rotary_embedding and kvcache_memcpy operator

* refactor decode_kv_cache_memcpy

* enable alibi in pagedattention
2024-04-30 15:52:23 +08:00
benchmark_ops [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663) 2024-04-30 15:52:23 +08:00
benchmark_llama.py [Fix/Inference]Fix vllm benchmark (#5630) 2024-04-24 14:51:36 +08:00
benchmark_llama3.py [Fix/Inference]Fix vllm benchmark (#5630) 2024-04-24 14:51:36 +08:00
llama_generation.py [example] Update Llama Inference example (#5629) 2024-04-23 22:23:07 +08:00
run_benchmark.sh [Fix/Inference]Fix vllm benchmark (#5630) 2024-04-24 14:51:36 +08:00