ColossalAI/colossalai/inference/kv_cache
Yuanheng Zhao 5f98a9d68a
[Infer] Optimize Blocked KVCache And Kernels Using It (#5325)
* revise shape of kvcache (context attn kernel)

* revise shape of kvcache (flash decoding kernel)

* revise shape of kvcache (kvcache copy) and attn func

* init of kvcache in kvcache manager

* revise llama modeling

* revise block size retrieval

* use torch for rms_norm benchmarking

* revise block size retrieval
2024-01-30 16:06:09 +08:00
__init__.py [Inference] Add CacheBlock and KV-Cache Manager (#5156) 2024-01-11 13:39:29 +00:00
block_cache.py [Inference] Add CacheBlock and KV-Cache Manager (#5156) 2024-01-11 13:39:29 +00:00
kvcache_manager.py [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) 2024-01-30 16:06:09 +08:00