ColossalAI

History

Yuanheng Zhao 5f98a9d68a [Infer] Optimize Blocked KVCache And Kernels Using It (#5325)	2024-01-30 16:06:09 +08:00
test_attention.py	[Infer] Optimize Blocked KVCache And Kernels Using It (#5325)	2024-01-30 16:06:09 +08:00

[Infer] Optimize Blocked KVCache And Kernels Using It (#5325)

* revise shape of kvcache (context attn kernel)

* revise shape of kvcache (flash decoding kernel)

* revise shape of kvcache (kvcache copy) and attn func

* init of kvcache in kvcache manager

* revise llama modeling

* revise block size retrieval

* use torch for rms_norm benchmarking

* revise block size retrieval

2024-01-30 16:06:09 +08:00