ColossalAI

Commit Graph

Author	SHA1	Message	Date
Yuanheng Zhao	5f98a9d68a	[Infer] Optimize Blocked KVCache And Kernels Using It (#5325) * revise shape of kvcache (context attn kernel) * revise shape of kvcache (flash decoding kernel) * revise shape of kvcache (kvcache copy) and attn func * init of kvcache in kvcache manager * revise llama modeling * revise block size retrieval * use torch for rms_norm benchmarking * revise block size retrieval	2024-01-30 16:06:09 +08:00
yuehuayingxueluo	e8f0642f28	[Inference]Add Nopadding Llama Modeling (#5327) * add nopadding llama modeling * add nopadding_llama.py * rm unused codes * fix bugs in test_xine_copy.py * fix code style	2024-01-30 10:31:46 +08:00

2 Commits (f8e456d20295af52665ca06a21f9fd8b468204d7)