ColossalAI

Commit Graph

Author	SHA1	Message	Date
Yuanheng Zhao	5f98a9d68a	[Infer] Optimize Blocked KVCache And Kernels Using It (#5325) * revise shape of kvcache (context attn kernel) * revise shape of kvcache (flash decoding kernel) * revise shape of kvcache (kvcache copy) and attn func * init of kvcache in kvcache manager * revise llama modeling * revise block size retrieval * use torch for rms_norm benchmarking * revise block size retrieval	2024-01-30 16:06:09 +08:00
Jianghai	e545a871b8	[Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229) * fix accuracy * alignment in attention * fix attention * fix * fix bugs * fix bugs * fix bugs	2024-01-11 13:46:14 +00:00
Jianghai	bfd9b1b494	[Inference] Pytorch Attention func, pad&nopad input support (#5219) * add attn * add attention test * fix attn forward * fix decoding	2024-01-11 13:44:06 +00:00

3 Commits (912e24b2aaf4acda0e2b9a45a7d4327fbfc8bd39)