Commit Graph

3 Commits (912e24b2aaf4acda0e2b9a45a7d4327fbfc8bd39)

Author SHA1 Message Date
Yuanheng Zhao 5f98a9d68a
[Infer] Optimize Blocked KVCache And Kernels Using It (#5325)
* revise shape of kvcache (context attn kernel)

* revise shape of kvcache (flash decoding kernel)

* revise shape of kvcache (kvcache copy) and attn func

* init of kvcache in kvcache manager

* revise llama modeling

* revise block size retrieval

* use torch for rms_norm benchmarking

* revise block size retrieval
2024-01-30 16:06:09 +08:00
Jianghai e545a871b8 [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229)
* fix accuracy

* alignment in attention

* fix attention

* fix

* fix bugs

* fix bugs

* fix bugs
2024-01-11 13:46:14 +00:00
Jianghai bfd9b1b494 [Inference] Pytorch Attention func, pad&nopad input support (#5219)
* add attn

* add attention test

* fix attn forward

* fix decoding
2024-01-11 13:44:06 +00:00