Commit Graph

4 Commits (c0948aff9777dd18ed5d6adf10671e03e438f6eb)

Author SHA1 Message Date
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) 2024-05-05 16:28:56 +00:00
Yuanheng Zhao 537a3cbc4d
[kernel] Support New KCache Layout - Triton Kernel (#5677)
* kvmemcpy triton for new kcache layout

* revise tests for new kcache layout

* naive triton flash decoding - new kcache layout

* rotary triton kernel - new kcache layout

* remove redundancy - triton decoding

* remove redundancy - triton kvcache copy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-03 17:20:45 +08:00
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
* refactor kvcache manager and rotary_embedding and kvcache_memcpy operator

* refactor decode_kv_cache_memcpy

* enable alibi in pagedattention
2024-04-30 15:52:23 +08:00
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406)
* add cuda KVCache kernel

* annotation benchmark_kvcache_copy

* add use cuda

* fix import path

* move benchmark scripts to example/

* rm benchmark codes in test_kv_cache_memcpy.py

* rm redundancy codes

* rm redundancy codes

* pr was modified according to the review
2024-02-28 14:36:50 +08:00