Commit Graph

2 Commits (8e606ecc7e89ffed80537e89a27bb1eb6759f4bc)

Author SHA1 Message Date
Yuanheng Zhao 1513f20f4d [kernel] Add flash decoding triton kernel for blocked kv cache (#5249)
* add flash decoding unpad triton kernel

* rename flash decoding kernel

* add kernel testing (draft)

* revise pytest

* support kv group (GQA)

* (trivial) fix api and pytest

* (trivial) func renaming

* (trivial) func/file renaming

* refactor pytest for attention

* (trivial) format and consistent vars of context/decode attn

* (trivial) remove test redundancy
2024-01-11 13:46:14 +00:00
Yuanheng Zhao 07b5283b6a [kernel] Add triton kernel for context attention (FAv2) without padding (#5192)
* add context attn unpadded triton kernel

* test compatibility

* kv cache copy (testing)

* fix k/v cache copy

* fix kv cache copy and test

* fix boundary of block ptrs

* add support for GQA/MQA and testing

* fix import statement

---------

Co-authored-by: Round Heng <yuanhengzhao@Rounds-MacBook-Pro.local>
2024-01-11 13:39:56 +00:00