ColossalAI/colossalai/kernel

Latest commit 5d4c1fe8f5 by Yuanheng Zhao:
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
* [fix] GQA calling of flash decoding triton

* fix kv cache alloc shape

* fix rotary triton - GQA

* fix sequence max length assigning

* Sequence max length logic

* fix scheduling and spec-dec

* skip without import error

* fix pytest - skip without ImportError

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Committed: 2024-04-23 13:09:55 +08:00
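
The GQA-related bullets above (the flash-decoding Triton call and the KV-cache allocation shape) both hinge on how Llama3 maps its query heads onto a smaller set of shared KV heads. Below is a minimal, unfused PyTorch sketch of that grouping, for illustration only: the function name and tensor shapes are assumptions, and the explicit repeat_interleave expansion is not how the repository's Triton kernel works (a fused kernel would index the shared KV head directly via q_head // group_size rather than materializing expanded tensors).

    import torch

    def expand_kv_for_gqa(k: torch.Tensor, v: torch.Tensor, num_q_heads: int):
        # k, v: [batch, num_kv_heads, seq_len, head_dim]
        # Repeat the KV heads so that each query head has a matching KV slice.
        num_kv_heads = k.size(1)
        assert num_q_heads % num_kv_heads == 0, "query heads must be a multiple of KV heads"
        group_size = num_q_heads // num_kv_heads  # query heads sharing one KV head
        return (
            k.repeat_interleave(group_size, dim=1),
            v.repeat_interleave(group_size, dim=1),
        )

    # Llama3-8B-style sizes: 32 query heads sharing 8 KV heads (group size 4)
    k = torch.randn(1, 8, 16, 128)
    v = torch.randn(1, 8, 16, 128)
    k_exp, v_exp = expand_kv_for_gqa(k, v, num_q_heads=32)
    print(k_exp.shape)  # torch.Size([1, 32, 16, 128])
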
Name              Last commit message                                         Last commit date
jit               [npu] change device to accelerator api (#5239)              2024-01-09 10:20:05 +08:00
triton            [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)   2024-04-23 13:09:55 +08:00
__init__.py       [feat] refactored extension module (#5298)                  2024-01-25 17:01:48 +08:00
extensions        [feat] refactored extension module (#5298)                  2024-01-25 17:01:48 +08:00
kernel_loader.py  [Fix] resolve conflicts of merging main                     2024-04-08 16:21:47 +08:00
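
The "skip without ImportError" bullets in the commit above refer to letting kernel tests skip cleanly when an optional backend such as triton (needed by the triton directory) is not installed, instead of failing at import time. A minimal sketch of that pytest pattern follows; the test name and the HAS_TRITON flag are illustrative assumptions, not the repository's actual test code.

    import pytest

    try:
        import triton  # noqa: F401
        HAS_TRITON = True
    except ImportError:
        HAS_TRITON = False

    @pytest.mark.skipif(not HAS_TRITON, reason="requires the optional triton package")
    def test_flash_decoding_gqa():
        # Placeholder body: a real test would launch the GQA flash-decoding kernel here.
        assert HAS_TRITON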