ColossalAI/colossalai/kernel

Latest commit 5d4c1fe8f5 by Yuanheng Zhao:
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
* [fix] GQA calling of flash decoding triton

* fix kv cache alloc shape

* fix rotary triton - GQA

* fix sequence max length assigning

* Sequence max length logic

* fix scheduling and spec-dec

* skip without import error

* fix pytest - skip without ImportError

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Committed: 2024-04-23 13:09:55 +08:00
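
The GQA-related bullets above (the flash-decoding Triton call and the KV-cache allocation shape) both hinge on how Llama3 maps its query heads onto a smaller set of shared KV heads. Below is a minimal, unfused PyTorch sketch of that grouping, for illustration only: the function name and tensor shapes are assumptions, and the explicit repeat_interleave expansion is not how the repository's Triton kernel works (a fused kernel would index the shared KV head directly via q_head // group_size rather than materializing expanded tensors).

    import torch

    def expand_kv_for_gqa(k: torch.Tensor, v: torch.Tensor, num_q_heads: int):
        # k, v: [batch, num_kv_heads, seq_len, head_dim]
        # Repeat the KV heads so that each query head has a matching KV slice.
        num_kv_heads = k.size(1)
        assert num_q_heads % num_kv_heads == 0, "query heads must be a multiple of KV heads"
        group_size = num_q_heads // num_kv_heads  # query heads sharing one KV head
        return (
            k.repeat_interleave(group_size, dim=1),
            v.repeat_interleave(group_size, dim=1),
        )

    # Llama3-8B-style sizes: 32 query heads sharing 8 KV heads (group size 4)
    k = torch.randn(1, 8, 16, 128)
    v = torch.randn(1, 8, 16, 128)
    k_exp, v_exp = expand_kv_for_gqa(k, v, num_q_heads=32)
    print(k_exp.shape)  # torch.Size([1, 32, 16, 128])
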
Name              Last commit message                                         Last commit date
jit               [npu] change device to accelerator api (#5239)              2024-01-09 10:20:05 +08:00
triton            [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)   2024-04-23 13:09:55 +08:00
__init__.py       [feat] refactored extension module (#5298)                  2024-01-25 17:01:48 +08:00
extensions        [feat] refactored extension module (#5298)                  2024-01-25 17:01:48 +08:00
kernel_loader.py  [Fix] resolve conflicts of merging main                     2024-04-08 16:21:47 +08:00
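
The "skip without ImportError" bullets in the commit above refer to letting kernel tests skip cleanly when an optional backend such as triton (needed by the triton directory) is not installed, instead of failing at import time. A minimal sketch of that pytest pattern follows; the test name and the HAS_TRITON flag are illustrative assumptions, not the repository's actual test code.

    import pytest

    try:
        import triton  # noqa: F401
        HAS_TRITON = True
    except ImportError:
        HAS_TRITON = False

    @pytest.mark.skipif(not HAS_TRITON, reason="requires the optional triton package")
    def test_flash_decoding_gqa():
        # Placeholder body: a real test would launch the GQA flash-decoding kernel here.
        assert HAS_TRITON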