ColossalAI/colossalai/inference/core
Latest commit 5d4c1fe8f5 by Yuanheng Zhao:
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
* [fix] GQA calling of flash decoding triton

* fix kv cache alloc shape

* fix rotary triton - GQA

* fix sequence max length assigning

* Sequence max length logic

* fix scheduling and spec-dec

* skip without import error

* fix pytest - skip without ImportError

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-23 13:09:55 +08:00
File                 Last commit                                                 Date
__init__.py          [doc] updated inference readme (#5343)                      2024-02-02 14:31:10 +08:00
engine.py            [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)   2024-04-23 13:09:55 +08:00
plugin.py            [Feat]Tensor Model Parallel Support For Inference (#5563)   2024-04-18 16:56:46 +08:00
request_handler.py   [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)   2024-04-23 13:09:55 +08:00