ColossalAI

Yuanheng Zhao 5d4c1fe8f5 [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)	7 months ago
* [fix] GQA calling of flash decoding triton
* fix kv cache alloc shape
* fix rotary triton - GQA
* fix sequence max length assigning
* Sequence max length logic
* fix scheduling and spec-dec
* skip without import error
* fix pytest - skip without ImportError
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
__init__.py	[Infer] Revise and Adapt Triton Kernels for Spec-Dec (#5401)	8 months ago
context_attn_unpad.py	Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)	9 months ago
flash_decoding.py	[Inference/SpecDec] Add Speculative Decoding Implementation (#5423)	8 months ago
fused_rotary_embedding.py	[Inference] Fused the gate and up proj in mlp, and optimized the autograd process. (#5365)	10 months ago
kvcache_copy.py	[Infer] Revise and Adapt Triton Kernels for Spec-Dec (#5401)	8 months ago
llama_act_combine_kernel.py	[devops] remove post commit ci (#5566)	8 months ago
no_pad_rotary_embedding.py	[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)	7 months ago
qkv_matmul_kernel.py	[misc] update pre-commit and run all files (#4752)	1 year ago
rms_layernorm.py	[fix] multi graphs capture error	9 months ago
rotary_cache_copy.py	[Inference] Fused the gate and up proj in mlp, and optimized the autograd process. (#5365)	10 months ago
softmax.py	[misc] update pre-commit and run all files (#4752)	1 year ago