ColossalAI


Yuanheng Zhao 537a3cbc4d [kernel] Support New KCache Layout - Triton Kernel (#5677)	2024-05-03 17:20:45 +08:00
* kvmemcpy triton for new kcache layout
* revise tests for new kcache layout
* naive triton flash decoding - new kcache layout
* rotary triton kernel - new kcache layout
* remove redundancy - triton decoding
* remove redundancy - triton kvcache copy
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
__init__.py	[Infer] Revise and Adapt Triton Kernels for Spec-Dec (#5401)	2024-04-10 11:07:51 +08:00
context_attn_unpad.py	[kernel] Support New KCache Layout - Triton Kernel (#5677)	2024-05-03 17:20:45 +08:00
flash_decoding.py	[kernel] Support New KCache Layout - Triton Kernel (#5677)	2024-05-03 17:20:45 +08:00
fused_rotary_embedding.py	[Inference]Fused the gate and up proj in mlp，and optimized the autograd process. (#5365)	2024-02-06 19:38:25 +08:00
kvcache_copy.py	[kernel] Support New KCache Layout - Triton Kernel (#5677)	2024-05-03 17:20:45 +08:00
llama_act_combine_kernel.py	[devops] remove post commit ci (#5566)	2024-04-08 15:09:40 +08:00
no_pad_rotary_embedding.py	[kernel] Support New KCache Layout - Triton Kernel (#5677)	2024-05-03 17:20:45 +08:00
qkv_matmul_kernel.py	[misc] update pre-commit and run all files (#4752)	2023-09-19 14:20:26 +08:00
rms_layernorm.py	[fix] multi graphs capture error	2024-03-11 10:49:31 +08:00
rotary_cache_copy.py	[Inference]Fused the gate and up proj in mlp，and optimized the autograd process. (#5365)	2024-02-06 19:38:25 +08:00
softmax.py	[misc] update pre-commit and run all files (#4752)	2023-09-19 14:20:26 +08:00