ColossalAI/colossalai/kernel/triton
Yuanheng Zhao fa85e02b3b
[kernel] Add KV cache copy kernel during decoding (#5261)
* add kv copy triton kernel during decoding stage

* add pytest and fix kernel

* fix test utilities

* revise kernel config

* add benchmark for kvcache copy
2024-01-15 17:37:20 +08:00
__init__.py [kernel] Add KV cache copy kernel during decoding (#5261) 2024-01-15 17:37:20 +08:00
context_attn_unpad.py [kernel] Add flash decoding triton kernel for blocked kv cache (#5249) 2024-01-11 13:46:14 +00:00
custom_autotune.py add autotune (#4822) 2023-09-28 13:47:35 +08:00
flash_decoding.py [kernel] Add flash decoding triton kernel for blocked kv cache (#5249) 2024-01-11 13:46:14 +00:00
fused_layernorm.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
gptq_triton.py [inference] add reference and fix some bugs (#4937) 2023-10-20 13:39:34 +08:00
kvcache_copy.py [kernel] Add KV cache copy kernel during decoding (#5261) 2024-01-15 17:37:20 +08:00
llama_act_combine_kernel.py [moe] merge moe into main (#4978) 2023-11-02 02:21:24 +00:00
no_pad_rotary_embedding.py [Inference] Kernel: no pad rotary embedding (#5252) 2024-01-11 13:46:14 +00:00
qkv_matmul_kernel.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
softmax.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
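The commit above describes a Triton kernel (kvcache_copy.py) that, at each decoding step, scatters the newly produced key/value vectors into a blocked (paged) KV cache. The sketch below is not the ColossalAI implementation; it is a minimal illustration of that idea. The cache layout [num_blocks, num_heads, block_size, head_dim], the block-table convention, and all argument names are assumptions made for this example.

```python
# Minimal sketch of a decoding-stage KV cache copy kernel (illustrative only).
# Assumptions: kv is [num_seqs, num_heads, head_dim] (one new token per sequence),
# cache is [num_blocks, num_heads, block_size, head_dim], block_tables maps each
# sequence's logical block index to a physical block id (contiguous last dim),
# and head_dim is a power of two (e.g. 64 or 128) so tl.arange is valid.
import torch
import triton
import triton.language as tl


@triton.jit
def _copy_new_kv_kernel(
    KV,            # newly decoded K (or V) for this step
    Cache,         # blocked KV cache
    BlockTables,   # [num_seqs, max_blocks_per_seq] logical -> physical block ids
    SeqLens,       # [num_seqs] sequence lengths including the new token
    stride_kv_s, stride_kv_h, stride_kv_d,
    stride_c_b, stride_c_h, stride_c_t, stride_c_d,
    stride_bt_s,
    block_size: tl.constexpr,
    HEAD_DIM: tl.constexpr,
):
    seq_id = tl.program_id(0)
    head_id = tl.program_id(1)

    # Position of the new token and the cache slot it lands in.
    last_pos = tl.load(SeqLens + seq_id) - 1
    block_idx = last_pos // block_size
    block_offset = last_pos % block_size
    phys_block = tl.load(BlockTables + seq_id * stride_bt_s + block_idx)

    offs_d = tl.arange(0, HEAD_DIM)
    src = KV + seq_id * stride_kv_s + head_id * stride_kv_h + offs_d * stride_kv_d
    dst = (Cache + phys_block * stride_c_b + head_id * stride_c_h
           + block_offset * stride_c_t + offs_d * stride_c_d)
    tl.store(dst, tl.load(src))


def copy_new_kv_to_blocked_cache(kv, cache, block_tables, seq_lens):
    """Scatter one decoding step's K (or V) tensor into the blocked cache."""
    num_seqs, num_heads, head_dim = kv.shape
    block_size = cache.shape[2]
    # One program per (sequence, head); each copies a single head_dim vector.
    _copy_new_kv_kernel[(num_seqs, num_heads)](
        kv, cache, block_tables, seq_lens,
        kv.stride(0), kv.stride(1), kv.stride(2),
        cache.stride(0), cache.stride(1), cache.stride(2), cache.stride(3),
        block_tables.stride(0),
        block_size=block_size,
        HEAD_DIM=head_dim,
    )
```

In a decoding loop, such a kernel would be called once per step for keys and once for values before the attention kernel (e.g. flash_decoding.py) reads the blocked cache; the real kernel's signature, tuning configuration, and benchmark are in kvcache_copy.py and the tests added by the commit above.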