ColossalAI/colossalai/kernel/triton

Latest commit: 6fb4bcbb24 by yuehuayingxueluo, "[Inference/opt] Fused KVCahce Memcopy (#5374)", 10 months ago
File                        | Last commit                                                                               | Age
__init__.py                 | [inference]Optimize the usage of the mid tensors space in flash attn (#5304)             | 10 months ago
context_attn_unpad.py       | [Infer] Optimize Blocked KVCache And Kernels Using It (#5325)                             | 10 months ago
custom_autotune.py          | add autotune (#4822)                                                                      | 1 year ago
flash_decoding.py           | [Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)  | 10 months ago
fused_rotary_embedding.py   | [Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)  | 10 months ago
gptq_triton.py              | [inference] add reference and fix some bugs (#4937)                                       | 1 year ago
kvcache_copy.py             | [Inference/opt] Fused KVCahce Memcopy (#5374)                                             | 10 months ago
llama_act_combine_kernel.py | [moe] merge moe into main (#4978)                                                         | 1 year ago
no_pad_rotary_embedding.py  | Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373)                                | 10 months ago
qkv_matmul_kernel.py        |                                                                                           |
rms_layernorm.py            | [Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)  | 10 months ago
rotary_cache_copy.py        | [Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)  | 10 months ago
softmax.py                  |                                                                                           |
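Each file in this directory implements an inference kernel in OpenAI Triton. As a rough illustration of the kind of kernel these files contain (this is a minimal sketch, not the actual contents of softmax.py; the function names and launch parameters are assumptions), here is a numerically stable row-wise softmax where each Triton program instance processes one row:

    # Minimal illustrative Triton kernel; NOT the repo's softmax.py.
    import torch
    import triton
    import triton.language as tl


    @triton.jit
    def softmax_kernel(out_ptr, in_ptr, n_cols, BLOCK_SIZE: tl.constexpr):
        # One program instance handles one row of a contiguous 2D tensor.
        row = tl.program_id(0)
        offsets = tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_cols
        # Pad out-of-bounds lanes with -inf so they do not affect
        # the row max or the normalizing sum.
        x = tl.load(in_ptr + row * n_cols + offsets, mask=mask, other=-float("inf"))
        # Subtract the row max for numerical stability, then normalize.
        x = x - tl.max(x, axis=0)
        num = tl.exp(x)
        y = num / tl.sum(num, axis=0)
        tl.store(out_ptr + row * n_cols + offsets, y, mask=mask)


    def softmax(x: torch.Tensor) -> torch.Tensor:
        n_rows, n_cols = x.shape
        out = torch.empty_like(x)
        # BLOCK_SIZE must be a power of two large enough to cover one row.
        BLOCK_SIZE = triton.next_power_of_2(n_cols)
        softmax_kernel[(n_rows,)](out, x, n_cols, BLOCK_SIZE=BLOCK_SIZE)
        return out


    if __name__ == "__main__":
        x = torch.randn(4, 128, device="cuda")
        assert torch.allclose(softmax(x), torch.softmax(x, dim=-1), atol=1e-6)

The same structure (a @triton.jit kernel plus a thin Python launcher that computes the grid and block sizes) recurs across the listed files; the attention, rotary-embedding, and KV-cache-copy kernels differ mainly in their indexing and in how many operations they fuse into a single launch.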