ColossalAI/colossalai/kernel/triton
Latest commit: 9f4ab2eb92 by Jianghai, 2024-02-07 11:36:04 +08:00
[Inference] Adapt to Fused rotary (#5348)

* revise rotary embedding
* remove useless print
* adapt
* fix
* add
* fix
* modeling
* fix
* fix
* fix
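For context, the fused-rotary work in this commit touches kernels such as fused_rotary_embedding.py and no_pad_rotary_embedding.py below. The following is a minimal, illustrative Triton sketch of a (non-fused) rotary embedding kernel. It is not the ColossalAI implementation; the function names, tensor layout (one head, two contiguous half-dimensions), and signatures are all assumptions made for this sketch.

```python
# Illustrative sketch only -- NOT the ColossalAI kernel. Assumes q is
# (num_tokens, head_dim) with the two rotated halves stored contiguously,
# and cos/sin caches of shape (num_tokens, head_dim // 2), all on GPU.
import torch
import triton
import triton.language as tl


@triton.jit
def rotary_kernel(
    q_ptr, cos_ptr, sin_ptr,
    q_stride, cache_stride,
    HALF_DIM: tl.constexpr,  # head_dim // 2; must be a power of two for tl.arange
):
    # One program instance rotates one token's query vector in place.
    token = tl.program_id(0)
    offs = tl.arange(0, HALF_DIM)

    # Load the two halves of the head dimension and the per-position cos/sin cache.
    q1 = tl.load(q_ptr + token * q_stride + offs)
    q2 = tl.load(q_ptr + token * q_stride + HALF_DIM + offs)
    cos = tl.load(cos_ptr + token * cache_stride + offs)
    sin = tl.load(sin_ptr + token * cache_stride + offs)

    # Rotation: (q1, q2) -> (q1*cos - q2*sin, q2*cos + q1*sin).
    tl.store(q_ptr + token * q_stride + offs, q1 * cos - q2 * sin)
    tl.store(q_ptr + token * q_stride + HALF_DIM + offs, q2 * cos + q1 * sin)


def apply_rotary_(q: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # q: (num_tokens, head_dim); cos/sin: (num_tokens, head_dim // 2).
    num_tokens, head_dim = q.shape
    rotary_kernel[(num_tokens,)](
        q, cos, sin, q.stride(0), cos.stride(0), HALF_DIM=head_dim // 2
    )
    return q
```

A "fused" variant, as the commit title suggests, would typically apply the same rotation to queries and keys in one launch (and possibly combine it with the KV-cache copy), trading a slightly larger kernel for fewer global-memory round trips.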
File                          Last commit                                                                                  Date
__init__.py                   [inference] Optimize the usage of the mid tensors space in flash attn (#5304)               2024-01-26 14:00:10 +08:00
context_attn_unpad.py         [Infer] Optimize Blocked KVCache And Kernels Using It (#5325)                               2024-01-30 16:06:09 +08:00
custom_autotune.py            add autotune (#4822)                                                                        2023-09-28 13:47:35 +08:00
flash_decoding.py             [Inference] Fused the gate and up proj in mlp, and optimized the autograd process. (#5365)  2024-02-06 19:38:25 +08:00
fused_rotary_embedding.py     [Inference] Fused the gate and up proj in mlp, and optimized the autograd process. (#5365)  2024-02-06 19:38:25 +08:00
gptq_triton.py                [inference] add reference and fix some bugs (#4937)                                         2023-10-20 13:39:34 +08:00
kvcache_copy.py               [Inference] Adapt to Fused rotary (#5348)                                                   2024-02-07 11:36:04 +08:00
llama_act_combine_kernel.py   [moe] merge moe into main (#4978)                                                           2023-11-02 02:21:24 +00:00
no_pad_rotary_embedding.py    [Inference] Adapt to Fused rotary (#5348)                                                   2024-02-07 11:36:04 +08:00
qkv_matmul_kernel.py          [misc] update pre-commit and run all files (#4752)                                          2023-09-19 14:20:26 +08:00
rms_layernorm.py              [Inference] Fused the gate and up proj in mlp, and optimized the autograd process. (#5365)  2024-02-06 19:38:25 +08:00
rotary_cache_copy.py          [Inference] Fused the gate and up proj in mlp, and optimized the autograd process. (#5365)  2024-02-06 19:38:25 +08:00
softmax.py                    [misc] update pre-commit and run all files (#4752)                                          2023-09-19 14:20:26 +08:00
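As a second example of the kind of kernel this directory holds, rms_layernorm.py implements RMSNorm. Below is a minimal Triton sketch of how such a kernel can look; it is illustrative only, not the actual rms_layernorm.py implementation, and the names and shapes are assumptions.

```python
# Illustrative RMSNorm sketch -- NOT the ColossalAI kernel.
import torch
import triton
import triton.language as tl


@triton.jit
def rms_norm_kernel(
    x_ptr, w_ptr, out_ptr,
    stride, eps,
    HIDDEN: tl.constexpr,  # hidden size; must be a power of two for tl.arange
):
    # One program instance normalizes one row of the input.
    row = tl.program_id(0)
    offs = tl.arange(0, HIDDEN)
    x = tl.load(x_ptr + row * stride + offs).to(tl.float32)

    # RMSNorm: y = x / sqrt(mean(x^2) + eps) * weight, accumulated in fp32.
    rms = tl.sqrt(tl.sum(x * x, axis=0) / HIDDEN + eps)
    w = tl.load(w_ptr + offs).to(tl.float32)
    y = (x / rms) * w
    tl.store(out_ptr + row * stride + offs, y.to(out_ptr.dtype.element_ty))


def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # x: (n_rows, hidden) on GPU; weight: (hidden,).
    out = torch.empty_like(x)
    n_rows, hidden = x.shape
    rms_norm_kernel[(n_rows,)](x, weight, out, x.stride(0), eps, HIDDEN=hidden)
    return out
```

One row per program instance is the usual design for norm kernels: the whole reduction stays in registers, and accumulating in fp32 avoids the precision loss of summing squares in fp16.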