ColossalAI/colossalai/kernel/triton
Yuanheng Zhao 07b5283b6a [kernel] Add triton kernel for context attention (FAv2) without padding (#5192)
* add context attn unpadded triton kernel (see the reference sketch after the file listing)

* test compatibility

* kv cache copy (testing)

* fix kv cache copy

* fix kv cache copy and test (see the cache-copy sketch after the file listing)

* fix boundary of block ptrs

* add support for GQA/MQA and testing

* fix import statement

---------

Co-authored-by: Round Heng <yuanhengzhao@Rounds-MacBook-Pro.local>
2024-01-11 13:39:56 +00:00
File                         Last commit message                                                               Last commit date
__init__.py                  [kernel] Add triton kernel for context attention (FAv2) without padding (#5192)  2024-01-11 13:39:56 +00:00
context_attn_unpad.py        [kernel] Add triton kernel for context attention (FAv2) without padding (#5192)  2024-01-11 13:39:56 +00:00
custom_autotune.py           add autotune (#4822)                                                              2023-09-28 13:47:35 +08:00
fused_layernorm.py           [misc] update pre-commit and run all files (#4752)                                2023-09-19 14:20:26 +08:00
gptq_triton.py               [inference] add reference and fix some bugs (#4937)                               2023-10-20 13:39:34 +08:00
llama_act_combine_kernel.py  [moe] merge moe into main (#4978)                                                 2023-11-02 02:21:24 +00:00
qkv_matmul_kernel.py         [misc] update pre-commit and run all files (#4752)                                2023-09-19 14:20:26 +08:00
softmax.py                   [misc] update pre-commit and run all files (#4752)                                2023-09-19 14:20:26 +08:00
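
Note on the unpadded context attention kernel: context_attn_unpad.py implements the prefill-stage attention in Triton, FlashAttention-v2 style, over sequences packed back-to-back with no padding, with GQA/MQA support (several query heads sharing one K/V head). Below is a minimal pure-PyTorch reference of those semantics, the kind a test might compare the kernel against; the function name, the `cu_seqlens` convention, and the tensor layouts here are illustrative assumptions, not ColossalAI's actual API.

```python
import torch

def context_attention_ref(
    q: torch.Tensor,           # [total_tokens, num_q_heads, head_dim], sequences packed without padding
    k: torch.Tensor,           # [total_tokens, num_kv_heads, head_dim]
    v: torch.Tensor,           # [total_tokens, num_kv_heads, head_dim]
    cu_seqlens: torch.Tensor,  # [batch + 1] cumulative sequence lengths, e.g. [0, 5, 12]
) -> torch.Tensor:
    num_q_heads, num_kv_heads = q.shape[1], k.shape[1]
    assert num_q_heads % num_kv_heads == 0
    group = num_q_heads // num_kv_heads  # GQA: each kv head serves `group` query heads (MQA: num_kv_heads == 1)
    scale = q.shape[2] ** -0.5
    out = torch.empty_like(q)
    for i in range(cu_seqlens.numel() - 1):
        s, e = cu_seqlens[i].item(), cu_seqlens[i + 1].item()
        n = e - s
        qi = q[s:e].transpose(0, 1)                                  # [Hq, n, D]
        ki = k[s:e].repeat_interleave(group, dim=1).transpose(0, 1)  # broadcast kv heads up to [Hq, n, D]
        vi = v[s:e].repeat_interleave(group, dim=1).transpose(0, 1)
        scores = qi @ ki.transpose(-1, -2) * scale                   # [Hq, n, n]
        causal = torch.tril(torch.ones(n, n, dtype=torch.bool, device=q.device))
        scores.masked_fill_(~causal, float("-inf"))                  # causal mask within each sequence
        out[s:e] = (torch.softmax(scores, dim=-1) @ vi).transpose(0, 1)
    return out
```

The Triton kernel computes the same result without ever materializing the [n, n] score matrix, streaming over K/V blocks with an online softmax as in FlashAttention-v2.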
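
Note on the KV cache copy: the commit also copies the prefill K/V tensors into the blocked (paged) KV cache so the decoding stage can reuse them. A minimal sketch of that indexing follows, assuming a cache of shape [num_blocks, num_kv_heads, block_size, head_dim] and a per-sequence block table mapping logical to physical blocks; the layout and all names are assumptions for illustration. The "fix boundary of block ptrs" bullet likely concerns exactly this edge: the last block of a sequence is only partially filled, so block pointers must be masked at the sequence boundary.

```python
import torch

def copy_k_to_blocked_cache_ref(
    k: torch.Tensor,             # [total_tokens, num_kv_heads, head_dim], packed prefill keys
    k_cache: torch.Tensor,       # [num_blocks, num_kv_heads, block_size, head_dim] (assumed layout)
    cu_seqlens: torch.Tensor,    # [batch + 1] cumulative sequence lengths
    block_tables: torch.Tensor,  # [batch, max_blocks_per_seq], physical block id per logical block
) -> None:
    block_size = k_cache.shape[2]
    for i in range(cu_seqlens.numel() - 1):
        s, e = cu_seqlens[i].item(), cu_seqlens[i + 1].item()
        for tok in range(e - s):
            blk = block_tables[i, tok // block_size].item()  # logical block -> physical block
            k_cache[blk, :, tok % block_size] = k[s + tok]   # write one [num_kv_heads, head_dim] slot
```

Values are copied into v_cache identically. The Triton version parallelizes this loop, typically one program instance per token or per block, using the same index arithmetic.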