ColossalAI/colossalai/kernel/triton
Latest commit: 3da9993b0d by Yuanheng Zhao, 2024-01-23 17:16:02 +08:00
[Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301)
* fix decoding kernel pytest
* revise and add triton context attn benchmark
__init__.py                  | [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274) | 2024-01-19 15:47:16 +08:00
context_attn_unpad.py        | [Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301)    | 2024-01-23 17:16:02 +08:00
custom_autotune.py           | add autotune (#4822)                                                               | 2023-09-28 13:47:35 +08:00
flash_decoding.py            | [Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301)    | 2024-01-23 17:16:02 +08:00
flash_decoding_utils.py      | [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274) | 2024-01-19 15:47:16 +08:00
gptq_triton.py               | [inference] add reference and fix some bugs (#4937)                                | 2023-10-20 13:39:34 +08:00
kvcache_copy.py              | [inference] Adapted to Rotary Embedding and RMS Norm (#5283)                       | 2024-01-22 10:55:34 +08:00
llama_act_combine_kernel.py  | [moe] merge moe into main (#4978)                                                  | 2023-11-02 02:21:24 +00:00
no_pad_rotary_embedding.py   | [Inference] Kernel: no pad rotary embedding (#5252)                                | 2024-01-11 13:46:14 +00:00
qkv_matmul_kernel.py         | [misc] update pre-commit and run all files (#4752)                                 | 2023-09-19 14:20:26 +08:00
rms_layernorm.py             | [kernel] Add RMSLayerNorm triton kernel (#5262)                                    | 2024-01-18 10:21:03 +08:00
softmax.py                   | [misc] update pre-commit and run all files (#4752)                                 | 2023-09-19 14:20:26 +08:00