ColossalAI/colossalai/kernel/triton

Latest commit: 4f28cb43c0 by yuehuayingxueluo, "[inference]Optimize the usage of the mid tensors space in flash attn (#5304)", 10 months ago

Directory contents (file | last commit | age); a sketch of the kernel style in this directory follows the listing.
__init__.py | [inference]Optimize the usage of the mid tensors space in flash attn (#5304) | 10 months ago
context_attn_unpad.py | [inference]Optimize the usage of the mid tensors space in flash attn (#5304) | 10 months ago
custom_autotune.py | add autotune (#4822) | 1 year ago
flash_decoding.py | [inference]Optimize the usage of the mid tensors space in flash attn (#5304) | 10 months ago
fused_rotary_embedding.py | [Inference]Add fused rotary kernel and get cos cache kernel (#5302) | 10 months ago
gptq_triton.py | [inference] add reference and fix some bugs (#4937) | 1 year ago
kvcache_copy.py | [inference] Adapted to Rotary Embedding and RMS Norm (#5283) | 10 months ago
llama_act_combine_kernel.py | [moe] merge moe into main (#4978) | 1 year ago
no_pad_rotary_embedding.py | [Inference]Add fused rotary kernel and get cos cache kernel (#5302) | 10 months ago
qkv_matmul_kernel.py | [misc] update pre-commit and run all files (#4752) | 1 year ago
rms_layernorm.py | [kernel] Add RMSLayerNorm triton kernel (#5262) | 10 months ago
rotary_cache_copy.py | [Inference]Add fused rotary kernel and get cos cache kernel (#5302) | 10 months ago
softmax.py | [misc] update pre-commit and run all files (#4752) | 1 year ago
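For orientation, below is a minimal sketch of the kind of Triton kernel these files contain: a numerically stable row-wise softmax in the style of the standard Triton tutorials. This is an illustrative sketch, not the actual code in softmax.py; the kernel and wrapper names are hypothetical, and it assumes a contiguous, row-major 2D input whose row length fits in one block.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, BLOCK_SIZE: tl.constexpr):
    # One program instance handles one row of the input matrix.
    row = tl.program_id(0)
    offsets = tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_cols
    # Load the row; pad out-of-bounds lanes with -inf so they do not
    # affect the row max or the exponential sum.
    x = tl.load(in_ptr + row * n_cols + offsets, mask=mask, other=-float("inf"))
    # Numerically stable softmax: subtract the row max before exp.
    x = x - tl.max(x, axis=0)
    num = tl.exp(x)
    denom = tl.sum(num, axis=0)
    tl.store(out_ptr + row * n_cols + offsets, num / denom, mask=mask)


def softmax(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical host-side wrapper: launch one program per row.
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    # This simple layout needs a power-of-two block covering the whole row.
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    softmax_kernel[(n_rows,)](out, x, n_cols, BLOCK_SIZE=BLOCK_SIZE)
    return out
```

The one-program-per-row grid keeps each reduction (max and sum) inside a single block, which is the same launch pattern the heavier kernels here (e.g. RMS layernorm, flash decoding) build on with extra tiling and masking.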