ColossalAI

Yuanheng Zhao 6e487e7d3c [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)	2024-01-19 15:47:16 +08:00
* prevent re-creating intermediate tensors
* add singleton class holding intermediate values
* fix triton kernel api
* add benchmark in pytest
* fix kernel api and add benchmark
* revise flash decoding triton kernel in/out shapes
* fix calling of triton kernel in modeling
* fix pytest: extract to util functions
File	Last commit	Date
kernel_utils.py	[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)	2024-01-19 15:47:16 +08:00
test_context_attn_unpad.py	[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)	2024-01-19 15:47:16 +08:00
test_decoding_attn.py	[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)	2024-01-19 15:47:16 +08:00
test_kvcache_copy.py	[kernel] Revise KVCache copy triton kernel API (#5273)	2024-01-16 14:41:02 +08:00
test_llama_act_combine.py	[moe] merge moe into main (#4978)	2023-11-02 02:21:24 +00:00
test_rmsnorm_triton.py	[kernel] Add RMSLayerNorm triton kernel (#5262)	2024-01-18 10:21:03 +08:00
test_rotary_embdding_unpad.py	[Inference] Kernel: no pad rotary embedding (#5252)	2024-01-11 13:46:14 +00:00
test_softmax.py	[misc] update pre-commit and run all files (#4752)	2023-09-19 14:20:26 +08:00