ColossalAI/tests/test_infer_ops/triton
Latest commit e8f0642f28 by yuehuayingxueluo: [Inference]Add Nopadding Llama Modeling (#5327), 2024-01-30 10:31:46 +08:00
* add nopadding llama modeling
* add nopadding_llama.py
* rm unused codes
* fix bugs in test_xine_copy.py
* fix code style
kernel_utils.py [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274) 2024-01-19 15:47:16 +08:00
test_context_attn_unpad.py [Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301) 2024-01-23 17:16:02 +08:00
test_decoding_attn.py [inference]Optimize the usage of the mid tensors space in flash attn (#5304) 2024-01-26 14:00:10 +08:00
test_fused_rotary_embedding.py [Inference]Add fused rotary kernel and get cos cache kernel (#5302) 2024-01-24 16:20:42 +08:00
test_kvcache_copy.py [kernel] Revise KVCache copy triton kernel API (#5273) 2024-01-16 14:41:02 +08:00
test_llama_act_combine.py [moe] merge moe into main (#4978) 2023-11-02 02:21:24 +00:00
test_rmsnorm_triton.py [Inference] Update rms norm kernel, benchmark with vLLM (#5315) 2024-01-29 10:22:33 +08:00
test_rotary_embdding_unpad.py [Inference] Benchmarking rotary embedding and add a fetch function (#5277) 2024-01-23 12:11:53 +08:00
test_softmax.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
test_xine_copy.py [Inference]Add Nopadding Llama Modeling (#5327) 2024-01-30 10:31:46 +08:00