ColossalAI/tests/test_infer_ops/triton
yuehuayingxueluo 249644c23b
[Inference] Replace Attention and MLP layers with shardformer to optimize the weight transpose operation; add fused_qkv and fused linear_add (#5340)
* add fused qkv (sketched below, after this commit log)

* replace attn and mlp by shardformer

* fix bugs in mlp

* add docstrings

* fix test_inference_engine.py

* optimize unbind

* add fused_addmm (sketched below, after this commit log)

* remove squeeze(1)

* refactor codes

* fix ci bugs

* rename ShardFormerLlamaMLP and ShardFormerLlamaAttention

* remove the dependency on LlamaFlashAttention2

* rollback test_inference_engine.py
2024-02-01 15:49:39 +08:00
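The fused qkv commit merges the separate Q, K, and V projection weights into one matrix so a single GEMM produces all three tensors; the "optimize unbind" bullet refers to splitting that fused result back apart. A minimal PyTorch sketch of the idea (the module name and shapes are illustrative, not ColossalAI's actual classes):

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """One matmul for Q, K, and V (hypothetical sketch, not the repo's module)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # The three [hidden, hidden] projection weights are concatenated
        # along the output dimension into one [hidden, 3 * hidden] weight.
        self.qkv_proj = nn.Linear(hidden_size, 3 * hidden_size, bias=False)

    def forward(self, x: torch.Tensor):
        # x: [tokens, hidden] -> qkv: [tokens, 3, hidden]
        qkv = self.qkv_proj(x).view(x.shape[0], 3, -1)
        # unbind yields three views along dim 1 without an extra copy
        q, k, v = torch.unbind(qkv, dim=1)
        return q, k, v
```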
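fused_addmm (the fused linear_add) folds a residual addition into the projection GEMM: `torch.addmm(input, mat1, mat2)` computes `input + mat1 @ mat2` in one call instead of a matmul followed by a separate add. A hedged sketch (the function name is hypothetical):

```python
import torch

def linear_add(x: torch.Tensor, weight: torch.Tensor, residual: torch.Tensor) -> torch.Tensor:
    """Fused projection + residual add (sketch).

    x:        [tokens, in_features]
    weight:   [out_features, in_features], as stored by nn.Linear
    residual: [tokens, out_features]
    """
    # One addmm call replaces `residual + x @ weight.t()`.
    return torch.addmm(residual, x, weight.t())
```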
kernel_utils.py [Inference] Replace Attention and MLP layers with shardformer to optimize the weight transpose operation; add fused_qkv and fused linear_add (#5340) 2024-02-01 15:49:39 +08:00
test_context_attn_unpad.py [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) 2024-01-30 16:06:09 +08:00
test_decoding_attn.py [Inference] Replace Attention and MLP layers with shardformer to optimize the weight transpose operation; add fused_qkv and fused linear_add (#5340) 2024-02-01 15:49:39 +08:00
test_fused_rotary_embedding.py [Inference]Add fused rotary kernel and get cos cache kernel (#5302) 2024-01-24 16:20:42 +08:00
test_kvcache_copy.py [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) 2024-01-30 16:06:09 +08:00
test_rmsnorm_triton.py [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) 2024-01-30 16:06:09 +08:00
test_rotary_embdding_unpad.py [Inference] Kernel Fusion, fused copy kv cache into rotary embedding (#5336) 2024-01-31 16:31:29 +08:00
test_xine_copy.py [Inference] Kernel Fusion, fused copy kv cache into rotary embedding (#5336) 2024-01-31 16:31:29 +08:00
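For orientation, test_rmsnorm_triton.py exercises an RMSNorm kernel. A minimal Triton RMSNorm in the same spirit (a sketch assuming a 2D row-major input, not the repo's actual kernel):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_kernel(x_ptr, w_ptr, out_ptr, stride, n_cols, eps, BLOCK: tl.constexpr):
    # One program normalizes one row.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * stride + cols, mask=mask, other=0.0).to(tl.float32)
    # RMSNorm: x / sqrt(mean(x^2) + eps) * weight
    rstd = 1.0 / tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    tl.store(out_ptr + row * stride + cols, x * rstd * w, mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    # BLOCK must be a power of two that covers the whole row.
    rmsnorm_kernel[(n_rows,)](x, weight, out, x.stride(0), n_cols, eps,
                              BLOCK=triton.next_power_of_2(n_cols))
    return out
```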
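test_fused_rotary_embedding.py and test_rotary_embdding_unpad.py cover rotary position embeddings; per the 2024-01-31 commit, the repo's kernels additionally fuse the KV-cache copy into the same pass. The underlying rotate-half math, shown in plain PyTorch for reference (a sketch; the layout assumptions are noted in the comments):

```python
import torch

def apply_rotary(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Assumed layout: x is [tokens, heads, head_dim];
    # cos/sin are [tokens, head_dim // 2], gathered per token position.
    x1, x2 = x.chunk(2, dim=-1)
    cos = cos.unsqueeze(1)  # broadcast over the head dimension
    sin = sin.unsqueeze(1)
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)
```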