ColossalAI/tests/test_infer_ops/triton
yuehuayingxueluo 249644c23b
[Inference] Replace Attention and MLP layers with shardformer to optimize the weight transpose operation; add fused_qkv and fused linear_add (#5340)
* add fused qkv (sketched below, after this commit log)

* replace attn and mlp by shardformer

* fix bugs in mlp

* add docstrings

* fix test_inference_engine.py

* optimize unbind

* add fused_addmm (sketched below, after this commit log)

* remove squeeze(1)

* refactor codes

* fix ci bugs

* rename ShardFormerLlamaMLP and ShardFormerLlamaAttention

* remove the dependency on LlamaFlashAttention2

* rollback test_inference_engine.py
2024-02-01 15:49:39 +08:00
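The fused qkv commit merges the separate Q, K, and V projection weights into one matrix so a single GEMM produces all three tensors; the "optimize unbind" bullet refers to splitting that fused result back apart. A minimal PyTorch sketch of the idea (the module name and shapes are illustrative, not ColossalAI's actual classes):

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """One matmul for Q, K, and V (hypothetical sketch, not the repo's module)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # The three [hidden, hidden] projection weights are concatenated
        # along the output dimension into one [hidden, 3 * hidden] weight.
        self.qkv_proj = nn.Linear(hidden_size, 3 * hidden_size, bias=False)

    def forward(self, x: torch.Tensor):
        # x: [tokens, hidden] -> qkv: [tokens, 3, hidden]
        qkv = self.qkv_proj(x).view(x.shape[0], 3, -1)
        # unbind yields three views along dim 1 without an extra copy
        q, k, v = torch.unbind(qkv, dim=1)
        return q, k, v
```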
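fused_addmm (the fused linear_add) folds a residual addition into the projection GEMM: `torch.addmm(input, mat1, mat2)` computes `input + mat1 @ mat2` in one call instead of a matmul followed by a separate add. A hedged sketch (the function name is hypothetical):

```python
import torch

def linear_add(x: torch.Tensor, weight: torch.Tensor, residual: torch.Tensor) -> torch.Tensor:
    """Fused projection + residual add (sketch).

    x:        [tokens, in_features]
    weight:   [out_features, in_features], as stored by nn.Linear
    residual: [tokens, out_features]
    """
    # One addmm call replaces `residual + x @ weight.t()`.
    return torch.addmm(residual, x, weight.t())
```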
kernel_utils.py [Inference] Replace Attention and MLP layers with shardformer to optimize the weight transpose operation; add fused_qkv and fused linear_add (#5340) 2024-02-01 15:49:39 +08:00
test_context_attn_unpad.py [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) 2024-01-30 16:06:09 +08:00
test_decoding_attn.py [Inference] Replace Attention and MLP layers with shardformer to optimize the weight transpose operation; add fused_qkv and fused linear_add (#5340) 2024-02-01 15:49:39 +08:00
test_fused_rotary_embedding.py [Inference]Add fused rotary kernel and get cos cache kernel (#5302) 2024-01-24 16:20:42 +08:00
test_kvcache_copy.py [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) 2024-01-30 16:06:09 +08:00
test_rmsnorm_triton.py [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) 2024-01-30 16:06:09 +08:00
test_rotary_embdding_unpad.py [Inference] Kernel Fusion, fused copy kv cache into rotary embedding (#5336) 2024-01-31 16:31:29 +08:00
test_xine_copy.py [Inference] Kernel Fusion, fused copy kv cache into rotary embedding (#5336) 2024-01-31 16:31:29 +08:00
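For orientation, test_rmsnorm_triton.py exercises an RMSNorm kernel. A minimal Triton RMSNorm in the same spirit (a sketch assuming a 2D row-major input, not the repo's actual kernel):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_kernel(x_ptr, w_ptr, out_ptr, stride, n_cols, eps, BLOCK: tl.constexpr):
    # One program normalizes one row.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * stride + cols, mask=mask, other=0.0).to(tl.float32)
    # RMSNorm: x / sqrt(mean(x^2) + eps) * weight
    rstd = 1.0 / tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    tl.store(out_ptr + row * stride + cols, x * rstd * w, mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    # BLOCK must be a power of two that covers the whole row.
    rmsnorm_kernel[(n_rows,)](x, weight, out, x.stride(0), n_cols, eps,
                              BLOCK=triton.next_power_of_2(n_cols))
    return out
```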
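test_fused_rotary_embedding.py and test_rotary_embdding_unpad.py cover rotary position embeddings; per the 2024-01-31 commit, the repo's kernels additionally fuse the KV-cache copy into the same pass. The underlying rotate-half math, shown in plain PyTorch for reference (a sketch; the layout assumptions are noted in the comments):

```python
import torch

def apply_rotary(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Assumed layout: x is [tokens, heads, head_dim];
    # cos/sin are [tokens, head_dim // 2], gathered per token position.
    x1, x2 = x.chunk(2, dim=-1)
    cos = cos.unsqueeze(1)  # broadcast over the head dimension
    sin = sin.unsqueeze(1)
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)
```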