mirror of https://github.com/hpcaitech/ColossalAI
* add fused qkv
* replace attn and mlp by shardformer
* fix bugs in mlp
* add docstrings
* fix test_inference_engine.py
* add optimize unbind
* add fused_addmm
* rm squeeze(1)
* refactor codes
* fix ci bugs
* rename ShardFormerLlamaMLP and ShardFormerLlamaAttention
* removed the dependency on LlamaFlashAttention2
* rollback test_inference_engine.py
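The "add fused qkv" note above refers to collapsing the separate Q, K, and V projections into a single matmul. A minimal sketch of the idea, assuming a hypothetical `FusedQKV` module (the class name and shapes are illustrative, not ColossalAI's actual API):

```python
import torch
import torch.nn as nn


class FusedQKV(nn.Module):
    """Hypothetical fused projection: one GEMM produces Q, K, and V."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # One (hidden, 3 * hidden) weight replaces three separate projections,
        # so the input passes through a single matmul instead of three.
        self.qkv_proj = nn.Linear(hidden_size, 3 * hidden_size, bias=False)

    def forward(self, x: torch.Tensor):
        qkv = self.qkv_proj(x)
        # Split the fused output back into the three attention inputs.
        q, k, v = torch.chunk(qkv, 3, dim=-1)
        return q, k, v
```

Loading pretrained weights into such a module amounts to concatenating the original q/k/v weight matrices along the output dimension; the payoff is one larger, better-utilized GEMM per layer instead of three small ones.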
__init__.py
context_attn_unpad.py
custom_autotune.py
flash_decoding.py
fused_rotary_embedding.py
gptq_triton.py
kvcache_copy.py
llama_act_combine_kernel.py
no_pad_rotary_embedding.py
qkv_matmul_kernel.py
rms_layernorm.py
rotary_cache_copy.py
softmax.py
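For a sense of what the kernels in this directory look like, here is a minimal Triton sketch in the spirit of rms_layernorm.py: row-wise RMS normalization, y = x / sqrt(mean(x^2) + eps) * w. The kernel layout, block-size choice, and wrapper below are assumptions for illustration, not the file's actual implementation.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def rms_norm_kernel(x_ptr, w_ptr, out_ptr, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # One program instance normalizes one row of the input.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0).to(tl.float32)
    # RMS statistic over the row, accumulated in fp32 for stability.
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    y = x / rms * w
    tl.store(out_ptr + row * n_cols + cols, y.to(out_ptr.dtype.element_ty), mask=mask)


def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Illustrative wrapper: launch one program per row of a 2-D input."""
    assert x.is_cuda and x.is_contiguous()
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    # Each program must cover a full row, so the block spans the row width.
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    rms_norm_kernel[(n_rows,)](x, weight, out, n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return out
```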