ColossalAI/colossalai/kernel
yuehuayingxueluo 249644c23b
[Inference] Replace Attention layer and MLP layer by shardformer to optimize the weight transpose operation, add fused_qkv and fused linear_add (#5340)
* add fused qkv

* replace attn and mlp by shardformer

* fix bugs in mlp

* add docstrings

* fix test_inference_engine.py

* add optimize unbind

* add fused_addmm

* rm squeeze(1)

* refactor codes

* fix ci bugs

* rename ShardFormerLlamaMLP and ShardFormerLlamaAttention

* Removed the dependency on LlamaFlashAttention2

* rollback test_inference_engine.py
2024-02-01 15:49:39 +08:00
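
The fused_qkv and fused linear_add changes summarized in the commit above boil down to two common GEMM-fusion tricks. A minimal sketch is given below; the class and function names (FusedQKVSketch, fused_linear_add_sketch) and all shapes are illustrative assumptions, not the actual ColossalAI/shardformer APIs.

```python
# Hedged sketch of the fused-QKV and fused linear_add ideas from the commit above.
# Names and shapes here are assumptions for illustration, not ColossalAI internals.
import torch
import torch.nn as nn


class FusedQKVSketch(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # One weight of shape (3 * hidden_size, hidden_size) replaces three
        # separate q/k/v projections, so a single GEMM produces all three.
        self.qkv_proj = nn.Linear(hidden_size, 3 * hidden_size, bias=False)

    def forward(self, hidden_states: torch.Tensor):
        bsz, seq_len, hidden_size = hidden_states.shape
        qkv = self.qkv_proj(hidden_states)            # (bsz, seq_len, 3 * hidden)
        qkv = qkv.view(bsz, seq_len, 3, hidden_size)
        # torch.unbind splits dim=2 into views without an extra copy or squeeze.
        query, key, value = torch.unbind(qkv, dim=2)  # each (bsz, seq_len, hidden)
        return query, key, value


def fused_linear_add_sketch(x_2d: torch.Tensor,
                            weight: torch.Tensor,
                            residual_2d: torch.Tensor) -> torch.Tensor:
    # Fold the residual addition into the projection GEMM via torch.addmm:
    # residual + x @ weight.T in one call instead of a matmul followed by an add.
    return torch.addmm(residual_2d, x_2d, weight.t())
```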
jit [npu] change device to accelerator api (#5239) 2024-01-09 10:20:05 +08:00
triton [Inference] Replace Attention layer and MLP layer by shardformer to optimize the weight transpose operation, add fused_qkv and fused linear_add (#5340) 2024-02-01 15:49:39 +08:00
__init__.py [feat] refactored extension module (#5298) 2024-01-25 17:01:48 +08:00
extensions [feat] refactored extension module (#5298) 2024-01-25 17:01:48 +08:00
kernel_loader.py [feat] refactored extension module (#5298) 2024-01-25 17:01:48 +08:00