ColossalAI/colossalai/shardformer/modeling
Latest commit: flybird1111 7a3dfd0c64 [shardformer] update shardformer to use flash attention 2 (#4392) 2023-08-15 23:25:14 +08:00

* cherry-pick flash attention 2
* [shardformer] update shardformer to use flash attention 2
Name          Last commit                                                                           Date
chatglm2_6b   [pipeline] add chatglm (#4363)                                                        2023-08-15 23:25:14 +08:00
__init__.py   [shardformer] added development protocol for standardization (#4149)                  2023-07-04 16:05:01 +08:00
bert.py       [Shardformer] Merge flash attention branch to pipeline branch (#4362)                 2023-08-15 23:25:14 +08:00
blip2.py      [shardformer] update shardformer to use flash attention 2 (#4392)                     2023-08-15 23:25:14 +08:00
bloom.py      [Shardformer] Merge flash attention branch to pipeline branch (#4362)                 2023-08-15 23:25:14 +08:00
chatglm.py    [shardformer] update shardformer to use flash attention 2 (#4392)                     2023-08-15 23:25:14 +08:00
gpt2.py       [shardformer] update shardformer to use flash attention 2 (#4392)                     2023-08-15 23:25:14 +08:00
jit.py        [Shardformer] Merge flash attention branch to pipeline branch (#4362)                 2023-08-15 23:25:14 +08:00
llama.py      [shardformer] update shardformer to use flash attention 2 (#4392)                     2023-08-15 23:25:14 +08:00
opt.py        [shardformer] update shardformer to use flash attention 2 (#4392)                     2023-08-15 23:25:14 +08:00
sam.py        [Shardformer] Merge flash attention branch to pipeline branch (#4362)                 2023-08-15 23:25:14 +08:00
t5.py         [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388)   2023-08-15 23:25:14 +08:00
vit.py        [shardformer] update shardformer to use flash attention 2 (#4392)                     2023-08-15 23:25:14 +08:00
whisper.py    [shardformer] update shardformer to use flash attention 2 (#4392)                     2023-08-15 23:25:14 +08:00