ColossalAI/colossalai/shardformer/modeling
Bin Jia 424629fea0
[shardformer/sequence parallel] Cherry pick commit to new branch (#4450)
* [shardformer/sequence parallel] Support sequence parallel for gpt2 (#4384)

* [sequence parallel] add sequence parallel linear col/row support (#4336)

* add sequence parallel linear col/row support

* add annotation

* add annotation

* add support for gpt2 fused qkv linear layer

* support sequence parallel in GPT2

* add docstring and note

* add requirments

* remove unused flash-attb

* modify flash attn test

* modify flash attn setting

* modify flash attn code

* add assert before divide, rename forward function

* [shardformer/test] fix gpt2 test with seq-parallel

* [shardformer/sequence parallel] Overlap input gather and grad computation during col backward (#4401)

* overlap gather input / grad computing during col backward

* modify test for overlap

* simplify code

* fix code and modify cuda stream synchronize

* [shardformer/sequence parallel] polish code
2023-08-16 15:41:20 +08:00
..
chatglm2_6b [pipeline] add chatglm (#4363) 2023-08-15 23:25:14 +08:00
__init__.py [shardformer] added development protocol for standardization (#4149) 2023-07-04 16:05:01 +08:00
bert.py [misc] resolve code factor issues (#4433) 2023-08-15 23:25:14 +08:00
blip2.py [shardformer] update shardformer to use flash attention 2 (#4392) 2023-08-15 23:25:14 +08:00
bloom.py [misc] resolve code factor issues (#4433) 2023-08-15 23:25:14 +08:00
chatglm.py [misc] resolve code factor issues (#4433) 2023-08-15 23:25:14 +08:00
gpt2.py [misc] resolve code factor issues (#4433) 2023-08-15 23:25:14 +08:00
gpt2_seq.py [shardformer/sequence parallel] Cherry pick commit to new branch (#4450) 2023-08-16 15:41:20 +08:00
jit.py [Shardformer] Merge flash attention branch to pipeline branch (#4362) 2023-08-15 23:25:14 +08:00
llama.py [misc] resolve code factor issues (#4433) 2023-08-15 23:25:14 +08:00
opt.py [misc] resolve code factor issues (#4433) 2023-08-15 23:25:14 +08:00
sam.py [Shardformer] Merge flash attention branch to pipeline branch (#4362) 2023-08-15 23:25:14 +08:00
t5.py [misc] resolve code factor issues (#4433) 2023-08-15 23:25:14 +08:00
vit.py [misc] resolve code factor issues (#4433) 2023-08-15 23:25:14 +08:00
whisper.py [shardformer] update shardformer to use flash attention 2 (#4392) 2023-08-15 23:25:14 +08:00