ColossalAI

Commit Graph

Author	SHA1	Message	Date
flybird11111	a27e0bb494	[shardformer] bert support sequence parallel. (#4455 ) * [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel * [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel [shardformer] bert support sequence parallel * [shardformer] bert support sequence parallel	2023-08-18 18:04:55 +08:00
flybird11111	0ecd71e041	[shardformer] bloom support sequence parallel (#4465 ) [shardformer] bloom support sequence parallel	2023-08-18 15:34:18 +08:00
Bin Jia	7c8be77081	[shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460 ) * support gpt2 seq parallel with pp/dp/tp * fix a bug when waiting for stream done * delete unused gpt2_seq file	2023-08-18 11:21:53 +08:00
LuGY	a78daf6180	[shardformer] support interleaved pipeline (#4448 ) * support interleaved pipeline * fix unit test * remove virtual stage test in stage mgr * add droped type hint and updated bwd	2023-08-16 19:29:03 +08:00
Baizhou Zhang	6ef33f75aa	[shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446 ) * support DDP for HybridPlugin/add tp+dp tests * add docstring for HybridParallelPlugin	2023-08-16 16:11:57 +08:00
Bin Jia	424629fea0	[shardformer/sequence parallel] Cherry pick commit to new branch (#4450 ) * [shardformer/sequence parallel] Support sequence parallel for gpt2 (#4384) * [sequence parallel] add sequence parallel linear col/row support (#4336) * add sequence parallel linear col/row support * add annotation * add annotation * add support for gpt2 fused qkv linear layer * support sequence parallel in GPT2 * add docstring and note * add requirments * remove unused flash-attb * modify flash attn test * modify flash attn setting * modify flash attn code * add assert before divide, rename forward function * [shardformer/test] fix gpt2 test with seq-parallel * [shardformer/sequence parallel] Overlap input gather and grad computation during col backward (#4401) * overlap gather input / grad computing during col backward * modify test for overlap * simplify code * fix code and modify cuda stream synchronize * [shardformer/sequence parallel] polish code	2023-08-16 15:41:20 +08:00
github-actions[bot]	d20dceb9a3	[format] applied code formatting on changed files in pull request 4441 (#4445 ) Co-authored-by: github-actions <github-actions@github.com>	2023-08-16 10:47:23 +08:00
ver217	5d4efdf58f	[shardformer] fix import	2023-08-15 23:25:14 +08:00
ver217	73a4144b91	[shardformer] fix embedding	2023-08-15 23:25:14 +08:00
Hongxin Liu	172f7fa3cf	[misc] resolve code factor issues (#4433 )	2023-08-15 23:25:14 +08:00
flybird11111	108e54a0b4	[shardformer]update t5 tests for using all optimizations. (#4407 ) * [shardformer] gpt2 tests fix [shardformer] test all optimizations (#4399) [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] gpt2 tests fix * [shardformer]update t5 to use all optimizations	2023-08-15 23:25:14 +08:00
flybird11111	1edc9b5fb3	[shardformer] update tests for all optimization (#4413 ) [shardformer] update tests for all optimization	2023-08-15 23:25:14 +08:00
Baizhou Zhang	7711bd524a	[shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395 ) * rewrite opt tests * rewrite llama tests * rewrite bloom & vit tests * rewrite chatglm tests * fix LinearCol for classfiers * add judge for other tp layers, fix lazy init in util	2023-08-15 23:25:14 +08:00
flybird1111	d2cd48e0be	[shardformer] test all optimizations (#4399 ) [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations	2023-08-15 23:25:14 +08:00
flybird1111	7a3dfd0c64	[shardformer] update shardformer to use flash attention 2 (#4392 ) * cherry-pick flash attention 2 cherry-pick flash attention 2 * [shardformer] update shardformer to use flash attention 2 [shardformer] update shardformer to use flash attention 2, fix [shardformer] update shardformer to use flash attention 2, fix [shardformer] update shardformer to use flash attention 2, fix	2023-08-15 23:25:14 +08:00
Baizhou Zhang	ed4c448488	[pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388 ) * fix remaining t5 bugs/rewrite t5 tests * fix multi-tensor communication in pipeline * rearrange test_config * fix keyerror in sync_shared_params * fix get_held_layers & Randomnizer, complete t5 tests * erase printing * fix get_held_layers through modifying _release_unheld_layers * fix _get_recursive_held_layers bug	2023-08-15 23:25:14 +08:00
flybird1111	906426cb44	[Shardformer] Merge flash attention branch to pipeline branch (#4362 ) * [shardformer] supported flash attention test dependency (#4158) * [shardformer] fix flash attention utils test (#4180) * [shardformer] opt support flash attention (#4163) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] add performance benchmark of shardformer (#4175) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] benchmark fix * [shardformer] benchmark fix * [shardformer] llama support flash attention (#4185) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] llama support flash attention * [shardformer] llama support flash attention * [shardformer] Move the import statement for xformer outside the forward function. * [shardformer] gpt2 support flash attention. (#4191) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] gpt2 support flash attention * [shardformer] gpt2 support flash attention * [shardformer] gpt2 support flash attention * [shardformer] bloom support flash attention (#4188) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] bloom suport flash attention * [shardformer] add assert to sequence length * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] bert support flash attention. (#4206) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] bert support flash attention * [shardformer] t5 support flash attention. (#4216) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] t5 support flash attention * [shardformer] t5 support flash attention * fix typo * fix typo * fix typo * fix typo * fix typo * fix typo * [shardformer] support 'paddedcausal' type of attention mask in Coloattention. (#4215) * added padded causal attn mask type for ColoAttention * [shardformer]t5 flash attention fix (#4239) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] t5 flash attention fix * [shardformer] update gpt2 to use coloattention. (#4234) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 * [shardformer] update opt and llama to use coloattention. (#4226) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt * [shardformer] shardformer support jit fused operator. (#4236) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] bloom support jit fused operator * [shardformer] bloom support jit fused operator * [shardformer] bloom support jit fused operator * [shardformer] t5 support jit fused operator * [shardformer] t5 support jit fused operator * [shardformer] t5 support jit fused operator * [shardformer] add roadmap of flash attention * [shardformer] add roadmap of flash attention * [shardformer] add roadmap of flash attention * [shardformer] add type hint to 'self' param of forward * [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (#4290) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> * [shardformer] whisper support flash attention (#4301) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] whisper support flash attention * [shardformer] whisper support flash attention * [shardformer]whisper support jit operator --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> * [shardformer] sam support flash attention (#4316) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] sam support flash attention --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> * [shardformer] merge blip2/chatglm (#4321) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com> * [shardformer] blip2 support flash attention and jit operator (#4325) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin * [shardformer] blip2 support flash attention and jit operator * [shardformer] blip2 support flash attention and jit operator * [shardformer] blip2 support flash attention and jit operator --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com> * [shardformer] chatglm support flash attention and jit operator (#4330) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin * [shardformer] chatglm support flash attention and jit operator * [shardformer] chatglm support flash attention and jit operator * [shardformer] chatglm support flash attention and jit operator * [shardformer] chatglm support flash attention and jit operator --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com> * [shardformer] vit support flash attention and jit operator (#4334) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin * [shardformer] vit support flash attention and jit operator * [shardformer] vit support flash attention and jit operator --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com> * [pipeline] merge flash attention branch * [pipeline] merge flash attention branch * [pipeline] merge flash attention branch * [pipeline] fix conflict * [pipeline] fix conflict * Merge branch 'feature/pipeline' into feature/pipeline * Merge branch 'feature/pipeline' into feature/pipeline * Merge branch 'feature/pipeline' into feature/pipeline * activate checks * activate checks * activate checks * activate checks * activate checks * activate checks * activate checks * activate checks * fix flash attention tests * gemini ignore whisper * fix vit * fix xformers import handle --------- Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com>	2023-08-15 23:25:14 +08:00
Jianghai	a88e92251d	[pipeline] add chatglm (#4363 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt * add bloom model and policy ,revise the base class of policy * revise * revision * add bert_for_pretraining * add bert_for_pretraining forward and policy * fix typos * cancel warning * change the imediate output to default dict * change the default output of get_shared_params * add chatglm * add * chatglm * chatglm * finish chatglm * deletes * fix rmsnorm * chatglm * fix chatglm shard * init	2023-08-15 23:25:14 +08:00
Baizhou Zhang	b1feeced8e	[shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366 ) * add util functions for shardformer tests & rewrite gpt2 test * fix shared_params & embedding/merging * fix precision	2023-08-15 23:25:14 +08:00
FoolPlayer	726541afe2	update some module with new api version	2023-08-15 23:25:14 +08:00
FoolPlayer	879301d0da	[shardformer] support Blip2 (#4243 ) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin	2023-08-15 23:25:14 +08:00
klhhhhh	8120eca0c0	[shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit	2023-08-15 23:25:14 +08:00
klhhhhh	91850fe984	[shardformer] register without auto policy	2023-08-15 23:25:14 +08:00
klhhhhh	1a29e8fc29	[shardformer] polish chatglm code	2023-08-15 23:25:14 +08:00
klhhhhh	8620009dd7	[sharformer] add first version of policy of chatglm	2023-08-15 23:25:14 +08:00
Kun Lin	ed34bb1310	Feature/chatglm (#4240 ) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit	2023-08-15 23:25:14 +08:00
FoolPlayer	9ee4ebea83	[shardformer] support whisper (#4212 ) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme	2023-08-15 23:25:14 +08:00
FoolPlayer	dd2bf02679	[shardformer] support SAM (#4231 ) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code	2023-08-15 23:25:14 +08:00
Kun Lin	c59d7aca09	Feature/vit support (#4182 ) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout	2023-08-15 23:25:14 +08:00
Baizhou Zhang	0ceec8f9a9	[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354 ) * add naive optimizer for 3DPlugin/refactor gpt2 shardformer test * merge tests of PP/DP/TP combinations into one test file * fix bug when sync grad for dp in HybridPlugin * update supported precisions for 3DPlugin/fix bug when shifting tp_degree * improve the passing of lazy_init * modify lazy_init/use sync_shared_params	2023-08-15 23:25:14 +08:00
Jianghai	f13954cd58	[pipeline] refactor test pipeline and remove useless utils in pipeline (#4324 ) * refactor tests * refactor bloom model * finish policy tests * refactor tests * fix test pure pipeline * remove test pipeline and cutdown launch process * refactor tests * refactor bloom model * finish policy tests * refactor tests * fix test pure pipeline * remove test pipeline and cutdown launch process	2023-08-15 23:25:14 +08:00
Baizhou Zhang	da3cef27ad	[pipeline] fix return_dict/fix pure_pipeline_test (#4331 )	2023-08-15 23:25:14 +08:00
Hongxin Liu	261eab02fb	[plugin] add 3d parallel plugin (#4295 ) * [amp] add mixed precision optimizer * [plugin] add 3d parallel plugin * [booster] support pipeline * [plugin] 3d parallel plugin support clip grad norm * [shardformer] fix sharder and add plugin test * [plugin] rename 3d parallel plugin * [ci] support testmon core pkg change detection (#4305) * [hotfix] debug testmon * [hotfix] fix llama * [hotfix] fix p2p bugs * [hotfix] fix requirements	2023-08-15 23:25:14 +08:00
FoolPlayer	b3f5d7a3ba	[shardformer] support pipeline base vit model (#4284 ) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * support base vit pipeline * support vit downstream model * fix vit shard test * modify hidden states return type --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>	2023-08-15 23:25:14 +08:00
Baizhou Zhang	083d7da33d	[pipeline] add pipeline support for all T5 models (#4310 ) * complete policy for T5Model & T5ForConditionalGeneration * modify function signature in forwards * add forward for T5model * add forward for T5ForConditionalGeneration * fix a bug * fix hidden_states transporting in decoder * fix the passing of encoder_outputs	2023-08-15 23:25:14 +08:00
Jianghai	d0807122e2	[pipeline] test pure pipeline process using llama (#4218 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * Revert "bloom policy" This reverts commit `8dee68a0a2`. This policy should be revert and copied to feature/bloom * revert the bloom changes * cancel unneeded inputs * gpt * finish llama * causal lm and sequence classification * revision * add pure pipeline test * fixed version * fixed version * pure pipeline	2023-08-15 23:25:14 +08:00
Baizhou Zhang	36e546b2cc	[pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300 ) * modify t5 policy & add test * pipeline stage distribution for t5 * complete t5 base policy * t5 stack: halfway * modify gpt2 pipeline test * complete pipeline forward for T5Stack/T5EncoderModel * fix docstring * move t5 util tests to test_pipeline	2023-08-15 23:25:14 +08:00
Jianghai	18ebcf406a	[pipeline] reformat for unified design (#4283 ) * bert_reformat * reformat * reformat * fix a typo * format * format * fix bug	2023-08-15 23:25:14 +08:00
Jianghai	0a8f3c851a	[hotfix] fix opt pipeline (#4293 ) * opt forward and test * pause * finish opt model pipeline * finish opt pipeline * opt forward and test * pause * finish opt model pipeline * finish opt pipeline * fix opt * set transformers version * refactor the test pipeline * fix bug	2023-08-15 23:25:14 +08:00
Jianghai	d8408d185c	[pipeline] OPT model pipeline (#4258 ) * opt forward and test * pause * finish opt model pipeline * finish opt pipeline * opt forward and test * pause * finish opt model pipeline * finish opt pipeline * fix opt * set transformers version * refactor the test pipeline	2023-08-15 23:25:14 +08:00
Baizhou Zhang	b774d5ea0f	[pipeline] refactor gpt2 pipeline forwards (#4287 ) * move gpt2 pipeline forwards to modeling folder * check pipeline status when adding replacing policy * fix typehint * fix arguments processing in gpt2_model_forward	2023-08-15 23:25:14 +08:00
Hongxin Liu	d921ce8391	[shardformer] support inplace sharding (#4251 ) * [shardformer] embedding support inplace sharding * [shardformer] linear support inplace sharding * [shardformer] layernorm support inplace sharding * [shardformer] qkv support inplace sharding * [test] update shardformer layer test * [shardformer] fix shared param sharding * [shardformer] fix bert policy * [shardformer] fix bloom policy * [shardformer] fix llama policy * [shardformer] fix opt policy * [shardformer] fix t5 policy * [shardformer] fix fused qkv linear * [shardformer] fix bugs * force sync * [test] fix bugs * [test] fix transformer version	2023-08-15 23:25:14 +08:00
Baizhou Zhang	2a2eacfaf1	[pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245 ) * change for transformers loggers * add forward for GPT2ForQuestionAnswering * fix assert * fix torchrec test	2023-08-15 23:25:14 +08:00
Jianghai	34f0e34a4c	[pipeline] finish bloom models pipeline and tests (#4223 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * finish bloom model * test shard gpt2 * clear cache * support all bloom models * add bloom models policies * finish bloom pipeline and tests * add set pipeline * finish bloom	2023-08-15 23:25:14 +08:00
Jianghai	e7cc62d735	[pipeline] All bert models (#4233 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * Revert "bloom policy" This reverts commit `8dee68a0a2`. This policy should be revert and copied to feature/bloom * revert the bloom changes * cancel unneeded inputs * gpt * finish llama * causal lm and sequence classification * revision * add pure pipeline test * finish some bert models * finish all bert models * finish bert tests * fix bugs * fix bugs * fix test pipeline * fix data gen for qa * update the set pipeline forward * shared params * fix bugs	2023-08-15 23:25:14 +08:00
Baizhou Zhang	a14d352088	[pipeline] add pipeline forward for variants of gpt2 (#4238 ) * add forward for GPTLMHeadModel * add test for gpt_lm * arranging get_held_layers method * arrange forward replacement * add forward for GPT2ForTokenClassification * add forward for GPT2ForSequenceClassification * fix test_shard_gpt2.py * add GPT2DoubleHeadsmodel & fix bugs * add id checking in get_shared_params	2023-08-15 23:25:14 +08:00
Hongxin Liu	7e4de520e1	[shardformer] fix base policy (#4229 )	2023-08-15 23:25:14 +08:00
Baizhou Zhang	208ac8f2ba	[pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224 ) * * fix typehint & docstring in sharder.py * * update pipeline forward for GPT2Model * * add test for pipeline forward of GPT2Model * * add cache cleaning in gpt2 test * * change assert to raise command	2023-08-15 23:25:14 +08:00
Jianghai	37d22f6878	[pipeline] add bloom model pipeline (#4210 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * finish bloom model * test shard gpt2 * clear cache	2023-08-15 23:25:14 +08:00
Jianghai	31bcf867ae	[pipeline] Llama causal lm and llama for sequence classification pipeline (#4208 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * Revert "bloom policy" This reverts commit `8dee68a0a2`. This policy should be revert and copied to feature/bloom * revert the bloom changes * cancel unneeded inputs * gpt * finish llama * causal lm and sequence classification * revision	2023-08-15 23:25:14 +08:00
Jianghai	1622031058	[pipeline] Llama pipeline (#4205 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * Revert "bloom policy" This reverts commit `8dee68a0a2`. This policy should be revert and copied to feature/bloom * revert the bloom changes * cancel unneeded inputs * gpt	2023-08-15 23:25:14 +08:00
Jianghai	1094e0f0d3	[pipeline] Bert pipeline for shardformer and its tests (#4197 ) * add pipeline forward * complete pipeline forward check * fix bert forward without pipeline * fix comments * discard useless line * add todo * clean prints * fix distribute layers	2023-08-15 23:25:14 +08:00
Hongxin Liu	890774b2fb	[shardformer] support lazy init (#4202 ) * [shardformer] support lazy init * [shardformer] linear support lazy init * [shardformer] embedding support lazy init * [shardformer] norm support lazy init * [shardformer] fused linear support lazy init * [test] update shardformer test layer * [test] shardformer with lazy init fit ddp * [lazy] hotfix deepcopy of param * [shardformer] fix bert policy and update test * [shardformer] fix bloom policy and update test * [shardformer] fix opt policy and update test * [shardformer] fix t5 policy and update test * [shardformer] fix gpt2 policy and update test * [shardformer] fix llama policy and update test	2023-08-15 23:25:14 +08:00
Jianghai	f3bcc292c8	[pipeline] move bert related pipeline components to shardformer (#4187 ) * move bert related pipeline components to shardformer * fix bugs * revision * fix bert model tests * fix bert_lm_head model tests * fix tests * fix tests * done checks * skip bloom	2023-08-15 23:25:14 +08:00
Jianghai	c5ea728016	[pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt * add bloom model and policy ,revise the base class of policy * revise * revision * add bert_for_pretraining * add bert_for_pretraining forward and policy * fix typos * cancel warning * change the imediate output to default dict * change the default output of get_shared_params	2023-08-15 23:25:14 +08:00
ver217	d35bd7d0e6	[shardformer] fix type hint	2023-08-15 23:25:14 +08:00
ver217	1ed3f8a24f	[shardformer] rename policy file name	2023-08-15 23:25:14 +08:00
ver217	b0b8ad2823	[pipeline] update shardformer docstring	2023-08-15 23:25:14 +08:00
ver217	59f6f573f1	[pipeline] update shardformer policy	2023-08-15 23:25:14 +08:00
Jianghai	90a65ea682	[pipeline] build bloom model and policy , revise the base class of policy (#4161 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt * add bloom model and policy ,revise the base class of policy * revise * revision * add bert_for_pretraining	2023-08-15 23:25:14 +08:00
Jianghai	e8e7e49243	[pipeline]add pipeline policy and bert forward (#4130 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt	2023-08-15 23:25:14 +08:00
Hongxin Liu	f51ce1bc8e	[pipeline] refactor 1f1b schedule (#4115 ) * [api] update optimizer wrapper to fit pipeline * [pipeline] add base schedule * [pipeline] add 1f1b schedule * [test] add pipeline schedule utils test * [pipeline] fix import	2023-08-15 23:25:14 +08:00
Hongxin Liu	45fdc9b42c	[pipeline] implement p2p communication (#4100 ) * [pipeline] add p2p communication * [test] add p2p communication test * [test] add rerun decorator * [test] rename to avoid conflict	2023-08-15 23:25:14 +08:00
Hongxin Liu	422544222f	[pipeline] add stage manager (#4093 ) * [pipeline] add stage manager * [test] add pipeline stage manager test * [pipeline] add docstring for stage manager	2023-08-15 23:25:14 +08:00
Hongxin Liu	5e1a9d48dd	[cluster] add process group mesh (#4039 ) * [cluster] add process group mesh * [test] add process group mesh test * force sync	2023-08-15 23:25:14 +08:00
LuGY	d86ddd9b29	[hotfix] fix unsafe async comm in zero (#4404 ) * improve stablility of zero * fix wrong index * add record stream	2023-08-11 15:09:24 +08:00
Baizhou Zhang	6ccecc0c69	[gemini] fix tensor storage cleaning in state dict collection (#4396 )	2023-08-10 15:36:46 +08:00
binmakeswell	089c365fa0	[doc] add Series A Funding and NeurIPS news (#4377 ) * [doc] add Series A Funding and NeurIPS news * [kernal] fix mha kernal * [CI] skip moe * [CI] fix requirements	2023-08-04 17:42:07 +08:00
flybird1111	38b792aab2	[coloattention] fix import error (#4380 ) fixed an import error	2023-08-04 16:28:41 +08:00
flybird1111	25c57b9fb4	[fix] coloattention support flash attention 2 (#4347 ) Improved ColoAttention interface to support flash attention 2. Solved #4322	2023-08-04 13:46:22 +08:00
Hongxin Liu	16bf4c0221	[test] remove useless tests (#4359 ) * [test] remove legacy zero test * [test] remove lazy distribute test * [test] remove outdated checkpoint io	2023-08-01 18:52:14 +08:00
LuGY	03654c0ce2	fix localhost measurement (#4320 )	2023-08-01 10:14:00 +08:00
LuGY	45b08f08cb	[zero] optimize the optimizer step time (#4221 ) * optimize the optimizer step time * fix corner case * polish * replace all-reduce with all-gather * set comm device to cuda	2023-07-31 22:13:29 +08:00
LuGY	1a49a5ea00	[zero] support shard optimizer state dict of zero (#4194 ) * support shard optimizer of zero * polish code * support sync grad manually	2023-07-31 22:13:29 +08:00
LuGY	dd7cc58299	[zero] add state dict for low level zero (#4179 ) * add state dict for zero * fix unit test * polish	2023-07-31 22:13:29 +08:00
LuGY	c668801d36	[zero] allow passing process group to zero12 (#4153 ) * allow passing process group to zero12 * union tp-zero and normal-zero * polish code	2023-07-31 22:13:29 +08:00
LuGY	79cf1b5f33	[zero]support no_sync method for zero1 plugin (#4138 ) * support no sync for zero1 plugin * polish * polish	2023-07-31 22:13:29 +08:00
LuGY	c6ab96983a	[zero] refactor low level zero for shard evenly (#4030 ) * refactor low level zero * fix zero2 and support cpu offload * avg gradient and modify unit test * refactor grad store, support layer drop * refactor bucket store, support grad accumulation * fix and update unit test of zero and ddp * compatible with tp, ga and unit test * fix memory leak and polish * add zero layer drop unittest * polish code * fix import err in unit test * support diffenert comm dtype, modify docstring style * polish code * test padding and fix * fix unit test of low level zero * fix pad recording in bucket store * support some models * polish	2023-07-31 22:13:29 +08:00
dayellow	a50d39a143	[NFC] fix: format (#4270 ) * [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style * [NFC] polish colossalai/communication/utils.py code style --------- Co-authored-by: Minghao Huang <huangminghao@luchentech.com>	2023-07-26 14:12:57 +08:00
Wenhao Chen	fee553288b	[NFC] polish runtime_preparation_pass style (#4266 )	2023-07-26 14:12:57 +08:00
YeAnbang	3883db452c	[NFC] polish unary_elementwise_generator.py code style (#4267 ) Co-authored-by: aye42 <aye42@gatech.edu>	2023-07-26 14:12:57 +08:00
梁爽	abe4f971e0	[NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256 ) Co-authored-by: supercooledith <893754954@qq.com>	2023-07-26 14:12:57 +08:00
Yanjia0	c614a99d28	[NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255 )	2023-07-26 14:12:57 +08:00
ocd_with_naming	85774f0c1f	[NFC] polish colossalai/cli/benchmark/utils.py code style (#4254 )	2023-07-26 14:12:57 +08:00
Michelle	86cf6aed5b	Fix/format (#4261 ) * revise shardformer readme (#4246) * [example] add llama pretraining (#4257) * [NFC] polish colossalai/communication/p2p.py code style --------- Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: Qianran Ma <qianranm@luchentech.com>	2023-07-26 14:12:57 +08:00
Jianghai	b366f1d99f	[NFC] Fix format for mixed precision (#4253 ) * [NFC] polish colossalai/booster/mixed_precision/mixed_precision_base.py code style	2023-07-26 14:12:57 +08:00
Baizhou Zhang	c6f6005990	[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302 ) * sharded optimizer checkpoint for gemini plugin * modify test to reduce testing time * update doc * fix bug when keep_gatherd is true under GeminiPlugin	2023-07-21 14:39:01 +08:00
Hongxin Liu	fc5cef2c79	[lazy] support init on cuda (#4269 ) * [lazy] support init on cuda * [test] update lazy init test * [test] fix transformer version	2023-07-19 16:43:01 +08:00
Cuiqing Li	4b977541a8	[Kernels] added triton-implemented of self attention for colossal-ai (#4241 ) * added softmax kernel * added qkv_kernel * added ops * adding tests * upload tets * fix tests * debugging * debugging tests * debugging * added * fixed errors * added softmax kernel * clean codes * added tests * update tests * update tests * added attention * add * fixed pytest checking * add cuda check * fix cuda version * fix typo	2023-07-18 23:53:38 +08:00
Jianghai	9a4842c571	revise shardformer readme (#4246 )	2023-07-17 17:30:57 +08:00
Baizhou Zhang	58913441a1	Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141 ) * [checkpointio] unsharded optimizer checkpoint for Gemini plugin * [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather	2023-07-07 16:33:06 +08:00
Frank Lee	190a6ea9c2	[dtensor] fixed readme file name and removed deprecated file (#4162 )	2023-07-04 18:21:11 +08:00
Hongxin Liu	1908caad38	[cli] hotfix launch command for multi-nodes (#4165 )	2023-07-04 17:54:40 +08:00
digger yu	2ac24040eb	fix some typo colossalai/shardformer (#4160 )	2023-07-04 17:53:39 +08:00
github-actions[bot]	c77b3b19be	[format] applied code formatting on changed files in pull request 4152 (#4157 ) Co-authored-by: github-actions <github-actions@github.com>	2023-07-04 16:07:47 +08:00
Frank Lee	89f45eda5a	[shardformer] added development protocol for standardization (#4149 )	2023-07-04 16:05:01 +08:00
Frank Lee	1fb0d95df0	[shardformer] made tensor parallelism configurable (#4144 ) * [shardformer] made tensor parallelism configurable * polish code	2023-07-04 16:05:01 +08:00
Frank Lee	74257cb446	[shardformer] refactored some doc and api (#4137 ) * [shardformer] refactored some doc and api * polish code	2023-07-04 16:05:01 +08:00
jiangmingyan	7f9b30335b	[shardformer] write an shardformer example with bert finetuning (#4126 ) * [shardformer] add benchmark of shardformer * [shardformer] add benchmark of shardformer	2023-07-04 16:05:01 +08:00
Frank Lee	ae035d305d	[shardformer] added embedding gradient check (#4124 )	2023-07-04 16:05:01 +08:00

1 2 3 4 5 ...

1618 Commits (50e5602c2d6c8e25ad544cbecc38649e5257e7b8)