ColossalAI

Commit Graph

Author	SHA1	Message	Date
LuGY	a78daf6180	[shardformer] support interleaved pipeline (#4448 ) * support interleaved pipeline * fix unit test * remove virtual stage test in stage mgr * add droped type hint and updated bwd	1 year ago
Hongxin Liu	26e29d58f0	[devops] add large-scale distributed test marker (#4452 ) * [test] remove cpu marker * [test] remove gpu marker * [test] update pytest markers * [ci] update unit test ci	1 year ago
Baizhou Zhang	6ef33f75aa	[shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446 ) * support DDP for HybridPlugin/add tp+dp tests * add docstring for HybridParallelPlugin	1 year ago
Bin Jia	424629fea0	[shardformer/sequence parallel] Cherry pick commit to new branch (#4450 ) * [shardformer/sequence parallel] Support sequence parallel for gpt2 (#4384) * [sequence parallel] add sequence parallel linear col/row support (#4336) * add sequence parallel linear col/row support * add annotation * add annotation * add support for gpt2 fused qkv linear layer * support sequence parallel in GPT2 * add docstring and note * add requirments * remove unused flash-attb * modify flash attn test * modify flash attn setting * modify flash attn code * add assert before divide, rename forward function * [shardformer/test] fix gpt2 test with seq-parallel * [shardformer/sequence parallel] Overlap input gather and grad computation during col backward (#4401) * overlap gather input / grad computing during col backward * modify test for overlap * simplify code * fix code and modify cuda stream synchronize * [shardformer/sequence parallel] polish code	1 year ago
github-actions[bot]	d20dceb9a3	[format] applied code formatting on changed files in pull request 4441 (#4445 ) Co-authored-by: github-actions <github-actions@github.com>	1 year ago
Hongxin Liu	172f7fa3cf	[misc] resolve code factor issues (#4433 )	1 year ago
flybird11111	328a791d10	[shardformer] update bloom/llama/vit/chatglm tests (#4420 ) [shardformer] update bloom/llama/vit/chatglm tests [shardformer] update opt tests [shardformer] update opt tests [shardformer] update bloom/llama/vit/chatglm tests [shardformer] update bloom/llama/vit/chatglm tests [shardformer] update bloom/llama/vit/chatglm tests	1 year ago
flybird11111	108e54a0b4	[shardformer]update t5 tests for using all optimizations. (#4407 ) * [shardformer] gpt2 tests fix [shardformer] test all optimizations (#4399) [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] gpt2 tests fix * [shardformer]update t5 to use all optimizations	1 year ago
flybird11111	1edc9b5fb3	[shardformer] update tests for all optimization (#4413 ) [shardformer] update tests for all optimization	1 year ago
Baizhou Zhang	7711bd524a	[shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395 ) * rewrite opt tests * rewrite llama tests * rewrite bloom & vit tests * rewrite chatglm tests * fix LinearCol for classfiers * add judge for other tp layers, fix lazy init in util	1 year ago
flybird11111	21e0a42fd1	[shardformer]fix, test gpt2 for AMP+TP (#4403 ) * [shardformer] gpt2 tests fix [shardformer] test all optimizations (#4399) [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] gpt2 tests fix * [shardformer] gpt2 tests fix	1 year ago
Jianghai	7596e9ae08	[pipeline] rewrite bert tests and fix some bugs (#4409 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt * add bloom model and policy ,revise the base class of policy * revise * revision * add bert_for_pretraining * add bert_for_pretraining forward and policy * fix typos * cancel warning * change the imediate output to default dict * change the default output of get_shared_params * rewrite bert test * rewrite bert test * fix some bugs * del pipeline tests * del pipeline tests * del useless print * del useless print * rewrite data repeats	1 year ago
flybird1111	d2cd48e0be	[shardformer] test all optimizations (#4399 ) [shardformer] test all optimizations [shardformer] test all optimizations [shardformer] test all optimizations	1 year ago
flybird1111	7a3dfd0c64	[shardformer] update shardformer to use flash attention 2 (#4392 ) * cherry-pick flash attention 2 cherry-pick flash attention 2 * [shardformer] update shardformer to use flash attention 2 [shardformer] update shardformer to use flash attention 2, fix [shardformer] update shardformer to use flash attention 2, fix [shardformer] update shardformer to use flash attention 2, fix	1 year ago
Baizhou Zhang	ed4c448488	[pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388 ) * fix remaining t5 bugs/rewrite t5 tests * fix multi-tensor communication in pipeline * rearrange test_config * fix keyerror in sync_shared_params * fix get_held_layers & Randomnizer, complete t5 tests * erase printing * fix get_held_layers through modifying _release_unheld_layers * fix _get_recursive_held_layers bug	1 year ago
flybird1111	906426cb44	[Shardformer] Merge flash attention branch to pipeline branch (#4362 ) * [shardformer] supported flash attention test dependency (#4158) * [shardformer] fix flash attention utils test (#4180) * [shardformer] opt support flash attention (#4163) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] add performance benchmark of shardformer (#4175) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] benchmark fix * [shardformer] benchmark fix * [shardformer] llama support flash attention (#4185) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] llama support flash attention * [shardformer] llama support flash attention * [shardformer] Move the import statement for xformer outside the forward function. * [shardformer] gpt2 support flash attention. (#4191) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] gpt2 support flash attention * [shardformer] gpt2 support flash attention * [shardformer] gpt2 support flash attention * [shardformer] bloom support flash attention (#4188) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] bloom suport flash attention * [shardformer] add assert to sequence length * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] bert support flash attention. (#4206) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] bert support flash attention * [shardformer] t5 support flash attention. (#4216) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] t5 support flash attention * [shardformer] t5 support flash attention * fix typo * fix typo * fix typo * fix typo * fix typo * fix typo * [shardformer] support 'paddedcausal' type of attention mask in Coloattention. (#4215) * added padded causal attn mask type for ColoAttention * [shardformer]t5 flash attention fix (#4239) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] t5 flash attention fix * [shardformer] update gpt2 to use coloattention. (#4234) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 to use coloattention * [shardformer] update gpt2 * [shardformer] update opt and llama to use coloattention. (#4226) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt to use coloattention * [shardformer]update opt * [shardformer] shardformer support jit fused operator. (#4236) * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] opt support flash attention * [shardformer] move to modeling * [shardformer] move to modeling * [shardformer] bloom support jit fused operator * [shardformer] bloom support jit fused operator * [shardformer] bloom support jit fused operator * [shardformer] t5 support jit fused operator * [shardformer] t5 support jit fused operator * [shardformer] t5 support jit fused operator * [shardformer] add roadmap of flash attention * [shardformer] add roadmap of flash attention * [shardformer] add roadmap of flash attention * [shardformer] add type hint to 'self' param of forward * [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (#4290) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> * [shardformer] whisper support flash attention (#4301) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] whisper support flash attention * [shardformer] whisper support flash attention * [shardformer]whisper support jit operator --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> * [shardformer] sam support flash attention (#4316) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] sam support flash attention --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> * [shardformer] merge blip2/chatglm (#4321) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com> * [shardformer] blip2 support flash attention and jit operator (#4325) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin * [shardformer] blip2 support flash attention and jit operator * [shardformer] blip2 support flash attention and jit operator * [shardformer] blip2 support flash attention and jit operator --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com> * [shardformer] chatglm support flash attention and jit operator (#4330) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin * [shardformer] chatglm support flash attention and jit operator * [shardformer] chatglm support flash attention and jit operator * [shardformer] chatglm support flash attention and jit operator * [shardformer] chatglm support flash attention and jit operator --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com> * [shardformer] vit support flash attention and jit operator (#4334) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * [shardformer] support SAM (#4231) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code * [shardformer] support whisper (#4212) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme * Feature/chatglm (#4240) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit * [shardformer] added tests * [shardformer] vit test finish and support * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit * [shardformer] support Blip2 (#4243) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin * [shardformer] vit support flash attention and jit operator * [shardformer] vit support flash attention and jit operator --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com> * [pipeline] merge flash attention branch * [pipeline] merge flash attention branch * [pipeline] merge flash attention branch * [pipeline] fix conflict * [pipeline] fix conflict * Merge branch 'feature/pipeline' into feature/pipeline * Merge branch 'feature/pipeline' into feature/pipeline * Merge branch 'feature/pipeline' into feature/pipeline * activate checks * activate checks * activate checks * activate checks * activate checks * activate checks * activate checks * activate checks * fix flash attention tests * gemini ignore whisper * fix vit * fix xformers import handle --------- Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com> Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: klhhhhh <1412841649@qq.com>	1 year ago
Jianghai	a88e92251d	[pipeline] add chatglm (#4363 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt * add bloom model and policy ,revise the base class of policy * revise * revision * add bert_for_pretraining * add bert_for_pretraining forward and policy * fix typos * cancel warning * change the imediate output to default dict * change the default output of get_shared_params * add chatglm * add * chatglm * chatglm * finish chatglm * deletes * fix rmsnorm * chatglm * fix chatglm shard * init	1 year ago
Baizhou Zhang	b1feeced8e	[shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366 ) * add util functions for shardformer tests & rewrite gpt2 test * fix shared_params & embedding/merging * fix precision	1 year ago
Bin Jia	5c6f183192	[test] Hotfix/fix some model test and refactor check util api (#4369 ) * fix llama test * fix test bug of bert, blip2, bloom, gpt2 * fix llama test * fix opt test * fix sam test * fix sam test * fix t5 test * fix vit test * fix whisper test * fix whisper test * polish code * adjust allclose parameter * Add mistakenly deleted code * addjust allclose * change loss function for some base model	1 year ago
FoolPlayer	c3ca53cf05	[test] skip some not compatible models	1 year ago
FoolPlayer	726541afe2	update some module with new api version	1 year ago
FoolPlayer	879301d0da	[shardformer] support Blip2 (#4243 ) * support base blip2 * add support for downstream blip2 model * update readme * add forward injection * skip not compatible models test * fix test for gemini and low_level_zero_pugin	1 year ago
klhhhhh	8120eca0c0	[shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit	1 year ago
klhhhhh	4da05052f4	[shardformer] pre-commit check files	1 year ago
klhhhhh	f155ae89c4	[shardformer] ChatGLM support layernorm sharding	1 year ago
klhhhhh	00f6ef159d	[shardformer] delete some file	1 year ago
klhhhhh	dad00c42aa	[shardformer] support chatglm without layernorm	1 year ago
klhhhhh	cbb54d3202	[shardformer] polish code	1 year ago
klhhhhh	1a29e8fc29	[shardformer] polish chatglm code	1 year ago
klhhhhh	8620009dd7	[sharformer] add first version of policy of chatglm	1 year ago
klhhhhh	6ee4c9ee21	[shardformer] add test kit in model zoo for chatglm	1 year ago
klhhhhh	7377be7a53	import chatglm	1 year ago
klhhhhh	c49286985d	[shardformer] vit test finish and support	1 year ago
klhhhhh	f60162b265	[shardformer] added tests	1 year ago
Kun Lin	ed34bb1310	Feature/chatglm (#4240 ) * [shardformer] added tests * [shardformer] vit test finish and support * [shardformer] chatglm ready * import chatglm * [shardformer] add test kit in model zoo for chatglm * [sharformer] add first version of policy of chatglm * [shardformer] polish chatglm code * [shardformer] polish code * [shardformer] support chatglm without layernorm * [shardformer] chatglm shard without mlp sharding * [shardformer] delete some file * [shardformer] ChatGLM support layernorm sharding * [shardformer] register without auto policy * [shardformer] pre-commit check files * [shardformer] fix chatglm configuration with pre-commit	1 year ago
FoolPlayer	9ee4ebea83	[shardformer] support whisper (#4212 ) * support whisper * fix bug in vocabembedding * support downstream model of whisper * update readme	1 year ago
FoolPlayer	dd2bf02679	[shardformer] support SAM (#4231 ) * 1.support sam 2.add fused qkv for nn.Linear * update utils support set element in list * overtwrite SamVisionAttention foward to use DropoutForParallelInput * remove unused code	1 year ago
Baizhou Zhang	0ceec8f9a9	[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354 ) * add naive optimizer for 3DPlugin/refactor gpt2 shardformer test * merge tests of PP/DP/TP combinations into one test file * fix bug when sync grad for dp in HybridPlugin * update supported precisions for 3DPlugin/fix bug when shifting tp_degree * improve the passing of lazy_init * modify lazy_init/use sync_shared_params	1 year ago
Jianghai	f13954cd58	[pipeline] refactor test pipeline and remove useless utils in pipeline (#4324 ) * refactor tests * refactor bloom model * finish policy tests * refactor tests * fix test pure pipeline * remove test pipeline and cutdown launch process * refactor tests * refactor bloom model * finish policy tests * refactor tests * fix test pure pipeline * remove test pipeline and cutdown launch process	1 year ago
LuGY	d3c6cd66f3	[pipeline] add unit test for 1f1b (#4303 ) * add unit test for 1f1b * polish code * polish code and update ut version * fix	1 year ago
Baizhou Zhang	da3cef27ad	[pipeline] fix return_dict/fix pure_pipeline_test (#4331 )	1 year ago
Hongxin Liu	411cf1d2db	[hotfix] fix gemini and zero test (#4333 ) * [hotfix] fix gemini and zero test * [hotfix] fix lazy init test * [hotfix] fix lazy init test	1 year ago
Hongxin Liu	261eab02fb	[plugin] add 3d parallel plugin (#4295 ) * [amp] add mixed precision optimizer * [plugin] add 3d parallel plugin * [booster] support pipeline * [plugin] 3d parallel plugin support clip grad norm * [shardformer] fix sharder and add plugin test * [plugin] rename 3d parallel plugin * [ci] support testmon core pkg change detection (#4305) * [hotfix] debug testmon * [hotfix] fix llama * [hotfix] fix p2p bugs * [hotfix] fix requirements	1 year ago
FoolPlayer	b3f5d7a3ba	[shardformer] support pipeline base vit model (#4284 ) * Feature/vit support (#4182) * [shardformer] added tests * [shardformer] vit test finish and support * fix attention dropout * support base vit pipeline * support vit downstream model * fix vit shard test * modify hidden states return type --------- Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>	1 year ago
Baizhou Zhang	083d7da33d	[pipeline] add pipeline support for all T5 models (#4310 ) * complete policy for T5Model & T5ForConditionalGeneration * modify function signature in forwards * add forward for T5model * add forward for T5ForConditionalGeneration * fix a bug * fix hidden_states transporting in decoder * fix the passing of encoder_outputs	1 year ago
Jianghai	d0807122e2	[pipeline] test pure pipeline process using llama (#4218 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * Revert "bloom policy" This reverts commit `8dee68a0a2`. This policy should be revert and copied to feature/bloom * revert the bloom changes * cancel unneeded inputs * gpt * finish llama * causal lm and sequence classification * revision * add pure pipeline test * fixed version * fixed version * pure pipeline	1 year ago
Baizhou Zhang	36e546b2cc	[pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300 ) * modify t5 policy & add test * pipeline stage distribution for t5 * complete t5 base policy * t5 stack: halfway * modify gpt2 pipeline test * complete pipeline forward for T5Stack/T5EncoderModel * fix docstring * move t5 util tests to test_pipeline	1 year ago
Jianghai	d8408d185c	[pipeline] OPT model pipeline (#4258 ) * opt forward and test * pause * finish opt model pipeline * finish opt pipeline * opt forward and test * pause * finish opt model pipeline * finish opt pipeline * fix opt * set transformers version * refactor the test pipeline	1 year ago
Hongxin Liu	d921ce8391	[shardformer] support inplace sharding (#4251 ) * [shardformer] embedding support inplace sharding * [shardformer] linear support inplace sharding * [shardformer] layernorm support inplace sharding * [shardformer] qkv support inplace sharding * [test] update shardformer layer test * [shardformer] fix shared param sharding * [shardformer] fix bert policy * [shardformer] fix bloom policy * [shardformer] fix llama policy * [shardformer] fix opt policy * [shardformer] fix t5 policy * [shardformer] fix fused qkv linear * [shardformer] fix bugs * force sync * [test] fix bugs * [test] fix transformer version	1 year ago
Baizhou Zhang	2a2eacfaf1	[pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245 ) * change for transformers loggers * add forward for GPT2ForQuestionAnswering * fix assert * fix torchrec test	1 year ago
Jianghai	d9be0472ef	[bugs] hot fix some testing bugs for new models (#4268 ) * hot fix * hot fx tracer	1 year ago
Jianghai	34f0e34a4c	[pipeline] finish bloom models pipeline and tests (#4223 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * finish bloom model * test shard gpt2 * clear cache * support all bloom models * add bloom models policies * finish bloom pipeline and tests * add set pipeline * finish bloom	1 year ago
Jianghai	e7cc62d735	[pipeline] All bert models (#4233 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * Revert "bloom policy" This reverts commit `8dee68a0a2`. This policy should be revert and copied to feature/bloom * revert the bloom changes * cancel unneeded inputs * gpt * finish llama * causal lm and sequence classification * revision * add pure pipeline test * finish some bert models * finish all bert models * finish bert tests * fix bugs * fix bugs * fix test pipeline * fix data gen for qa * update the set pipeline forward * shared params * fix bugs	1 year ago
Baizhou Zhang	a14d352088	[pipeline] add pipeline forward for variants of gpt2 (#4238 ) * add forward for GPTLMHeadModel * add test for gpt_lm * arranging get_held_layers method * arrange forward replacement * add forward for GPT2ForTokenClassification * add forward for GPT2ForSequenceClassification * fix test_shard_gpt2.py * add GPT2DoubleHeadsmodel & fix bugs * add id checking in get_shared_params	1 year ago
Baizhou Zhang	208ac8f2ba	[pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224 ) * * fix typehint & docstring in sharder.py * * update pipeline forward for GPT2Model * * add test for pipeline forward of GPT2Model * * add cache cleaning in gpt2 test * * change assert to raise command	1 year ago
Jianghai	37d22f6878	[pipeline] add bloom model pipeline (#4210 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * finish bloom model * test shard gpt2 * clear cache	1 year ago
Jianghai	31bcf867ae	[pipeline] Llama causal lm and llama for sequence classification pipeline (#4208 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * Revert "bloom policy" This reverts commit `8dee68a0a2`. This policy should be revert and copied to feature/bloom * revert the bloom changes * cancel unneeded inputs * gpt * finish llama * causal lm and sequence classification * revision	1 year ago
Jianghai	1622031058	[pipeline] Llama pipeline (#4205 ) * bloom policy * llama pipeline forward and tests * fix the output and attention_mask * fix name * bind argument to policy * Revert "bloom policy" This reverts commit `8dee68a0a2`. This policy should be revert and copied to feature/bloom * revert the bloom changes * cancel unneeded inputs * gpt	1 year ago
Jianghai	1094e0f0d3	[pipeline] Bert pipeline for shardformer and its tests (#4197 ) * add pipeline forward * complete pipeline forward check * fix bert forward without pipeline * fix comments * discard useless line * add todo * clean prints * fix distribute layers	1 year ago
Hongxin Liu	890774b2fb	[shardformer] support lazy init (#4202 ) * [shardformer] support lazy init * [shardformer] linear support lazy init * [shardformer] embedding support lazy init * [shardformer] norm support lazy init * [shardformer] fused linear support lazy init * [test] update shardformer test layer * [test] shardformer with lazy init fit ddp * [lazy] hotfix deepcopy of param * [shardformer] fix bert policy and update test * [shardformer] fix bloom policy and update test * [shardformer] fix opt policy and update test * [shardformer] fix t5 policy and update test * [shardformer] fix gpt2 policy and update test * [shardformer] fix llama policy and update test	1 year ago
Jianghai	f3bcc292c8	[pipeline] move bert related pipeline components to shardformer (#4187 ) * move bert related pipeline components to shardformer * fix bugs * revision * fix bert model tests * fix bert_lm_head model tests * fix tests * fix tests * done checks * skip bloom	1 year ago
Jianghai	c5ea728016	[pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt * add bloom model and policy ,revise the base class of policy * revise * revision * add bert_for_pretraining * add bert_for_pretraining forward and policy * fix typos * cancel warning * change the imediate output to default dict * change the default output of get_shared_params	1 year ago
ver217	5fc60a3a04	[test] add shard util tests	1 year ago
ver217	2d6cc07feb	[test] update shardformer tests	1 year ago
Jianghai	90a65ea682	[pipeline] build bloom model and policy , revise the base class of policy (#4161 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt * add bloom model and policy ,revise the base class of policy * revise * revision * add bert_for_pretraining	1 year ago
Jianghai	c552cefa93	[pipeline]add pipeline policy and bert forward (#4130 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt	1 year ago
Hongxin Liu	5c897ddb94	[pipeline] add stage manager (#4093 ) * [pipeline] add stage manager * [test] add pipeline stage manager test * [pipeline] add docstring for stage manager	1 year ago
Jianghai	e8e7e49243	[pipeline]add pipeline policy and bert forward (#4130 ) * add pipeline policy and bert forward to be done * add bertmodel pipeline forward and make tests * add Bert_Policy and test for policy * update formatting * update formatting * update the code * fix bugs * fix name confilt	1 year ago
Hongxin Liu	f51ce1bc8e	[pipeline] refactor 1f1b schedule (#4115 ) * [api] update optimizer wrapper to fit pipeline * [pipeline] add base schedule * [pipeline] add 1f1b schedule * [test] add pipeline schedule utils test * [pipeline] fix import	1 year ago
Hongxin Liu	45fdc9b42c	[pipeline] implement p2p communication (#4100 ) * [pipeline] add p2p communication * [test] add p2p communication test * [test] add rerun decorator * [test] rename to avoid conflict	1 year ago
Hongxin Liu	422544222f	[pipeline] add stage manager (#4093 ) * [pipeline] add stage manager * [test] add pipeline stage manager test * [pipeline] add docstring for stage manager	1 year ago
Hongxin Liu	5e1a9d48dd	[cluster] add process group mesh (#4039 ) * [cluster] add process group mesh * [test] add process group mesh test * force sync	1 year ago
LuGY	d86ddd9b29	[hotfix] fix unsafe async comm in zero (#4404 ) * improve stablility of zero * fix wrong index * add record stream	1 year ago
flybird1111	458ae331ad	[kernel] updated unittests for coloattention (#4389 ) Updated coloattention tests of checking outputs and gradients	1 year ago
flybird1111	38b792aab2	[coloattention] fix import error (#4380 ) fixed an import error	1 year ago
flybird1111	25c57b9fb4	[fix] coloattention support flash attention 2 (#4347 ) Improved ColoAttention interface to support flash attention 2. Solved #4322	1 year ago
Hongxin Liu	16bf4c0221	[test] remove useless tests (#4359 ) * [test] remove legacy zero test * [test] remove lazy distribute test * [test] remove outdated checkpoint io	1 year ago
LuGY	1a49a5ea00	[zero] support shard optimizer state dict of zero (#4194 ) * support shard optimizer of zero * polish code * support sync grad manually	1 year ago
LuGY	dd7cc58299	[zero] add state dict for low level zero (#4179 ) * add state dict for zero * fix unit test * polish	1 year ago
LuGY	c668801d36	[zero] allow passing process group to zero12 (#4153 ) * allow passing process group to zero12 * union tp-zero and normal-zero * polish code	1 year ago
LuGY	79cf1b5f33	[zero]support no_sync method for zero1 plugin (#4138 ) * support no sync for zero1 plugin * polish * polish	1 year ago
LuGY	c6ab96983a	[zero] refactor low level zero for shard evenly (#4030 ) * refactor low level zero * fix zero2 and support cpu offload * avg gradient and modify unit test * refactor grad store, support layer drop * refactor bucket store, support grad accumulation * fix and update unit test of zero and ddp * compatible with tp, ga and unit test * fix memory leak and polish * add zero layer drop unittest * polish code * fix import err in unit test * support diffenert comm dtype, modify docstring style * polish code * test padding and fix * fix unit test of low level zero * fix pad recording in bucket store * support some models * polish	1 year ago
Baizhou Zhang	c6f6005990	[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302 ) * sharded optimizer checkpoint for gemini plugin * modify test to reduce testing time * update doc * fix bug when keep_gatherd is true under GeminiPlugin	1 year ago
Hongxin Liu	fc5cef2c79	[lazy] support init on cuda (#4269 ) * [lazy] support init on cuda * [test] update lazy init test * [test] fix transformer version	1 year ago
Cuiqing Li	4b977541a8	[Kernels] added triton-implemented of self attention for colossal-ai (#4241 ) * added softmax kernel * added qkv_kernel * added ops * adding tests * upload tets * fix tests * debugging * debugging tests * debugging * added * fixed errors * added softmax kernel * clean codes * added tests * update tests * update tests * added attention * add * fixed pytest checking * add cuda check * fix cuda version * fix typo	1 year ago
Baizhou Zhang	58913441a1	Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141 ) * [checkpointio] unsharded optimizer checkpoint for Gemini plugin * [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather	1 year ago
github-actions[bot]	c77b3b19be	[format] applied code formatting on changed files in pull request 4152 (#4157 ) Co-authored-by: github-actions <github-actions@github.com>	1 year ago
Frank Lee	1fb0d95df0	[shardformer] made tensor parallelism configurable (#4144 ) * [shardformer] made tensor parallelism configurable * polish code	1 year ago
Frank Lee	74257cb446	[shardformer] refactored some doc and api (#4137 ) * [shardformer] refactored some doc and api * polish code	1 year ago
Frank Lee	ae035d305d	[shardformer] added embedding gradient check (#4124 )	1 year ago
Frank Lee	6a88bae4ec	[shardformer] integrate with data parallelism (#4103 )	1 year ago
Frank Lee	f3b6aaa6b7	[shardformer] supported fused normalization (#4112 )	1 year ago
Frank Lee	b1c2901530	[shardformer] supported bloom model (#4098 )	1 year ago
Kun Lin	8af29ee47a	[shardformer] support vision transformer (#4096 ) * first v of vit shardformer * keep vit * update * vit shard add vitattention vitlayer * update num head shard para * finish test for vit * add new_model_class & postprocess * add vit readme * delete old files & fix the conflict * fix sth	1 year ago
jiangmingyan	ac80937138	[shardformer] shardformer support opt models (#4091 ) * [shardformer] shardformer support opt models * [shardformer] shardformer support opt models, fix * [shardformer] shardformer support opt models, fix * [shardformer] shardformer support opt models, fix	1 year ago
Frank Lee	d33a44e8c3	[shardformer] refactored layernorm (#4086 )	1 year ago
Frank Lee	c4b1b65931	[test] fixed tests failed due to dtensor change (#4082 ) * [test] fixed tests failed due to dtensor change * polish code	1 year ago
FoolPlayer	92f6791095	[shardformer] Add layernorm (#4072 ) * add layernorm to bert * add layernorm test * add layernorm test with load state dict * add use_mixedfusedLN in shard config * refactor policy to support fused_layernorm	1 year ago
Frank Lee	70c58cfd4f	[shardformer] supported fused qkv checkpoint (#4073 )	1 year ago
FoolPlayer	0803a61412	[shardformer] add linearconv1d test (#4067 ) * add linearconv1d test * add linearconv1d test	1 year ago
Frank Lee	8eb09a4c69	[shardformer] support module saving and loading (#4062 ) * [shardformer] support module saving and loading * polish code	1 year ago
FoolPlayer	7740c55c55	support kit use for bert/gpt test (#4055 ) * support kit use for bert test * support kit test for gpt2	1 year ago
Frank Lee	f22ddacef0	[shardformer] refactored the shardformer layer structure (#4053 )	1 year ago
Frank Lee	58df720570	[shardformer] adapted T5 and LLaMa test to use kit (#4049 ) * [shardformer] adapted T5 and LLaMa test to use kit * polish code	1 year ago
FoolPlayer	4021b9a8a2	[shardformer] add gpt2 test and layer class refactor (#4041 ) * add gpt2 test and layer class refactor * add dropout in gpt2 policy	1 year ago
Frank Lee	d857f3dbba	[shardformer] supported T5 and its variants (#4045 )	1 year ago
Frank Lee	c1d5453e9f	[shardformer] adapted llama to the new API (#4036 )	1 year ago
FoolPlayer	74d176c8d8	[shardformer] fix bert and gpt downstream with new api (#4024 ) * fix bert downstream with new api * remove comment line	1 year ago
FoolPlayer	507c0ad368	add vocabembedding layer	1 year ago
Frank Lee	3893fa1a8d	[shardformer] refactored embedding and dropout to parallel module (#4013 ) * [shardformer] refactored embedding and dropout to parallel module * polish code	1 year ago
FoolPlayer	dfca9678fa	integrate with dist layer (#4011 )	1 year ago
Frank Lee	015af592f8	[shardformer] integrated linear 1D with dtensor (#3996 ) * [shardformer] integrated linear 1D with dtensor * polish code	1 year ago
Frank Lee	611971248c	[device] support init device mesh from process group (#3990 )	1 year ago
FoolPlayer	f7774ec0f3	[Shardformer] Downstream bert (#3979 ) * add dist dropout in model * update docstring and bert policy with dropout * refactor basepolicy and sharded, update bert * update format * update gpt2 policy * update bert policy * remove unused code * update readme for new policy usage * add downstream model of bert * remove unused code	1 year ago
wukong1992	c1c672d0f0	[shardformer] shardformer support t5 model (#3994 ) test t5	1 year ago
wukong1992	6b30dfb7ce	[shardformer] support llama model using shardformer (#3969 ) adjust layer attr	1 year ago
FoolPlayer	a73130482d	[shardformer] Unit test (#3928 ) * fix bug in slicer, add slicer unit test * add dropout test * use pid as dropout seed * updata dropout test with local pattern * ad todo	1 year ago
FoolPlayer	f1cb5ac6bf	[shardformer] Align bert value (#3907 ) * add bert align test, fix dist loss bug * forward and backward align * add ignore index * add shardformer CI * add gather_output optional for user in shardconfig * update readme with optional gather_ouput * add dist crossentropy loss test, remove unused files * remove unused file * remove unused file * rename the file * polish code	1 year ago
Baizhou Zhang	0bb0b481b4	[gemini] fix argument naming during chunk configuration searching	1 year ago
github-actions[bot]	a52f62082d	[format] applied code formatting on changed files in pull request 4021 (#4022 ) Co-authored-by: github-actions <github-actions@github.com>	1 year ago
Frank Lee	a5883aa790	[test] fixed codefactor format report (#4026 )	1 year ago
Baizhou Zhang	822c3d4d66	[checkpointio] sharded optimizer checkpoint for DDP plugin (#4002 )	1 year ago
Wenhao Chen	725af3eeeb	[booster] make optimizer argument optional for boost (#3993 ) * feat: make optimizer optional in Booster.boost * test: skip unet test if diffusers version > 0.10.2	1 year ago
Baizhou Zhang	c9cff7e7fa	[checkpointio] General Checkpointing of Sharded Optimizers (#3984 )	1 year ago
digger yu	e61ffc77c6	fix typo tests/ (#3936 )	1 year ago
Frank Lee	ddcf58cacf	Revert "[sync] sync feature/shardformer with develop"	1 year ago
Frank Lee	eb39154d40	[dtensor] updated api and doc (#3845 )	1 year ago
Hongxin Liu	ae02d4e4f7	[bf16] add bf16 support (#3882 ) * [bf16] add bf16 support for fused adam (#3844) * [bf16] fused adam kernel support bf16 * [test] update fused adam kernel test * [test] update fused adam test * [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860) * [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869) * [bf16] add mixed precision mixin * [bf16] low level zero optim support bf16 * [text] update low level zero test * [text] fix low level zero grad acc test * [bf16] add bf16 support for gemini (#3872) * [bf16] gemini support bf16 * [test] update gemini bf16 test * [doc] update gemini docstring * [bf16] add bf16 support for plugins (#3877) * [bf16] add bf16 support for legacy zero (#3879) * [zero] init context support bf16 * [zero] legacy zero support bf16 * [test] add zero bf16 test * [doc] add bf16 related docstring for legacy zero	1 year ago
Hongxin Liu	dbb32692d2	[lazy] refactor lazy init (#3891 ) * [lazy] remove old lazy init * [lazy] refactor lazy init folder structure * [lazy] fix lazy tensor deepcopy * [test] update lazy init test	1 year ago
wukong1992	6b305a99d6	[booster] torch fsdp fix ckpt (#3788 )	2 years ago
Frank Lee	615e2e5fc1	[test] fixed lazy init test import error (#3799 )	2 years ago
Hongxin Liu	3c07a2846e	[plugin] a workaround for zero plugins' optimizer checkpoint (#3780 ) * [test] refactor torch ddp checkpoint test * [plugin] update low level zero optim checkpoint * [plugin] update gemini optim checkpoint	2 years ago
Hongxin Liu	5452df63c5	[plugin] torch ddp plugin supports sharded model checkpoint (#3775 ) * [plugin] torch ddp plugin add save sharded model * [test] fix torch ddp ckpt io test * [test] fix torch ddp ckpt io test * [test] fix low level zero plugin test * [test] fix low level zero plugin test * [test] add debug info * [test] add debug info * [test] add debug info * [test] add debug info * [test] add debug info * [test] fix low level zero plugin test * [test] fix low level zero plugin test * [test] remove debug info	2 years ago
wukong1992	6050f37776	[booster] removed models that don't support fsdp (#3744 ) Co-authored-by: 纪少敏 <jishaomin@jishaomindeMBP.lan>	2 years ago
Hongxin Liu	afb239bbf8	[devops] update torch version of CI (#3725 ) * [test] fix flop tensor test * [test] fix autochunk test * [test] fix lazyinit test * [devops] update torch version of CI * [devops] enable testmon * [devops] fix ci * [devops] fix ci * [test] fix checkpoint io test * [test] fix cluster test * [test] fix timm test * [devops] fix ci * [devops] fix ci * [devops] fix ci * [devops] fix ci * [devops] force sync to test ci * [test] skip fsdp test	2 years ago
wukong1992	b37797ed3d	[booster] support torch fsdp plugin in booster (#3697 ) Co-authored-by: 纪少敏 <jishaomin@jishaomindeMBP.lan>	2 years ago
digger-yu	1f73609adb	[CI] fix typo with tests/ etc. (#3727 ) * fix spelling error with examples/comminity/ * fix spelling error with tests/ * fix some spelling error with tests/ colossalai/ etc. * fix spelling error with tests/ etc. date:2023.5.10	2 years ago
digger-yu	b7141c36dd	[CI] fix some spelling errors (#3707 ) * fix spelling error with examples/comminity/ * fix spelling error with tests/ * fix some spelling error with tests/ colossalai/ etc.	2 years ago
jiangmingyan	20068ba188	[booster] add tests for ddp and low level zero's checkpointio (#3715 ) * [booster] update tests for booster * [booster] update tests for booster * [booster] update tests for booster * [booster] update tests for booster * [booster] update tests for booster * [booster] update booster tutorials#3717, fix recursive check	2 years ago
Hongxin Liu	6552cbf8e1	[booster] fix no_sync method (#3709 ) * [booster] fix no_sync method * [booster] add test for ddp no_sync * [booster] fix merge * [booster] update unit test * [booster] update unit test * [booster] update unit test	2 years ago
Hongxin Liu	3bf09efe74	[booster] update prepare dataloader method for plugin (#3706 ) * [booster] add prepare dataloader method for plug * [booster] update examples and docstr	2 years ago
Hongxin Liu	d0915f54f4	[booster] refactor all dp fashion plugins (#3684 ) * [booster] add dp plugin base * [booster] inherit dp plugin base * [booster] refactor unit tests	2 years ago
digger-yu	b49020c1b1	[CI] Update test_sharded_optim_with_sync_bn.py (#3688 ) fix spelling error in line23 change "cudnn_determinstic"=True to "cudnn_deterministic=True"	2 years ago
jiangmingyan	307894f74d	[booster] gemini plugin support shard checkpoint (#3610 ) * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint --------- Co-authored-by: luchen <luchen@luchendeMBP.lan> Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>	2 years ago
Hongxin Liu	50793b35f4	[gemini] accelerate inference (#3641 ) * [gemini] support don't scatter after inference * [chat] update colossalai strategy * [chat] fix opt benchmark * [chat] update opt benchmark * [gemini] optimize inference * [test] add gemini inference test * [chat] fix unit test ci * [chat] fix ci * [chat] fix ci * [chat] skip checkpoint test	2 years ago
Hongxin Liu	4b3240cb59	[booster] add low level zero plugin (#3594 ) * [booster] add low level zero plugin * [booster] fix gemini plugin test * [booster] fix precision * [booster] add low level zero plugin test * [test] fix booster plugin test oom * [test] fix booster plugin test oom * [test] fix googlenet and inception output trans * [test] fix diffuser clip vision model * [test] fix torchaudio_wav2vec2_base * [test] fix low level zero plugin test	2 years ago
Hongxin Liu	f313babd11	[gemini] support save state dict in shards (#3581 ) * [gemini] support state dict shard * [gemini] add test state dict shard * [gemini] polish docstr * [gemini] fix merge * [gemini] polish code	2 years ago
Hongxin Liu	152239bbfa	[gemini] gemini supports lazy init (#3379 ) * [gemini] fix nvme optimizer init * [gemini] gemini supports lazy init * [gemini] add init example * [gemini] add fool model * [zero] update gemini ddp * [zero] update init example * add chunk method * add chunk method * [lazyinit] fix lazy tensor tolist * [gemini] fix buffer materialization * [misc] remove useless file * [booster] update gemini plugin * [test] update gemini plugin test * [test] fix gemini plugin test * [gemini] fix import * [gemini] fix import * [lazyinit] use new metatensor * [lazyinit] use new metatensor * [lazyinit] fix __set__ method	2 years ago
jiangmingyan	52a933e175	[checkpoint] support huggingface style sharded checkpoint (#3461 ) * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint --------- Co-authored-by: luchen <luchen@luchendeMBP.lan>	2 years ago
Frank Lee	80eba05b0a	[test] refactor tests with spawn (#3452 ) * [test] added spawn decorator * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago

1 2 3 4 5 ...

999 Commits (feat/moe)