* [misc] remove config arg from initialize
* [misc] remove old tensor contrusctor
* [plugin] add npu support for ddp
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [devops] fix doc test ci
* [test] fix test launch
* [doc] update launch doc
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [devops] fix compatibility
* [hotfix] update compatibility test on pr
* [devops] fix compatibility
* [devops] record duration during comp test
* [test] decrease test duration
* fix falcon
* fix 3d checkpoint load when booster boost without optimizer
fix 3d checkpoint load when booster boost without optimizer
* test ci
* revert ci
* fix
fix
* [shardformer] implement policy for all GPT-J models and test
* [shardformer] support interleaved pipeline parallel for bert finetune
* [shardformer] shardformer support falcon (#4883)
* [shardformer]: fix interleaved pipeline for bert model (#5048)
* [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093)
* Add Mistral support for Shardformer (#5103)
* [shardformer] add tests to mistral (#5105)
---------
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>
* [legacy] move engine to legacy
* [example] fix seq parallel example
* [example] fix seq parallel example
* [test] test gemini pluging hang
* [test] test gemini pluging hang
* [test] test gemini pluging hang
* [test] test gemini pluging hang
* [test] test gemini pluging hang
* [example] update seq parallel requirements
* [shardformer] supported flash attention test dependency (#4158)
* [shardformer] fix flash attention utils test (#4180)
* [shardformer] opt support flash attention (#4163)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] add performance benchmark of shardformer (#4175)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] benchmark fix
* [shardformer] benchmark fix
* [shardformer] llama support flash attention (#4185)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] llama support flash attention
* [shardformer] llama support flash attention
* [shardformer] Move the import statement for xformer outside the forward function.
* [shardformer] gpt2 support flash attention. (#4191)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] gpt2 support flash attention
* [shardformer] gpt2 support flash attention
* [shardformer] gpt2 support flash attention
* [shardformer] bloom support flash attention (#4188)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] bloom suport flash attention
* [shardformer] add assert to sequence length
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
* [shardformer] bert support flash attention. (#4206)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] bert support flash attention
* [shardformer] t5 support flash attention. (#4216)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] t5 support flash attention
* [shardformer] t5 support flash attention
* fix typo
* fix typo
* fix typo
* fix typo
* fix typo
* fix typo
* [shardformer] support 'paddedcausal' type of attention mask in Coloattention. (#4215)
* added padded causal attn mask type for ColoAttention
* [shardformer]t5 flash attention fix (#4239)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] t5 flash attention fix
* [shardformer] update gpt2 to use coloattention. (#4234)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2
* [shardformer] update opt and llama to use coloattention. (#4226)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt
* [shardformer] shardformer support jit fused operator. (#4236)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] bloom support jit fused operator
* [shardformer] bloom support jit fused operator
* [shardformer] bloom support jit fused operator
* [shardformer] t5 support jit fused operator
* [shardformer] t5 support jit fused operator
* [shardformer] t5 support jit fused operator
* [shardformer] add roadmap of flash attention
* [shardformer] add roadmap of flash attention
* [shardformer] add roadmap of flash attention
* [shardformer] add type hint to 'self' param of forward
* [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (#4290)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overtwrite SamVisionAttention foward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
* [shardformer] whisper support flash attention (#4301)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overtwrite SamVisionAttention foward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] whisper support flash attention
* [shardformer] whisper support flash attention
* [shardformer]whisper support jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
* [shardformer] sam support flash attention (#4316)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overtwrite SamVisionAttention foward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] sam support flash attention
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
* [shardformer] merge blip2/chatglm (#4321)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overtwrite SamVisionAttention foward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_pugin
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [shardformer] blip2 support flash attention and jit operator (#4325)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overtwrite SamVisionAttention foward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_pugin
* [shardformer] blip2 support flash attention and jit operator
* [shardformer] blip2 support flash attention and jit operator
* [shardformer] blip2 support flash attention and jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [shardformer] chatglm support flash attention and jit operator (#4330)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overtwrite SamVisionAttention foward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_pugin
* [shardformer] chatglm support flash attention and jit operator
* [shardformer] chatglm support flash attention and jit operator
* [shardformer] chatglm support flash attention and jit operator
* [shardformer] chatglm support flash attention and jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [shardformer] vit support flash attention and jit operator (#4334)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overtwrite SamVisionAttention foward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [sharformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_pugin
* [shardformer] vit support flash attention and jit operator
* [shardformer] vit support flash attention and jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [pipeline] merge flash attention branch
* [pipeline] merge flash attention branch
* [pipeline] merge flash attention branch
* [pipeline] fix conflict
* [pipeline] fix conflict
* Merge branch 'feature/pipeline' into feature/pipeline
* Merge branch 'feature/pipeline' into feature/pipeline
* Merge branch 'feature/pipeline' into feature/pipeline
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* fix flash attention tests
* gemini ignore whisper
* fix vit
* fix xformers import handle
---------
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>