* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution
* Change static methods for t5 layer distribution to member functions
* Change static methods for whisper layer distribution to member functions
* Replace whisper policy usage with self one
* Fix test case to use non-static layer distribution methods
* fix: fix typo
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* fix: simplify merge_batch
* fix: use return_outputs=False to eliminate extra memory consumption
* feat: add return_outputs warning
* style: remove `return_outputs=False` as it is the default value
* [devops] fix compatibility
* [hotfix] update compatibility test on pr
* [devops] fix compatibility
* [devops] record duration during comp test
* [test] decrease test duration
* fix falcon
* fix 3d checkpoint load when booster boost without optimizer
* test ci
* revert ci
* fix
* [shardformer] implement policy for all GPT-J models and test
* [shardformer] support interleaved pipeline parallel for bert finetune
* [shardformer] shardformer support falcon (#4883)
* [shardformer]: fix interleaved pipeline for bert model (#5048)
* [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093)
* Add Mistral support for Shardformer (#5103)
* [shardformer] add tests to mistral (#5105)
---------
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>
* [npu] setup device utils (#5047)
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (#5052)
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (#5065)
* [kernel] add arm cpu adam
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
* add test
* fix no_sync bug in low level zero plugin
* fix test
* add argument for grad accum
* add grad accum in backward hook for gemini
* finish implementation, rewrite tests
* fix test
* skip stuck model in low level zero test
* update doc
* optimize communication & fix gradient checkpoint
* modify doc
* cleaning codes
* update cpu adam fp16 case
* [legacy] move engine to legacy
* [example] fix seq parallel example
* [example] fix seq parallel example
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [example] update seq parallel requirements
* [shardformer] supported flash attention test dependency (#4158)
* [shardformer] fix flash attention utils test (#4180)
* [shardformer] opt support flash attention (#4163)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] add performance benchmark of shardformer (#4175)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] benchmark fix
* [shardformer] benchmark fix
* [shardformer] llama support flash attention (#4185)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] llama support flash attention
* [shardformer] llama support flash attention
* [shardformer] Move the import statement for xformer outside the forward function.
* [shardformer] gpt2 support flash attention. (#4191)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] gpt2 support flash attention
* [shardformer] gpt2 support flash attention
* [shardformer] gpt2 support flash attention
* [shardformer] bloom support flash attention (#4188)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] bloom support flash attention
* [shardformer] add assert to sequence length
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
* [shardformer] bert support flash attention. (#4206)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] bert support flash attention
* [shardformer] t5 support flash attention. (#4216)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] t5 support flash attention
* [shardformer] t5 support flash attention
* fix typo
* fix typo
* fix typo
* fix typo
* fix typo
* fix typo
* [shardformer] support 'paddedcausal' type of attention mask in ColoAttention. (#4215)
* added padded causal attn mask type for ColoAttention
* [shardformer] t5 flash attention fix (#4239)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] t5 flash attention fix
* [shardformer] update gpt2 to use coloattention. (#4234)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2
* [shardformer] update opt and llama to use coloattention. (#4226)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* update opt to use coloattention
* [shardformer] update opt to use coloattention
* [shardformer] update opt to use coloattention
* [shardformer] update opt to use coloattention
* [shardformer] update opt to use coloattention
* [shardformer] update opt to use coloattention
* [shardformer] update opt to use coloattention
* [shardformer] update opt
* [shardformer] shardformer support jit fused operator. (#4236)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] bloom support jit fused operator
* [shardformer] bloom support jit fused operator
* [shardformer] bloom support jit fused operator
* [shardformer] t5 support jit fused operator
* [shardformer] t5 support jit fused operator
* [shardformer] t5 support jit fused operator
* [shardformer] add roadmap of flash attention
* [shardformer] add roadmap of flash attention
* [shardformer] add roadmap of flash attention
* [shardformer] add type hint to 'self' param of forward
* [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (#4290)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
* [shardformer] whisper support flash attention (#4301)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] whisper support flash attention
* [shardformer] whisper support flash attention
* [shardformer] whisper support jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
* [shardformer] sam support flash attention (#4316)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] sam support flash attention
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
* [shardformer] merge blip2/chatglm (#4321)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_plugin
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [shardformer] blip2 support flash attention and jit operator (#4325)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_plugin
* [shardformer] blip2 support flash attention and jit operator
* [shardformer] blip2 support flash attention and jit operator
* [shardformer] blip2 support flash attention and jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [shardformer] chatglm support flash attention and jit operator (#4330)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_plugin
* [shardformer] chatglm support flash attention and jit operator
* [shardformer] chatglm support flash attention and jit operator
* [shardformer] chatglm support flash attention and jit operator
* [shardformer] chatglm support flash attention and jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [shardformer] vit support flash attention and jit operator (#4334)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_plugin
* [shardformer] vit support flash attention and jit operator
* [shardformer] vit support flash attention and jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [pipeline] merge flash attention branch
* [pipeline] merge flash attention branch
* [pipeline] merge flash attention branch
* [pipeline] fix conflict
* [pipeline] fix conflict
* Merge branch 'feature/pipeline' into feature/pipeline
* Merge branch 'feature/pipeline' into feature/pipeline
* Merge branch 'feature/pipeline' into feature/pipeline
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* fix flash attention tests
* gemini ignore whisper
* fix vit
* fix xformers import handle
---------
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* refactor low level zero
* fix zero2 and support cpu offload
* avg gradient and modify unit test
* refactor grad store, support layer drop
* refactor bucket store, support grad accumulation
* fix and update unit test of zero and ddp
* compatible with tp, ga and unit test
* fix memory leak and polish
* add zero layer drop unittest
* polish code
* fix import err in unit test
* support different comm dtype, modify docstring style
* polish code
* test padding and fix
* fix unit test of low level zero
* fix pad recording in bucket store
* support some models
* polish
* [plugin] torch ddp plugin add save sharded model
* [test] fix torch ddp ckpt io test
* [test] fix torch ddp ckpt io test
* [test] fix low level zero plugin test
* [test] fix low level zero plugin test
* [test] add debug info
* [test] add debug info
* [test] add debug info
* [test] add debug info
* [test] add debug info
* [test] fix low level zero plugin test
* [test] fix low level zero plugin test
* [test] remove debug info
* [test] fix flop tensor test
* [test] fix autochunk test
* [test] fix lazyinit test
* [devops] update torch version of CI
* [devops] enable testmon
* [devops] fix ci
* [devops] fix ci
* [test] fix checkpoint io test
* [test] fix cluster test
* [test] fix timm test
* [devops] fix ci
* [devops] fix ci
* [devops] fix ci
* [devops] fix ci
* [devops] force sync to test ci
* [test] skip fsdp test
* fix spelling errors in examples/community/
* fix spelling errors in tests/
* fix some spelling errors in tests/, colossalai/, etc.
* fix spelling errors in tests/ etc. (date: 2023.5.10)
* [booster] fix no_sync method
* [booster] add test for ddp no_sync
* [booster] fix merge
* [booster] update unit test
* [booster] update unit test
* [booster] update unit test