* [inference] Dynamic Batching for Single and Multiple GPUs (#4831)
* finish batch manager
* 1
* first
* fix
* fix dynamic batching
* llama infer
* finish test
* support different lengths generating
* del prints
* del prints
* fix
* fix bug
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [inference] Async dynamic batching (#4894)
* finish input and output logic
* add generate
* test forward
* 1
* [inference]Re push async dynamic batching (#4901)
* adapt to ray server
* finish async
* finish test
* del test
---------
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
* Revert "[inference]Re push async dynamic batching (#4901)" (#4905)
This reverts commit fbf3c09e67.
* Revert "[inference] Async dynamic batching (#4894)"
This reverts commit fced140250.
* Revert "[inference] Async dynamic batching (#4894)" (#4909)
This reverts commit fced140250.
* Add Ray Distributed Environment Init Scripts
* support DynamicBatchManager base function
* revert _set_tokenizer version
* add driver async generate
* add async test
* fix bugs in test_ray_dist.py
* add get_tokenizer.py
* fix code style
* fix bugs about No module named 'pydantic' in ci test
* fix bugs in ci test
* fix bugs in ci test
* fix bugs in ci test
* [infer]Add Ray Distributed Environment Init Scripts (#4911)
* Revert "[inference] Async dynamic batching (#4894)"
This reverts commit fced140250.
* Add Ray Distributed Environment Init Scripts
* support DynamicBatchManager base function
* revert _set_tokenizer version
* add driver async generate
* add async test
* fix bugs in test_ray_dist.py
* add get_tokenizer.py
* fix code style
* fix bugs about No module named 'pydantic' in ci test
* fix bugs in ci test
* fix bugs in ci test
* fix bugs in ci test
* support dynamic batch for bloom model and is_running function
* [Inference]Test for new Async engine (#4935)
* infer engine
* infer engine
* test engine
* test engine
* new manager
* change step
* add
* test
* fix
* fix
* finish test
* finish test
* finish test
* finish test
* add license
---------
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
* add assertion for config (#4947)
* [Inference] Finish dynamic batching offline test (#4948)
* test
* fix test
* fix quant
* add default
* fix
* fix some bugs
* fix some bugs
* fix
* fix bug
* fix bugs
* reset param
---------
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [test] add custom models in model zoo
* [test] update legacy test
* [test] update model zoo
* [test] update gemini test
* [test] remove components to test
* [shardformer] gpt2 tests fix
[shardformer] test all optimizations (#4399)
[shardformer] test all optimizations
[shardformer] test all optimizations
[shardformer] test all optimizations
[shardformer] gpt2 tests fix
* [shardformer]update t5 to use all optimizations
* add pipeline policy and bert forward to be done
* add bertmodel pipeline forward and make tests
* add Bert_Policy and test for policy
* update formatting
* update formatting
* update the code
* fix bugs
* fix name conflict
* add bloom model and policy, revise the base class of policy
* revise
* revision
* add bert_for_pretraining
* add bert_for_pretraining forward and policy
* fix typos
* cancel warning
* change the immediate output to default dict
* change the default output of get_shared_params
* rewrite bert test
* rewrite bert test
* fix some bugs
* del pipeline tests
* del pipeline tests
* del useless print
* del useless print
* rewrite data repeats
* [shardformer] supported flash attention test dependency (#4158)
* [shardformer] fix flash attention utils test (#4180)
* [shardformer] opt support flash attention (#4163)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] add performance benchmark of shardformer (#4175)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] benchmark fix
* [shardformer] benchmark fix
* [shardformer] llama support flash attention (#4185)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] llama support flash attention
* [shardformer] llama support flash attention
* [shardformer] Move the import statement for xformer outside the forward function.
* [shardformer] gpt2 support flash attention. (#4191)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] gpt2 support flash attention
* [shardformer] gpt2 support flash attention
* [shardformer] gpt2 support flash attention
* [shardformer] bloom support flash attention (#4188)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] bloom support flash attention
* [shardformer] add assert to sequence length
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
* [shardformer] bert support flash attention. (#4206)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] bert support flash attention
* [shardformer] t5 support flash attention. (#4216)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] t5 support flash attention
* [shardformer] t5 support flash attention
* fix typo
* fix typo
* fix typo
* fix typo
* fix typo
* fix typo
* [shardformer] support 'paddedcausal' type of attention mask in Coloattention. (#4215)
* added padded causal attn mask type for ColoAttention
* [shardformer]t5 flash attention fix (#4239)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] t5 flash attention fix
* [shardformer] update gpt2 to use coloattention. (#4234)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2 to use coloattention
* [shardformer] update gpt2
* [shardformer] update opt and llama to use coloattention. (#4226)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt to use coloattention
* [shardformer]update opt
* [shardformer] shardformer support jit fused operator. (#4236)
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] opt support flash attention
* [shardformer] move to modeling
* [shardformer] move to modeling
* [shardformer] bloom support jit fused operator
* [shardformer] bloom support jit fused operator
* [shardformer] bloom support jit fused operator
* [shardformer] t5 support jit fused operator
* [shardformer] t5 support jit fused operator
* [shardformer] t5 support jit fused operator
* [shardformer] add roadmap of flash attention
* [shardformer] add roadmap of flash attention
* [shardformer] add roadmap of flash attention
* [shardformer] add type hint to 'self' param of forward
* [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (#4290)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
* [shardformer] whisper support flash attention (#4301)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] whisper support flash attention
* [shardformer] whisper support flash attention
* [shardformer]whisper support jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
* [shardformer] sam support flash attention (#4316)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] sam support flash attention
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
* [shardformer] merge blip2/chatglm (#4321)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_plugin
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [shardformer] blip2 support flash attention and jit operator (#4325)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_plugin
* [shardformer] blip2 support flash attention and jit operator
* [shardformer] blip2 support flash attention and jit operator
* [shardformer] blip2 support flash attention and jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [shardformer] chatglm support flash attention and jit operator (#4330)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_plugin
* [shardformer] chatglm support flash attention and jit operator
* [shardformer] chatglm support flash attention and jit operator
* [shardformer] chatglm support flash attention and jit operator
* [shardformer] chatglm support flash attention and jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [shardformer] vit support flash attention and jit operator (#4334)
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* [shardformer] support SAM (#4231)
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* [shardformer] support whisper (#4212)
* support whisper
* fix bug in vocabembedding
* support downstream model of whisper
* update readme
* Feature/chatglm (#4240)
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* [shardformer] added tests
* [shardformer] vit test finish and support
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
* [shardformer] support Blip2 (#4243)
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_plugin
* [shardformer] vit support flash attention and jit operator
* [shardformer] vit support flash attention and jit operator
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* [pipeline] merge flash attention branch
* [pipeline] merge flash attention branch
* [pipeline] merge flash attention branch
* [pipeline] fix conflict
* [pipeline] fix conflict
* Merge branch 'feature/pipeline' into feature/pipeline
* Merge branch 'feature/pipeline' into feature/pipeline
* Merge branch 'feature/pipeline' into feature/pipeline
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* activate checks
* fix flash attention tests
* gemini ignore whisper
* fix vit
* fix xformers import handle
---------
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
* fix llama test
* fix test bug of bert, blip2, bloom, gpt2
* fix llama test
* fix opt test
* fix sam test
* fix sam test
* fix t5 test
* fix vit test
* fix whisper test
* fix whisper test
* polish code
* adjust allclose parameter
* Add mistakenly deleted code
* adjust allclose
* change loss function for some base model
* support base blip2
* add support for downstream blip2 model
* update readme
* add forward injection
* skip not compatible models test
* fix test for gemini and low_level_zero_plugin
* [shardformer] added tests
* [shardformer] vit test finish and support
* [shardformer] chatglm ready
* import chatglm
* [shardformer] add test kit in model zoo for chatglm
* [shardformer] add first version of policy of chatglm
* [shardformer] polish chatglm code
* [shardformer] polish code
* [shardformer] support chatglm without layernorm
* [shardformer] chatglm shard without mlp sharding
* [shardformer] delete some file
* [shardformer] ChatGLM support layernorm sharding
* [shardformer] register without auto policy
* [shardformer] pre-commit check files
* [shardformer] fix chatglm configuration with pre-commit
* 1.support sam 2.add fused qkv for nn.Linear
* update utils support set element in list
* overwrite SamVisionAttention forward to use DropoutForParallelInput
* remove unused code
* add naive optimizer for 3DPlugin/refactor gpt2 shardformer test
* merge tests of PP/DP/TP combinations into one test file
* fix bug when sync grad for dp in HybridPlugin
* update supported precisions for 3DPlugin/fix bug when shifting tp_degree
* improve the passing of lazy_init
* modify lazy_init/use sync_shared_params
* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* support base vit pipeline
* support vit downstream model
* fix vit shard test
* modify hidden states return type
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
This reverts commit 8dee68a0a2.
This policy should be reverted and copied to feature/bloom
* revert the bloom changes
* cancel unneeded inputs
* gpt
* finish llama
* causal lm and sequence classification
* revision
* add pure pipeline test
* finish some bert models
* finish all bert models
* finish bert tests
* fix bugs
* fix bugs
* fix test pipeline
* fix data gen for qa
* update the set pipeline forward
* shared params
* fix bugs
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* finish bloom model
* test shard gpt2
* clear cache
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
This reverts commit 8dee68a0a2.
This policy should be reverted and copied to feature/bloom
* revert the bloom changes
* cancel unneeded inputs
* gpt
* finish llama
* causal lm and sequence classification
* revision
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
This reverts commit 8dee68a0a2.
This policy should be reverted and copied to feature/bloom
* revert the bloom changes
* cancel unneeded inputs
* gpt
* pass gpt trace and meta_prop
* pass t5 trace and meta_prop
* [FX] refactor experimental tracer and adapt it with hf models
* pass all mainstream model zoo
* fix CI
* fix CI
* fix CI
* fix CI
* fix CI
* fix CI
* fix CI
* fix CI
* skip tests
* fix CI
* using packaging version
* polish