flybird11111
2ddf624a86
[shardformer] upgrade transformers to 4.39.3 (#5815)
* [shardformer] upgrade transformers for gpt2/gptj/whisper (#5807)
* [shardformer] fix modeling of gpt2 and gptj
* [shardformer] fix whisper modeling
* [misc] update requirements
---------
Co-authored-by: ver217 <lhx0217@gmail.com>
* [shardformer] upgrade transformers for mistral (#5808)
* upgrade transformers for mistral
* fix
* fix
* [shardformer] upgrade transformers for llama (#5809)
* update transformers
fix
* fix
* fix
* [inference] upgrade transformers (#5810)
* update transformers
fix
* fix
* fix
* fix
* fix
* [gemini] update transformers for gemini (#5814)
---------
Co-authored-by: ver217 <lhx0217@gmail.com>
2024-06-14 10:59:33 +08:00
Li Xingjian
8554585a5f
[Inference] Fix flash-attn import and add model test (#5794)
* Fix torch int32 dtype
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Fix flash-attn import
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Add generalized model test
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Remove exposed path to model
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Add default value for use_flash_attn
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Rename model test
Signed-off-by: char-1ee <xingjianli59@gmail.com>
---------
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-12 14:13:50 +08:00
char-1ee
b303976a27
Fix test import
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-10 02:03:30 +00:00
Hongxin Liu
68359ed1e1
[release] update version (#5752)
* [release] update version
* [devops] update compatibility test
* [devops] update compatibility test
* [devops] update compatibility test
* [devops] update compatibility test
* [test] fix ddp plugin test
* [test] fix gptj and rpc test
* [devops] fix cuda ext compatibility
* [inference] fix flash decoding test
* [inference] fix flash decoding test
2024-05-31 19:40:26 +08:00
Steve Luo
7806842f2d
add paged-attentionv2: support seq length split across thread block (#5707)
2024-05-14 12:46:54 +08:00
傅剑寒
50104ab340
[Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706)
* add convert_fp8 op for fp8 test in the future
* rerun ci
2024-05-10 18:39:54 +08:00
Yuanheng Zhao
55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
* clean requirements
* modify example inference struct
* add test ci scripts
* mark test_infer as submodule
* rm deprecated cls & deps
* import of HAS_FLASH_ATTN
* prune inference tests to be run
* prune triton kernel tests
* increment pytest timeout mins
* revert import path in openmoe
2024-05-08 11:30:15 +08:00
Yuanheng Zhao
8754abae24
[Fix] Fix & Update Inference Tests (compatibility w/ main)
2024-05-05 16:28:56 +00:00