commit 8554585a5f
Author: Li Xingjian
Date:   2024-06-12 14:13:50 +08:00

    [Inference] Fix flash-attn import and add model test (#5794)

    * Fix torch int32 dtype
    * Fix flash-attn import
    * Add generalized model test
    * Remove exposed path to model
    * Add default value for use_flash_attn
    * Rename model test

    Signed-off-by: char-1ee <xingjianli59@gmail.com>
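The diff itself is not shown in this log; as a rough illustration of the pattern the "Fix flash-attn import" and "Add default value for use_flash_attn" items point at, a guarded optional import usually looks like the sketch below. `flash_attn_varlen_func` is a real flash-attn entry point, but the surrounding helper and flag names are assumptions, not the commit's actual code.

```python
# Sketch only: flash-attn is an optional dependency, so importing it
# must not crash environments where the package is absent.
try:
    from flash_attn import flash_attn_varlen_func

    HAS_FLASH_ATTN = True
except ImportError:
    flash_attn_varlen_func = None
    HAS_FLASH_ATTN = False


def get_attention_kernel(use_flash_attn: bool = False):
    """Return the flash-attn kernel when requested and available.

    Hypothetical helper: returning None signals the caller to fall back to
    torch.nn.functional.scaled_dot_product_attention.
    """
    if use_flash_attn and HAS_FLASH_ATTN:
        return flash_attn_varlen_func
    return None
```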
commit f5981e808e
Author: char-1ee
Date:   2024-06-07 10:02:19 +00:00

    Remove flash attention backend

    Signed-off-by: char-1ee <xingjianli59@gmail.com>
commit 5f398fc000
Author: char-1ee
Date:   2024-06-07 08:33:52 +00:00

    Pass inference model shard configs for module init

    Signed-off-by: char-1ee <xingjianli59@gmail.com>
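This commit message suggests shard-level inference settings are threaded through module construction rather than read from global state. A minimal sketch of that shape follows; the config fields and class names are assumptions, not the project's actual API.

```python
from dataclasses import dataclass

import torch
from torch import nn


@dataclass
class ModelShardInferenceConfig:
    # Hypothetical fields: per-shard settings a module needs at build time.
    dtype: torch.dtype = torch.float16
    use_flash_attn: bool = False


class ShardedAttention(nn.Module):
    def __init__(self, hidden_size: int, config: ModelShardInferenceConfig):
        super().__init__()
        # The module takes its dtype and kernel choice from the config it
        # was initialized with, instead of consulting a global.
        self.config = config
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size, dtype=config.dtype)
```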
commit eec77e5702
Author: char-1ee
Date:   2024-06-07 08:33:47 +00:00

    Fix tests and naming

    Signed-off-by: char-1ee <xingjianli59@gmail.com>
commit 04386d9eff
Author: char-1ee
Date:   2024-06-07 08:33:47 +00:00

    Refactor modeling by adding attention backend

    Signed-off-by: char-1ee <xingjianli59@gmail.com>
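A minimal sketch of the "attention backend" seam this refactor names, assuming it splits the attention math out of the model code behind a small interface. Class and function names here are hypothetical; only `F.scaled_dot_product_attention` is a real PyTorch API.

```python
from abc import ABC, abstractmethod

import torch
import torch.nn.functional as F


class AttentionBackend(ABC):
    """Interface the model code programs against instead of a fixed kernel."""

    @abstractmethod
    def attend(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        ...


class TorchSDPABackend(AttentionBackend):
    """Default backend built on PyTorch's fused scaled-dot-product attention."""

    def attend(self, q, k, v):
        return F.scaled_dot_product_attention(q, k, v)


def get_attention_backend(use_flash_attn: bool = False) -> AttentionBackend:
    # Dispatch point: a flash-attn-backed implementation would be selected
    # here when requested and available; otherwise fall back to torch SDPA.
    return TorchSDPABackend()
```

The value of this seam is that backends can be added or removed (as the "Remove flash attention backend" commit above did) without touching the model definitions themselves.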