Commit Graph

5 Commits (e16ccc272a0fd25ee8b6b6856c6b90cf8bcfb6b0)

Author SHA1 Message Date
Hongxin Liu 68359ed1e1
[release] update version (#5752)
* [release] update version

* [devops] update compatibility test

* [devops] update compatibility test

* [devops] update compatibility test

* [devops] update compatibility test

* [test] fix ddp plugin test

* [test] fix gptj and rpc test

* [devops] fix cuda ext compatibility

* [inference] fix flash decoding test

* [inference] fix flash decoding test
2024-05-31 19:40:26 +08:00
Steve Luo 7806842f2d
add paged-attetionv2: support seq length split across thread block (#5707) 2024-05-14 12:46:54 +08:00
傅剑寒 50104ab340
[Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706)
* add convert_fp8 op for fp8 test in the future

* rerun ci
2024-05-10 18:39:54 +08:00
Yuanheng Zhao 55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
* clean requirements

* modify example inference struct

* add test ci scripts

* mark test_infer as submodule

* rm deprecated cls & deps

* import of HAS_FLASH_ATTN

* prune inference tests to be run

* prune triton kernel tests

* increment pytest timeout mins

* revert import path in openmoe
2024-05-08 11:30:15 +08:00
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) 2024-05-05 16:28:56 +00:00