Commit Graph

7 Commits (80c3c8789bdfc095618a0b725a4980cd575c6c6b)

Author SHA1 Message Date
Yuanheng Zhao 55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
* clean requirements

* modify example inference struct

* add test ci scripts

* mark test_infer as submodule

* rm deprecated cls & deps

* import of HAS_FLASH_ATTN

* prune inference tests to be run

* prune triton kernel tests

* increment pytest timeout mins

* revert import path in openmoe
2024-05-08 11:30:15 +08:00
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) 2024-05-05 16:28:56 +00:00
Runyu Lu e37ee2fb65
[Feat]Tensor Model Parallel Support For Inference (#5563)
* tensor parallel support naive source

* [fix]precision, model load and refactor the framework

* add tp unit test

* docstring

* fix do_sample
2024-04-18 16:56:46 +08:00
Runyu Lu 9fe61b4475 [fix] 2024-03-25 11:37:58 +08:00
Runyu Lu aabc9fb6aa [feat] add use_cuda_kernel option 2024-03-19 13:24:25 +08:00
Runyu Lu ae24b4f025 diverse tests 2024-03-14 10:35:08 +08:00
Runyu Lu 1821a6dab0 [fix] pytest and fix dyn grid bug 2024-03-13 17:28:32 +08:00