12 Commits (80c3c8789bdfc095618a0b725a4980cd575c6c6b)

Author SHA1 Message Date
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) 7 months ago
Yuanheng Zhao b21aac5bae [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) 9 months ago
Yuanheng Zhao 5f98a9d68a [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) 10 months ago
Jianghai e545a871b8 [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229) 11 months ago
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166) 11 months ago
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173) 11 months ago
Jianghai 93aeacca34 [Inference]Update inference config and fix test (#5178) 11 months ago
Yuanheng Zhao 3de2e62299 [Inference] Add CacheBlock and KV-Cache Manager (#5156) 11 months ago
Yuanheng Zhao 2bb92243d4 [Inference/NFC] Clean outdated inference tests and deprecated kernels (#5159) 11 months ago
Xu Kai fd6482ad8c [inference] Refactor inference architecture (#5057) 1 year ago
Hongxin Liu 079bf3cb26 [misc] update pre-commit and run all files (#4752) 1 year ago
Cuiqing Li bce0f16702 [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) 1 year ago