23 Commits (bd38fe6b912379080673a43d77fd3bdf0e5c852e)

Author SHA1 Message Date
Jianghai c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470) 7 months ago
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) 7 months ago
Jianghai 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396) 7 months ago
yuehuayingxueluo d482922035
[Inference] Support the logic related to ignoring EOS token (#5693) 7 months ago
Yuanheng Zhao 55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688) 7 months ago
Yuanheng Zhao 5d4c1fe8f5
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624) 7 months ago
yuehuayingxueluo bc1da87366
[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395) 9 months ago
Yuanheng Zhao b21aac5bae
[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) 9 months ago
Frank Lee db1a763307
[inference] removed redundancy init_batch (#5353) 10 months ago
yuehuayingxueluo e8f0642f28
[Inference]Add Nopadding Llama Modeling (#5327) 10 months ago
yuehuayingxueluo 4f28cb43c0
[inference]Optimize the usage of the mid tensors space in flash attn (#5304) 10 months ago
Jianghai 9e2342bde2
[Hotfix] Fix bugs in testing continuous batching (#5270) 10 months ago
yuehuayingxueluo 86b63f720c
[Inference]Adapted to the triton attn kernels (#5264) 10 months ago
Jianghai d8db500efc
[Inference] Fix request handler and add recycle logic (#5260) 10 months ago
yuehuayingxueluo fa4fbdbffb adapted to pad_context_forward 11 months ago
yuehuayingxueluo 47e53eaa1c fix bugs in attention.py and request_handler.py 11 months ago
yuehuayingxueluo 9489dc64d8 precision alignment 11 months ago
yuehuayingxueluo 62968588d1 fix bugs in request_handler 11 months ago
yuehuayingxueluo 62fd08ee44 Fixed a bug in the inference frame 11 months ago
yuehuayingxueluo 86853a37d5 Add padding llama model 11 months ago
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166) 11 months ago
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173) 11 months ago
Jianghai 93aeacca34 [Inference]Update inference config and fix test (#5178) 11 months ago
yuehuayingxueluo fab9b931d9 [Inference]Add BatchInferState, Sequence and InferConfig (#5149) 11 months ago