Commit Graph

24 Commits (8cc8f645cd1d971a3bef52f625b7881f17c6d22b)

Author SHA1 Message Date
Runyu Lu cba20525a8
[Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838)
5 months ago
Jianghai c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470)
7 months ago
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432)
7 months ago
Jianghai 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396)
7 months ago
yuehuayingxueluo d482922035
[Inference] Support the logic related to ignoring EOS token (#5693)
7 months ago
Yuanheng Zhao 55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
7 months ago
Yuanheng Zhao 5d4c1fe8f5
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
7 months ago
yuehuayingxueluo bc1da87366
[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395)
9 months ago
Yuanheng Zhao b21aac5bae
[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)
9 months ago
Frank Lee db1a763307
[inference] removed redundancy init_batch (#5353)
10 months ago
yuehuayingxueluo e8f0642f28
[Inference]Add Nopadding Llama Modeling (#5327)
10 months ago
yuehuayingxueluo 4f28cb43c0
[inference]Optimize the usage of the mid tensors space in flash attn (#5304)
10 months ago
Jianghai 9e2342bde2
[Hotfix] Fix bugs in testing continuous batching (#5270)
11 months ago
yuehuayingxueluo 86b63f720c
[Inference]Adapted to the triton attn kernels (#5264)
11 months ago
Jianghai d8db500efc
[Inference] Fix request handler and add recycle logic (#5260)
11 months ago
yuehuayingxueluo fa4fbdbffb adapted to pad_context_forward
11 months ago
yuehuayingxueluo 47e53eaa1c fix bugs in attention.py and request_handler.py
11 months ago
yuehuayingxueluo 9489dc64d8 precision alignment
11 months ago
yuehuayingxueluo 62968588d1 fix bugs in request_handler
11 months ago
yuehuayingxueluo 62fd08ee44 Fixed a bug in the inference frame
11 months ago
yuehuayingxueluo 86853a37d5 Add padding llama model
11 months ago
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166)
11 months ago
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173)
11 months ago
Jianghai 93aeacca34 [Inference]Update inference config and fix test (#5178)
11 months ago