Commit Graph

10 Commits (89a9a600bc4802c912b0ed48d48f70bbcdd8142b)

Author SHA1 Message Date
pre-commit-ci[bot] 7c2f79fa98
[pre-commit.ci] pre-commit autoupdate (#5572)
5 months ago
yuehuayingxueluo b45000f839
[Inference]Add Streaming LLM (#5745)
6 months ago
yuehuayingxueluo de4bf3dedf
[Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708)
7 months ago
Jianghai 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396)
7 months ago
Yuanheng Zhao 5d4c1fe8f5
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
7 months ago
Yuanheng Zhao e60d430cf5 [Fix] resolve conflicts of rebasing feat/speculative-decoding (#5557)
8 months ago
Yuanheng Zhao 912e24b2aa [SpecDec] Fix inputs for speculation and revise past KV trimming (#5449)
8 months ago
Yuanheng Zhao a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423)
8 months ago
yuehuayingxueluo bc1da87366
[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395)
9 months ago
Yuanheng Zhao b21aac5bae
[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)
9 months ago