Commit Graph

35 Commits (19d1510ea26d10484a804eb62f6d03dbcc7c80a8)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Runyu Lu | cba20525a8 | [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838) | 5 months ago |
| yuehuayingxueluo | b45000f839 | [Inference]Add Streaming LLM (#5745) | 6 months ago |
| Runyu Lu | 18d67d0e8e | [Feat]Inference RPC Server Support (#5705) | 7 months ago |
| yuehuayingxueluo | de4bf3dedf | [Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708) | 7 months ago |
| Jianghai | 61a1b2e798 | [Inference] Fix bugs and docs for feat/online-server (#5598) | 7 months ago |
| Jianghai | de378cd2ab | [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) | 7 months ago |
| Jianghai | 69cd7e069d | [Inference] ADD async and sync Api server using FastAPI (#5396) | 7 months ago |
| yuehuayingxueluo | 9c2fe7935f | [Inference]Adapt temperature processing logic (#5689) | 7 months ago |
| Yuanheng Zhao | 5d4c1fe8f5 | [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624) | 7 months ago |
| Runyu Lu | e37ee2fb65 | [Feat]Tensor Model Parallel Support For Inference (#5563) | 7 months ago |
| Yuanheng Zhao | 912e24b2aa | [SpecDec] Fix inputs for speculation and revise past KV trimming (#5449) | 8 months ago |
| Yuanheng Zhao | a37f82629d | [Inference/SpecDec] Add Speculative Decoding Implementation (#5423) | 8 months ago |
| 傅剑寒 | e6496dd371 | [Inference] Optimize request handler of llama (#5512) | 8 months ago |
| Yuanheng Zhao | b21aac5bae | [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) | 9 months ago |
| Jianghai | 1f8c7e7046 | [Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) | 10 months ago |
| Frank Lee | 027aa1043f | [doc] updated inference readme (#5343) | 10 months ago |
| Frank Lee | db1a763307 | [inference] removed redundancy init_batch (#5353) | 10 months ago |
| yuehuayingxueluo | 4f28cb43c0 | [inference]Optimize the usage of the mid tensors space in flash attn (#5304) | 10 months ago |
| Jianghai | 9e2342bde2 | [Hotfix] Fix bugs in testing continuous batching (#5270) | 11 months ago |
| yuehuayingxueluo | 86b63f720c | [Inference]Adapted to the triton attn kernels (#5264) | 11 months ago |
| Jianghai | d8db500efc | [Inference] Fix request handler and add recycle logic (#5260) | 11 months ago |
| FrankLeeeee | 1ded7e81ef | [git] fixed rebased files | 11 months ago |
| yuehuayingxueluo | d40eb26029 | fix bugs in request_handler.py and engine.py | 11 months ago |
| yuehuayingxueluo | 10e3c9f923 | rm torch.cuda.synchronize | 11 months ago |
| yuehuayingxueluo | fab294c7f4 | fix CI bugs | 11 months ago |
| yuehuayingxueluo | fa4fbdbffb | adapted to pad_context_forward | 11 months ago |
| yuehuayingxueluo | 47e53eaa1c | fix bugs in attention.py and request_handler.py | 11 months ago |
| yuehuayingxueluo | bbfebfb9fc | fix bugs in sampler | 11 months ago |
| yuehuayingxueluo | 02c1bf8b2a | add context_attention_unpadded | 11 months ago |
| yuehuayingxueluo | 62968588d1 | fix bugs in request_handler | 11 months ago |
| yuehuayingxueluo | 62fd08ee44 | Fixed a bug in the inference frame | 11 months ago |
| Jianghai | 0e616462a7 | [Inference] add logit processor and request handler (#5166) | 11 months ago |
| yuehuayingxueluo | 8daee26989 | [Inference] Add the logic of the inference engine (#5173) | 11 months ago |
| Jianghai | 56e75eeb06 | [Inference] Add readme (roadmap) and fulfill request handler (#5147) | 11 months ago |
| Jianghai | 4cf4682e70 | [Inference] First PR for rebuild colossal-infer (#5143) | 11 months ago |