43 Commits (colossalchat)

Author SHA1 Message Date
Runyu Lu bcf0181ecd [Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895) 4 months ago
Runyu Lu cba20525a8 [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838) 5 months ago
pre-commit-ci[bot] 7c2f79fa98 [pre-commit.ci] pre-commit autoupdate (#5572) 5 months ago
char-1ee 5f398fc000 Pass inference model shard configs for module init 6 months ago
char-1ee 04386d9eff Refactor modeling by adding attention backend 6 months ago
yuehuayingxueluo b45000f839 [Inference]Add Streaming LLM (#5745) 6 months ago
Yuanheng Zhao 283c407a19 [Inference] Fix Inference Generation Config and Sampling (#5710) 6 months ago
Runyu Lu 18d67d0e8e [Feat]Inference RPC Server Support (#5705) 6 months ago
yuehuayingxueluo de4bf3dedf [Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708) 7 months ago
傅剑寒 bfad39357b [Inference/Feat] Add quant kvcache interface (#5700) 7 months ago
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598) 7 months ago
Jianghai 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396) 7 months ago
yuehuayingxueluo d482922035 [Inference] Support the logic related to ignoring EOS token (#5693) 7 months ago
yuehuayingxueluo 5f00002e43 [Inference] Adapt Baichuan2-13B TP (#5659) 7 months ago
yuehuayingxueluo 56b222eff8 [inference/model]Adapted to the baichuan2-7B model (#5591) 7 months ago
Yuanheng Zhao e60d430cf5 [Fix] resolve conflicts of rebasing feat/speculative-decoding (#5557) 8 months ago
Yuanheng Zhao d85d91435a [Inference/SpecDec] Support GLIDE Drafter Model (#5455) 8 months ago
Yuanheng Zhao a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423) 8 months ago
Runyu Lu 6251d68dc9 [fix] PR #5354 (#5501) 8 months ago
yuehuayingxueluo 87079cffe8 [Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461) 8 months ago
Runyu Lu ff4998c6f3 [fix] remove unused comment 8 months ago
Runyu Lu aabc9fb6aa [feat] add use_cuda_kernel option 8 months ago
Runyu Lu 1821a6dab0 [fix] pytest and fix dyn grid bug 8 months ago
Runyu Lu b2c0d9ff2b [fix] multi graphs capture error 9 months ago
Runyu Lu cefaeb5fdd [feat] cuda graph support and refactor non-functional api 9 months ago
Yuanheng Zhao b21aac5bae [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) 9 months ago
Frank Lee 9afa52061f [inference] refactored config (#5376) 10 months ago
Jianghai 1f8c7e7046 [Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) 10 months ago
Frank Lee 58740b5f68 [inference] added inference template (#5375) 10 months ago
Frank Lee f8e456d202 [inference] simplified config verification (#5346) 10 months ago
yuehuayingxueluo e8f0642f28 [Inference]Add Nopadding Llama Modeling (#5327) 10 months ago
yuehuayingxueluo 4f28cb43c0 [inference]Optimize the usage of the mid tensors space in flash attn (#5304) 10 months ago
yuehuayingxueluo d40eb26029 fix bugs in request_handler.py and engine.py 11 months ago
yuehuayingxueluo fa4fbdbffb adapted to pad_context_forward 11 months ago
yuehuayingxueluo 62fd08ee44 Fixed a bug in the inference frame 11 months ago
yuehuayingxueluo 86853a37d5 Add padding llama model 11 months ago
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166) 11 months ago
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173) 11 months ago
Jianghai 93aeacca34 [Inference]Update inference config and fix test (#5178) 11 months ago
Yuanheng Zhao 3de2e62299 [Inference] Add CacheBlock and KV-Cache Manager (#5156) 11 months ago
yuehuayingxueluo fab9b931d9 [Inference]Add BatchInferState, Sequence and InferConfig (#5149) 11 months ago