Commit Graph

38 Commits (73e88a5553235897dc92f10fa9704b531f1e2959)

| Author | SHA1 | Message | Date |
|---|---|---|---|
| yuehuayingxueluo | b45000f839 | [Inference]Add Streaming LLM (#5745) | 6 months ago |
| Yuanheng Zhao | 283c407a19 | [Inference] Fix Inference Generation Config and Sampling (#5710) | 6 months ago |
| Runyu Lu | 18d67d0e8e | [Feat]Inference RPC Server Support (#5705) | 7 months ago |
| yuehuayingxueluo | de4bf3dedf | [Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708) | 7 months ago |
| 傅剑寒 | bfad39357b | [Inference/Feat] Add quant kvcache interface (#5700) | 7 months ago |
| Jianghai | 61a1b2e798 | [Inference] Fix bugs and docs for feat/online-server (#5598) | 7 months ago |
| Jianghai | 69cd7e069d | [Inference] ADD async and sync Api server using FastAPI (#5396) | 7 months ago |
| yuehuayingxueluo | d482922035 | [Inference] Support the logic related to ignoring EOS token (#5693) | 7 months ago |
| yuehuayingxueluo | 5f00002e43 | [Inference] Adapt Baichuan2-13B TP (#5659) | 7 months ago |
| yuehuayingxueluo | 56b222eff8 | [inference/model]Adapted to the baichuan2-7B model (#5591) | 8 months ago |
| Yuanheng Zhao | e60d430cf5 | [Fix] resolve conflicts of rebasing feat/speculative-decoding (#5557) | 8 months ago |
| Yuanheng Zhao | d85d91435a | [Inference/SpecDec] Support GLIDE Drafter Model (#5455) | 8 months ago |
| Yuanheng Zhao | a37f82629d | [Inference/SpecDec] Add Speculative Decoding Implementation (#5423) | 8 months ago |
| Runyu Lu | 6251d68dc9 | [fix] PR #5354 (#5501) | 8 months ago |
| Runyu Lu | 68e9396bc0 | [fix] merge conflicts | 8 months ago |
| yuehuayingxueluo | 87079cffe8 | [Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461) | 8 months ago |
| Runyu Lu | ff4998c6f3 | [fix] remove unused comment | 8 months ago |
| Runyu Lu | aabc9fb6aa | [feat] add use_cuda_kernel option | 8 months ago |
| Runyu Lu | 1821a6dab0 | [fix] pytest and fix dyn grid bug | 9 months ago |
| Runyu Lu | b2c0d9ff2b | [fix] multi graphs capture error | 9 months ago |
| Runyu Lu | cefaeb5fdd | [feat] cuda graph support and refactor non-functional api | 9 months ago |
| Yuanheng Zhao | b21aac5bae | [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) | 9 months ago |
| Frank Lee | 9afa52061f | [inference] refactored config (#5376) | 10 months ago |
| Jianghai | 1f8c7e7046 | [Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) | 10 months ago |
| Frank Lee | 58740b5f68 | [inference] added inference template (#5375) | 10 months ago |
| Frank Lee | f8e456d202 | [inference] simplified config verification (#5346) | 10 months ago |
| yuehuayingxueluo | e8f0642f28 | [Inference]Add Nopadding Llama Modeling (#5327) | 10 months ago |
| yuehuayingxueluo | 4f28cb43c0 | [inference]Optimize the usage of the mid tensors space in flash attn (#5304) | 10 months ago |
| yuehuayingxueluo | d40eb26029 | fix bugs in request_handler.py and engine.py | 11 months ago |
| yuehuayingxueluo | fa4fbdbffb | adapted to pad_context_forward | 11 months ago |
| yuehuayingxueluo | 62fd08ee44 | Fixed a bug in the inference frame | 11 months ago |
| yuehuayingxueluo | 86853a37d5 | Add padding llama model | 11 months ago |
| Jianghai | 0e616462a7 | [Inference] add logit processor and request handler (#5166) | 11 months ago |
| yuehuayingxueluo | 8daee26989 | [Inference] Add the logic of the inference engine (#5173) | 11 months ago |
| Jianghai | 93aeacca34 | [Inference]Update inference config and fix test (#5178) | 11 months ago |
| yuehuayingxueluo | fab9b931d9 | [Inference]Add BatchInferState, Sequence and InferConfig (#5149) | 11 months ago |
| Jianghai | 56e75eeb06 | [Inference] Add readme (roadmap) and fulfill request handler (#5147) | 11 months ago |
| Jianghai | 4cf4682e70 | [Inference] First PR for rebuild colossal-infer (#5143) | 11 months ago |