24 Commits (8241c0c054b38a109ed3ce7be1052a1e600b8471)

Author SHA1 Message Date
pre-commit-ci[bot] 7c2f79fa98 [pre-commit.ci] pre-commit autoupdate (#5572) 5 months ago
yuehuayingxueluo b45000f839 [Inference]Add Streaming LLM (#5745) 6 months ago
Runyu Lu 18d67d0e8e [Feat]Inference RPC Server Support (#5705) 6 months ago
傅剑寒 bfad39357b [Inference/Feat] Add quant kvcache interface (#5700) 7 months ago
Yuanheng Zhao f9afe0addd [hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695) 7 months ago
Steve Luo 5cd75ce4c7 [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663) 7 months ago
yuehuayingxueluo 3c91e3f176 [Inference]Adapt to baichuan2 13B (#5614) 7 months ago
Yuanheng Zhao 5d4c1fe8f5 [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624) 7 months ago
Yuanheng Zhao a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423) 8 months ago
Yuanheng Zhao b21aac5bae [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) 9 months ago
Frank Lee 027aa1043f [doc] updated inference readme (#5343) 10 months ago
Yuanheng Zhao 5f98a9d68a [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) 10 months ago
yuehuayingxueluo 4f28cb43c0 [inference]Optimize the usage of the mid tensors space in flash attn (#5304) 10 months ago
Jianghai d8db500efc [Inference] Fix request handler and add recycle logic (#5260) 10 months ago
yuehuayingxueluo d40eb26029 fix bugs in request_handler.py and engine.py 11 months ago
yuehuayingxueluo fa4fbdbffb adapted to pad_context_forward 11 months ago
yuehuayingxueluo 62fd08ee44 Fixed a bug in the inference frame 11 months ago
yuehuayingxueluo 86853a37d5 Add padding llama model 11 months ago
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166) 11 months ago
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173) 11 months ago
Jianghai 93aeacca34 [Inference]Update inference config and fix test (#5178) 11 months ago
Yuanheng Zhao 3de2e62299 [Inference] Add CacheBlock and KV-Cache Manager (#5156) 11 months ago
Jianghai 4cf4682e70 [Inference] First PR for rebuild colossal-infer (#5143) 11 months ago
Xu Kai fd6482ad8c [inference] Refactor inference architecture (#5057) 1 year ago