Commit Graph

15 Commits (2a718c8be89918ec70b88f1f059148a7294dbccb)

Author SHA1 Message Date
Yuanheng Zhao b21aac5bae
[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)
9 months ago
Frank Lee 027aa1043f
[doc] updated inference readme (#5343)
10 months ago
Yuanheng Zhao 5f98a9d68a
[Infer] Optimize Blocked KVCache And Kernels Using It (#5325)
10 months ago
yuehuayingxueluo 4f28cb43c0
[inference]Optimize the usage of the mid tensors space in flash attn (#5304)
10 months ago
Jianghai d8db500efc
[Inference] Fix request handler and add recycle logic (#5260)
11 months ago
yuehuayingxueluo d40eb26029 fix bugs in request_handler.py and engine.py
11 months ago
yuehuayingxueluo fa4fbdbffb adapted to pad_context_forward
11 months ago
yuehuayingxueluo 62fd08ee44 Fixed a bug in the inference frame
11 months ago
yuehuayingxueluo 86853a37d5 Add padding llama model
11 months ago
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166)
11 months ago
yuehuayingxueluo 8daee26989 [Inference] Add the logic of the inference engine (#5173)
11 months ago
Jianghai 93aeacca34 [Inference]Update inference config and fix test (#5178)
11 months ago
Yuanheng Zhao 3de2e62299 [Inference] Add CacheBlock and KV-Cache Manager (#5156)
11 months ago
Jianghai 4cf4682e70 [Inference] First PR for rebuild colossal-infer (#5143)
11 months ago
Xu Kai fd6482ad8c
[inference] Refactor inference architecture (#5057)
1 year ago