20 Commits (2d1a785b71ca68789ac82536265f94563cd0db8e)

Author SHA1 Message Date
Jianghai 85946d4236
[Inference]Fix readme and example for API server (#5742)
6 months ago
binmakeswell 4647ec28c8
[inference] release (#5747)
6 months ago
Yuanheng Zhao d8b1ea4ac9
[doc] Update Inference Readme (#5736)
6 months ago
Yuanheng Zhao 55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements (#5688)
7 months ago
Yuanheng Zhao e1acb58423
[doc] Add inference/speculative-decoding README (#5552)
8 months ago
Runyu Lu 5b017d6324
[fix]
8 months ago
Runyu Lu 633e95b301
[doc] add doc
9 months ago
Jianghai 1f8c7e7046
[Inference] User Experience: update the logic of default tokenizer and generation config. (#5337)
10 months ago
Frank Lee 027aa1043f
[doc] updated inference readme (#5343)
10 months ago
Jianghai c7c104cb7c
[DOC] Update inference readme (#5280)
10 months ago
Frank Lee c597678da4
[doc] updated inference readme (#5269)
11 months ago
Jianghai 4cf4682e70
[Inference] First PR for rebuild colossal-infer (#5143)
11 months ago
Zhongkai Zhao 75af66cd81
[Hotfix] Fix model policy matching strategy in ShardFormer (#5064)
1 year ago
Xu Kai fd6482ad8c
[inference] Refactor inference architecture (#5057)
1 year ago
Cuiqing Li (李崔卿) 28052a71fb
[Kernels]Update triton kernels into 2.1.0 (#5046)
1 year ago
Cuiqing Li (李崔卿) 4f0234f236
[doc]Update doc for colossal-inference (#4989)
1 year ago
Cuiqing Li 459a88c806
[Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965)
1 year ago
Cuiqing Li 3a41e8304e
[Refactor] Integrated some lightllm kernels into token-attention (#4946)
1 year ago
digger yu 11009103be
[nfc] fix some typo with colossalai/ docs/ etc. (#4920)
1 year ago
Cuiqing Li bce0f16702
[Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577)
1 year ago