8 Commits (d10ee42f68d090db17a8b87cac46ab6d1c2c8ca2)

Author | SHA1 | Message | Date
Xu Kai | fd6482ad8c | [inference] Refactor inference architecture (#5057) | 1 year ago
Cuiqing Li (李崔卿) | 28052a71fb | [Kernels]Update triton kernels into 2.1.0 (#5046) | 1 year ago
Xuanlei Zhao | dc003c304c | [moe] merge moe into main (#4978) | 1 year ago
Cuiqing Li | 3a41e8304e | [Refactor] Integrated some lightllm kernels into token-attention (#4946) | 1 year ago
Xu Kai | d1fcc0fa4d | [infer] fix test bug (#4838) | 1 year ago
Jianghai | ce7ade3882 | [inference] chatglm2 infer demo (#4724) | 1 year ago
Hongxin Liu | 079bf3cb26 | [misc] update pre-commit and run all files (#4752) | 1 year ago
Cuiqing Li | bce0f16702 | [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) | 1 year ago