Commit Graph

12 Commits (5d9a49483d98ccd4bebebbfd039162caceefe6bd)

Author SHA1 Message Date
CjhHa1 bc9063adf1 resolve rebase conflicts on Branch feat/online-serving
7 months ago
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432)
7 months ago
Yuanheng Zhao 537a3cbc4d
[kernel] Support New KCache Layout - Triton Kernel (#5677)
7 months ago
Yuanheng Zhao 5d4c1fe8f5
[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)
7 months ago
Jianghai 730103819d
[Inference]Fused kv copy into rotary calculation (#5383)
9 months ago
Frank Lee 8106ede07f
Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373)
10 months ago
Jianghai 9f4ab2eb92
[Inference] Adapt to Fused rotary (#5348)
10 months ago
yuehuayingxueluo 35382a7fbf
[Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)
10 months ago
Jianghai df0aa49585
[Inference] Kernel Fusion, fused copy kv cache into rotary embedding (#5336)
10 months ago
Jianghai 7ddd8b37f0
fix (#5311)
10 months ago
Jianghai c647e00e3c
[Inference]Add fused rotary kernel and get cos cache kernel (#5302)
10 months ago
Jianghai fded91d049 [Inference] Kernel: no pad rotary embedding (#5252)
11 months ago