Commit Graph

6 Commits (82aecd637481d00eac9df00f10aef270498e4d66)

Author SHA1 Message Date
Yuanheng Zhao 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main)
7 months ago
Yuanheng Zhao b21aac5bae
[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)
9 months ago
yuehuayingxueluo 4f28cb43c0
[inference]Optimize the usage of the mid tensors space in flash attn (#5304)
10 months ago
yuehuayingxueluo fab294c7f4 fix CI bugs
11 months ago
Jianghai e545a871b8 [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229)
11 months ago
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166)
11 months ago