Commit Graph

7 Commits (8fd25d6e09069a8437c6ebee8dd83e1de4c9b83d)

Author SHA1 Message Date
Runyu Lu b2c0d9ff2b [fix] multi graphs capture error
9 months ago
Runyu Lu cefaeb5fdd [feat] cuda graph support and refactor non-functional api
9 months ago
yuehuayingxueluo 2a718c8be8
Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)
9 months ago
yuehuayingxueluo 35382a7fbf
[Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)
10 months ago
yuehuayingxueluo 21ad4a27f9
[Inference/opt]Optimize the mid tensor of RMS Norm (#5350)
10 months ago
Jianghai 1f8a75d470
[Inference] Update rms norm kernel, benchmark with vLLM (#5315)
10 months ago
Yaozheng Fang 5ae9099f92
[kernel] Add RMSLayerNorm triton kernel (#5262)
10 months ago