Commit Graph

7 Commits (62cdac6b7b655e11626382d64e56503146a516ee)

Author SHA1 Message Date
Runyu Lu b2c0d9ff2b [fix] multi graphs capture error 2024-03-11 10:49:31 +08:00
Runyu Lu cefaeb5fdd [feat] cuda graph support and refactor non-functional api 2024-03-08 14:19:35 +08:00
yuehuayingxueluo 2a718c8be8
Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)
* opt_view_and_memcopy

* fix bugs in ci

* fix ci bugs

* update benchmark scripts

* fix ci bugs
2024-02-21 13:23:57 +08:00
yuehuayingxueluo 35382a7fbf
[Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)
* fused the gate and up proj in mlp

* fix code styles

* opt auto_grad

* rollback test_inference_engine.py

* modifications based on the review feedback.

* fix bugs in flash attn

* Change reshape to view

* fix test_rmsnorm_triton.py
2024-02-06 19:38:25 +08:00
yuehuayingxueluo 21ad4a27f9
[Inference/opt]Optimize the mid tensor of RMS Norm (#5350)
* opt rms_norm

* fix bugs in rms_layernorm
2024-02-02 15:06:01 +08:00
Jianghai 1f8a75d470
[Inference] Update rms norm kernel, benchmark with vLLM (#5315)
* add

* xi

* del

* del

* fix
2024-01-29 10:22:33 +08:00
Yaozheng Fang 5ae9099f92
[kernel] Add RMSLayerNorm triton kernel (#5262)
* add layerrmsnorm triton kernel

* add layerrmsnorm kernel

* modify the atol and rtol in test file

* Remove the logics of mean computations, and update the name of ther kernel functions and files

* add benchmark of rms norm
2024-01-18 10:21:03 +08:00