Commit Graph

11 Commits (f7aecc0c6bac001d10c1dd00274e0152e4c86df6)

Author SHA1 Message Date
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417) 2024-03-08 16:21:12 +08:00
xs_courtesy 95c21498d4 add silu_and_mul for infer 2024-03-07 16:57:49 +08:00
yuehuayingxueluo 0aa27f1961
[Inference]Move benchmark-related code to the example directory. (#5408)
* move benchmark-related code to the example directory.

* fix bugs in test_fused_rotary_embedding.py
2024-02-28 16:46:03 +08:00
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406)
* add cuda KVCache kernel

* annotation benchmark_kvcache_copy

* add use cuda

* fix import path

* move benchmark scripts to example/

* rm benchmark codes in test_kv_cache_memcpy.py

* rm redundancy codes

* rm redundancy codes

* pr was modified according to the review
2024-02-28 14:36:50 +08:00
Yuanheng Zhao 19061188c3
[Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399)
fix dependency in pytest
2024-02-26 16:17:47 +08:00
yuehuayingxueluo 2a718c8be8
Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)
* opt_view_and_memcopy

* fix bugs in ci

* fix ci bugs

* update benchmark scripts

* fix ci bugs
2024-02-21 13:23:57 +08:00
Jianghai 730103819d
[Inference]Fused kv copy into rotary calculation (#5383)
* revise rotary embedding

* remove useless print

* adapt

* fix

* add

* fix

* modeling

* fix

* fix

* fix

* fused kv copy

* fused copy

* colossalai/kernel/triton/no_pad_rotary_embedding.py

* del padding llama

* del
2024-02-21 11:31:48 +08:00
yuehuayingxueluo 6fb4bcbb24
[Inference/opt] Fused KVCahce Memcopy (#5374)
* fused kv memcopy

* add TODO in test_kvcache_copy.py
2024-02-07 17:15:42 +08:00
Frank Lee 8106ede07f
Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373)
This reverts commit 9f4ab2eb92.
2024-02-07 14:27:04 +08:00
Jianghai 9f4ab2eb92
[Inference] Adapt to Fused rotary (#5348)
* revise rotary embedding

* remove useless print

* adapt

* fix

* add

* fix

* modeling

* fix

* fix

* fix
2024-02-07 11:36:04 +08:00
Frank Lee e76acbb076
[inference] moved ops tests to test_infer (#5354) 2024-02-02 13:51:22 +08:00