Commit Graph

12 Commits (ed431de4e4f73584e6b9c11ab041ef54a8e83de6)

Author SHA1 Message Date
Steve Luo ed431de4e4
fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454) 2024-03-13 16:00:55 +08:00
xs_courtesy c1c45e9d8e fix include path 2024-03-13 11:21:06 +08:00
Steve Luo b699f54007
optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441) 2024-03-12 17:48:02 +08:00
xs_courtesy 095c070a6e refactor code 2024-03-11 17:06:57 +08:00
傅剑寒 21e1e3645c
Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config
Add query and other components
2024-03-11 11:15:29 +08:00
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417) 2024-03-08 16:21:12 +08:00
xs_courtesy 5eb5ff1464 refactor code 2024-03-08 15:41:14 +08:00
xs_courtesy 01d289d8e5 Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config 2024-03-08 15:04:55 +08:00
xs_courtesy a46598ac59 add reusable utils for cuda 2024-03-08 14:53:29 +08:00
xs_courtesy 95c21498d4 add silu_and_mul for infer 2024-03-07 16:57:49 +08:00
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406)
* add cuda KVCache kernel

* annotation benchmark_kvcache_copy

* add use cuda

* fix import path

* move benchmark scripts to example/

* rm benchmark codes in test_kv_cache_memcpy.py

* rm redundancy codes

* rm redundancy codes

* pr was modified according to the review
2024-02-28 14:36:50 +08:00
Frank Lee 7cfed5f076
[feat] refactored extension module (#5298)
* [feat] refactored extension module

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish
2024-01-25 17:01:48 +08:00