傅剑寒
279300dc5f
[Inference/Refactor] Refactor compilation mechanism and unified multi hw ( #5613 )
* refactor compilation mechanism and unified multi hw
* fix file path bug
* add __init__.py to make pybind a module to avoid relative path error caused by softlink
* delete duplicated macros
* fix macros bug in gcc
2024-04-24 14:17:54 +08:00
傅剑寒
a2878e39f4
[Inference] Add Reduce Utils ( #5537 )
* add reduce utils
* add using to delete namespace prefix
2024-04-01 15:34:25 +08:00
yuehuayingxueluo
934e31afb2
The writing style of tail processing and the logic related to macro definitions have been optimized. ( #5519 )
2024-03-28 10:42:51 +08:00
yuehuayingxueluo
87079cffe8
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding ( #5461 )
* Support FP16/BF16 Flash Attention 2
* fix bugs in test_kv_cache_memcpy.py
* add context_kv_cache_memcpy_kernel.cu
* rm typename MT
* add tail process
* add high_precision
* add high_precision to config.py
* rm unused code
* change the comment for the high_precision parameter
* update test_rotary_embdding_unpad.py
* fix vector_copy_utils.h
* add comment for self.high_precision when using float32
2024-03-25 13:40:34 +08:00
xs_courtesy
5eb5ff1464
refactor code
2024-03-08 15:41:14 +08:00