Commit Graph

14 Commits (cf519dac6a5799b8f314aac6f510e2a98d3af9c6)

Author SHA1 Message Date
Steve Luo 725fbd2ed0
[Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679)
7 months ago
傅剑寒 ef8e4ffe31
[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680)
7 months ago
傅剑寒 8ccb6714e7
[Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656)
7 months ago
傅剑寒 279300dc5f
[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613)
7 months ago
傅剑寒 a2878e39f4
[Inference] Add Reduce Utils (#5537)
8 months ago
yuehuayingxueluo 934e31afb2
The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519)
8 months ago
yuehuayingxueluo 87079cffe8
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461)
8 months ago
傅剑寒 7ff42cc06d
add vec_type_trait implementation (#5473)
8 months ago
xs_courtesy 48c4f29b27
refactor vector utils
8 months ago
xs_courtesy 388e043930
add implementatino for GetGPULaunchConfig1D
9 months ago
yuehuayingxueluo f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)
9 months ago
Steve Luo b699f54007
optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441)
9 months ago
xs_courtesy 5eb5ff1464
refactor code
9 months ago
xs_courtesy a46598ac59
add reusable utils for cuda
9 months ago