Commit Graph

41 Commits (2d642eea0f92c7f7c1fb7bef3abdfdb0cb61d1bf)

Author SHA1 Message Date
pre-commit-ci[bot] 7c2f79fa98
[pre-commit.ci] pre-commit autoupdate (#5572)
5 months ago
傅剑寒 121d7ad629
[Inference] Delete duplicated copy_vector (#5716)
7 months ago
Steve Luo 7806842f2d
add paged-attentionv2: support seq length split across thread block (#5707)
7 months ago
傅剑寒 50104ab340
[Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706)
7 months ago
傅剑寒 1ace1065e6
[Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686)
7 months ago
Steve Luo 725fbd2ed0
[Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679)
7 months ago
傅剑寒 9df016fc45
[Inference] Fix quant bits order (#5681)
7 months ago
傅剑寒 ef8e4ffe31
[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680)
7 months ago
Steve Luo 5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663)
7 months ago
傅剑寒 808ee6e4ad
[Inference/Feat] Feat quant kvcache step2 (#5674)
7 months ago
傅剑寒 8ccb6714e7
[Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656)
7 months ago
Steve Luo a8fd3b0342
[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643)
7 months ago
傅剑寒 279300dc5f
[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613)
7 months ago
yuehuayingxueluo 12f10d5b0b
[Fix/Inference]Fix CUDA Rotary Embedding GQA (#5623)
7 months ago
Steve Luo ccf72797e3
feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611)
7 months ago
Steve Luo be396ad6cc
[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531)
7 months ago
傅剑寒 d4cb023b62
[Inference/Refactor] Delete Duplicated code and refactor vec_copy utils and reduce utils (#5593)
8 months ago
傅剑寒 a21912339a
refactor csrc (#5582)
8 months ago
pre-commit-ci[bot] d78817539e
[pre-commit.ci] auto fixes from pre-commit.com hooks
8 months ago
傅剑寒 7ebdf48ac5
add cast and op_functor for cuda build-in types (#5546)
8 months ago
傅剑寒 a2878e39f4
[Inference] Add Reduce Utils (#5537)
8 months ago
yuehuayingxueluo 04aca9e55b
[Inference/Kernel]Add get_cos_and_sin Kernel (#5528)
8 months ago
yuehuayingxueluo 934e31afb2
The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519)
8 months ago
yuehuayingxueluo 87079cffe8
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461)
8 months ago
傅剑寒 7ff42cc06d
add vec_type_trait implementation (#5473)
8 months ago
xs_courtesy 48c4f29b27
refactor vector utils
8 months ago
xs_courtesy 5724b9e31e
add some comments
9 months ago
xs_courtesy 388e043930
add implementation for GetGPULaunchConfig1D
9 months ago
yuehuayingxueluo f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)
9 months ago
Steve Luo ed431de4e4
fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454)
9 months ago
xs_courtesy c1c45e9d8e
fix include path
9 months ago
Steve Luo b699f54007
optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441)
9 months ago
xs_courtesy 095c070a6e
refactor code
9 months ago
傅剑寒 21e1e3645c
Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config
9 months ago
Steve Luo f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417)
9 months ago
xs_courtesy 5eb5ff1464
refactor code
9 months ago
xs_courtesy 01d289d8e5
Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config
9 months ago
xs_courtesy a46598ac59
add reusable utils for cuda
9 months ago
xs_courtesy 95c21498d4
add silu_and_mul for infer
9 months ago
yuehuayingxueluo 600881a8ea
[Inference]Add CUDA KVCache Kernel (#5406)
9 months ago
Frank Lee 7cfed5f076
[feat] refactored extension module (#5298)
10 months ago