ColossalAI

Commit Graph

Author	SHA1	Message	Date
pre-commit-ci[bot]	7c2f79fa98	[pre-commit.ci] pre-commit autoupdate (#5572 ) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/PyCQA/autoflake: v2.2.1 → v2.3.1](https://github.com/PyCQA/autoflake/compare/v2.2.1...v2.3.1) - [github.com/pycqa/isort: 5.12.0 → 5.13.2](https://github.com/pycqa/isort/compare/5.12.0...5.13.2) - [github.com/psf/black-pre-commit-mirror: 23.9.1 → 24.4.2](https://github.com/psf/black-pre-commit-mirror/compare/23.9.1...24.4.2) - [github.com/pre-commit/mirrors-clang-format: v13.0.1 → v18.1.7](https://github.com/pre-commit/mirrors-clang-format/compare/v13.0.1...v18.1.7) - [github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.6.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.3.0...v4.6.0) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	5 months ago
傅剑寒	121d7ad629	[Inference] Delete duplicated copy_vector (#5716 )	7 months ago
Steve Luo	7806842f2d	add paged-attetionv2: support seq length split across thread block (#5707 )	7 months ago
傅剑寒	50104ab340	[Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706 ) * add convert_fp8 op for fp8 test in the future * rerun ci	7 months ago
傅剑寒	1ace1065e6	[Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686 )	7 months ago
Steve Luo	725fbd2ed0	[Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679 )	7 months ago
傅剑寒	9df016fc45	[Inference] Fix quant bits order (#5681 )	7 months ago
傅剑寒	ef8e4ffe31	[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680 )	7 months ago
Steve Luo	5cd75ce4c7	[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663 ) * refactor kvcache manager and rotary_embedding and kvcache_memcpy operator * refactor decode_kv_cache_memcpy * enable alibi in pagedattention	7 months ago
傅剑寒	808ee6e4ad	[Inference/Feat] Feat quant kvcache step2 (#5674 )	7 months ago
傅剑寒	8ccb6714e7	[Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656 )	7 months ago
Steve Luo	a8fd3b0342	[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643 ) * optimize flashdecodingattention: refactor code with different key cache layout(from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x]) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
傅剑寒	279300dc5f	[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613 ) * refactor compilation mechanism and unified multi hw * fix file path bug * add init.py to make pybind a module to avoid relative path error caused by softlink * delete duplicated micros * fix micros bug in gcc	7 months ago
yuehuayingxueluo	12f10d5b0b	[Fix/Inference]Fix CUDA Rotary Rmbedding GQA (#5623 ) * fix rotary embedding GQA * change test_rotary_embdding_unpad.py KH	7 months ago
Steve Luo	ccf72797e3	feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611 )	7 months ago
Steve Luo	be396ad6cc	[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531 ) * feat flash decoding for paged attention * refactor flashdecodingattention * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
傅剑寒	d4cb023b62	[Inference/Refactor] Delete Duplicated code and refactor vec_copy utils and reduce utils (#5593 ) * delete duplicated code and refactor vec_copy utils and reduce utils * delete unused header file	7 months ago
傅剑寒	a21912339a	refactor csrc (#5582 )	8 months ago
pre-commit-ci[bot]	d78817539e	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	8 months ago
傅剑寒	7ebdf48ac5	add cast and op_functor for cuda build-in types (#5546 )	8 months ago
傅剑寒	a2878e39f4	[Inference] Add Reduce Utils (#5537 ) * add reduce utils * add using to delele namespace prefix	8 months ago
yuehuayingxueluo	04aca9e55b	[Inference/Kernel]Add get_cos_and_sin Kernel (#5528 ) * Add get_cos_and_sin kernel * fix code comments * fix code typos * merge common codes of get_cos_and_sin kernel. * Fixed a typo * Changed 'asset allclose' to 'assert equal'.	8 months ago
yuehuayingxueluo	934e31afb2	The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519 )	8 months ago
yuehuayingxueluo	87079cffe8	[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461 ) * Support FP16/BF16 Flash Attention 2 * fix bugs in test_kv_cache_memcpy.py * add context_kv_cache_memcpy_kernel.cu * rm typename MT * add tail process * add high_precision * add high_precision to config.py * rm unused code * change the comment for the high_precision parameter * update test_rotary_embdding_unpad.py * fix vector_copy_utils.h * add comment for self.high_precision when using float32	8 months ago
傅剑寒	7ff42cc06d	add vec_type_trait implementation (#5473 )	8 months ago
xs_courtesy	48c4f29b27	refactor vector utils	8 months ago
xs_courtesy	5724b9e31e	add some comments	9 months ago
xs_courtesy	388e043930	add implementatino for GetGPULaunchConfig1D	9 months ago
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418 ) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the gloabl memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	9 months ago
Steve Luo	ed431de4e4	fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454 )	9 months ago
xs_courtesy	c1c45e9d8e	fix include path	9 months ago
Steve Luo	b699f54007	optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441 )	9 months ago
xs_courtesy	095c070a6e	refactor code	9 months ago
傅剑寒	21e1e3645c	Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config Add query and other components	9 months ago
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417 )	9 months ago
xs_courtesy	5eb5ff1464	refactor code	9 months ago
xs_courtesy	01d289d8e5	Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config	9 months ago
xs_courtesy	a46598ac59	add reusable utils for cuda	9 months ago
xs_courtesy	95c21498d4	add silu_and_mul for infer	9 months ago
yuehuayingxueluo	600881a8ea	[Inference]Add CUDA KVCache Kernel (#5406 ) * add cuda KVCache kernel * annotation benchmark_kvcache_copy * add use cuda * fix import path * move benchmark scripts to example/ * rm benchmark codes in test_kv_cache_memcpy.py * rm redundancy codes * rm redundancy codes * pr was modified according to the review	9 months ago
Frank Lee	7cfed5f076	[feat] refactored extension module (#5298 ) * [feat] refactored extension module * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish	10 months ago

41 Commits (cf519dac6a5799b8f314aac6f510e2a98d3af9c6)