ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	80a8ca916a	[extension] hotfix compile check (#6099 )	2024-10-24 11:11:44 +08:00
Edenzzzz	f5c84af0b0	[Feature] Zigzag Ring attention (#5905 ) * halfway * fix cross-PP-stage position id length diff bug * fix typo * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unified cross entropy func for all shardformer models * remove redundant lines * add basic ring attn; debug cross entropy * fwd bwd logic complete * fwd bwd logic complete; add experimental triton rescale * precision tests passed * precision tests passed * fix typos and remove misc files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sp_mode to benchmark; fix varlen interface * update softmax_lse shape by new interface * change tester name * remove buffer clone; support packed seq layout * add varlen tests * fix typo * all tests passed * add dkv_group; fix mask * remove debug statements --------- Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-08-16 13:56:38 +08:00
pre-commit-ci[bot]	7c2f79fa98	[pre-commit.ci] pre-commit autoupdate (#5572 ) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/PyCQA/autoflake: v2.2.1 → v2.3.1](https://github.com/PyCQA/autoflake/compare/v2.2.1...v2.3.1) - [github.com/pycqa/isort: 5.12.0 → 5.13.2](https://github.com/pycqa/isort/compare/5.12.0...5.13.2) - [github.com/psf/black-pre-commit-mirror: 23.9.1 → 24.4.2](https://github.com/psf/black-pre-commit-mirror/compare/23.9.1...24.4.2) - [github.com/pre-commit/mirrors-clang-format: v13.0.1 → v18.1.7](https://github.com/pre-commit/mirrors-clang-format/compare/v13.0.1...v18.1.7) - [github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.6.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.3.0...v4.6.0) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-07-01 17:16:41 +08:00
flybird11111	a1e39f4c0d	[install]fix setup (#5786 ) * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-06-06 11:47:48 +08:00
Charles Coulombe	c46e09715c	Allow building cuda extension without a device. (#5535 ) Added FORCE_CUDA environment variable support, to enable building extensions where a GPU device is not present but cuda libraries are.	2024-06-05 14:26:30 +08:00
傅剑寒	121d7ad629	[Inference] Delete duplicated copy_vector (#5716 )	2024-05-14 14:35:33 +08:00
Steve Luo	7806842f2d	add paged-attetionv2: support seq length split across thread block (#5707 )	2024-05-14 12:46:54 +08:00
傅剑寒	50104ab340	[Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706 ) * add convert_fp8 op for fp8 test in the future * rerun ci	2024-05-10 18:39:54 +08:00
傅剑寒	1ace1065e6	[Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686 )	2024-05-06 15:35:13 +08:00
Steve Luo	725fbd2ed0	[Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679 )	2024-05-06 10:55:34 +08:00
傅剑寒	9df016fc45	[Inference] Fix quant bits order (#5681 )	2024-04-30 19:38:00 +08:00
傅剑寒	ef8e4ffe31	[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680 )	2024-04-30 18:33:53 +08:00
Steve Luo	5cd75ce4c7	[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663 ) * refactor kvcache manager and rotary_embedding and kvcache_memcpy operator * refactor decode_kv_cache_memcpy * enable alibi in pagedattention	2024-04-30 15:52:23 +08:00
傅剑寒	808ee6e4ad	[Inference/Feat] Feat quant kvcache step2 (#5674 )	2024-04-30 11:26:36 +08:00
傅剑寒	8ccb6714e7	[Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656 )	2024-04-26 19:40:37 +08:00
Steve Luo	a8fd3b0342	[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643 ) * optimize flashdecodingattention: refactor code with different key cache layout(from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x]) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-25 14:24:02 +08:00
傅剑寒	279300dc5f	[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613 ) * refactor compilation mechanism and unified multi hw * fix file path bug * add init.py to make pybind a module to avoid relative path error caused by softlink * delete duplicated micros * fix micros bug in gcc	2024-04-24 14:17:54 +08:00
yuehuayingxueluo	12f10d5b0b	[Fix/Inference]Fix CUDA Rotary Rmbedding GQA (#5623 ) * fix rotary embedding GQA * change test_rotary_embdding_unpad.py KH	2024-04-23 13:44:49 +08:00
Steve Luo	ccf72797e3	feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611 )	2024-04-19 15:34:53 +08:00
Steve Luo	be396ad6cc	[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531 ) * feat flash decoding for paged attention * refactor flashdecodingattention * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-18 16:45:07 +08:00
傅剑寒	d4cb023b62	[Inference/Refactor] Delete Duplicated code and refactor vec_copy utils and reduce utils (#5593 ) * delete duplicated code and refactor vec_copy utils and reduce utils * delete unused header file	2024-04-15 10:57:51 +08:00
傅剑寒	a21912339a	refactor csrc (#5582 )	2024-04-11 15:41:36 +08:00
pre-commit-ci[bot]	d78817539e	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2024-04-08 08:41:09 +00:00
Yuanheng	ed5ebd1735	[Fix] resolve conflicts of merging main	2024-04-08 16:21:47 +08:00
Hongxin Liu	641b1ee71a	[devops] remove post commit ci (#5566 ) * [devops] remove post commit ci * [misc] run pre-commit on all files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-08 15:09:40 +08:00
傅剑寒	7ebdf48ac5	add cast and op_functor for cuda build-in types (#5546 )	2024-04-08 11:38:05 +08:00
傅剑寒	a2878e39f4	[Inference] Add Reduce Utils (#5537 ) * add reduce utils * add using to delele namespace prefix	2024-04-01 15:34:25 +08:00
yuehuayingxueluo	04aca9e55b	[Inference/Kernel]Add get_cos_and_sin Kernel (#5528 ) * Add get_cos_and_sin kernel * fix code comments * fix code typos * merge common codes of get_cos_and_sin kernel. * Fixed a typo * Changed 'asset allclose' to 'assert equal'.	2024-04-01 13:47:14 +08:00
yuehuayingxueluo	934e31afb2	The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519 )	2024-03-28 10:42:51 +08:00
Hongxin Liu	19e1a5cf16	[shardformer] update colo attention to support custom mask (#5510 ) * [feature] refactor colo attention (#5462) * [extension] update api * [feature] add colo attention * [feature] update sdpa * [feature] update npu attention * [feature] update flash-attn * [test] add flash attn test * [test] update flash attn test * [shardformer] update modeling to fit colo attention (#5465) * [misc] refactor folder structure * [shardformer] update llama flash-attn * [shardformer] fix llama policy * [devops] update tensornvme install * [test] update llama test * [shardformer] update colo attn kernel dispatch * [shardformer] update blip2 * [shardformer] update chatglm * [shardformer] update gpt2 * [shardformer] update gptj * [shardformer] update opt * [shardformer] update vit * [shardformer] update colo attention mask prep * [shardformer] update whisper * [test] fix shardformer tests (#5514) * [test] fix shardformer tests * [test] fix shardformer tests	2024-03-27 11:19:32 +08:00
yuehuayingxueluo	87079cffe8	[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461 ) * Support FP16/BF16 Flash Attention 2 * fix bugs in test_kv_cache_memcpy.py * add context_kv_cache_memcpy_kernel.cu * rm typename MT * add tail process * add high_precision * add high_precision to config.py * rm unused code * change the comment for the high_precision parameter * update test_rotary_embdding_unpad.py * fix vector_copy_utils.h * add comment for self.high_precision when using float32	2024-03-25 13:40:34 +08:00
傅剑寒	7ff42cc06d	add vec_type_trait implementation (#5473 )	2024-03-19 18:36:40 +08:00
xs_courtesy	48c4f29b27	refactor vector utils	2024-03-19 11:32:01 +08:00
xs_courtesy	5724b9e31e	add some comments	2024-03-15 11:18:57 +08:00
xs_courtesy	388e043930	add implementatino for GetGPULaunchConfig1D	2024-03-14 11:13:40 +08:00
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418 ) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the gloabl memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	2024-03-13 17:20:03 +08:00
Steve Luo	ed431de4e4	fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454 )	2024-03-13 16:00:55 +08:00
xs_courtesy	c1c45e9d8e	fix include path	2024-03-13 11:21:06 +08:00
Steve Luo	b699f54007	optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441 )	2024-03-12 17:48:02 +08:00
xs_courtesy	095c070a6e	refactor code	2024-03-11 17:06:57 +08:00
傅剑寒	21e1e3645c	Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config Add query and other components	2024-03-11 11:15:29 +08:00
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417 )	2024-03-08 16:21:12 +08:00
xs_courtesy	5eb5ff1464	refactor code	2024-03-08 15:41:14 +08:00
xs_courtesy	01d289d8e5	Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config	2024-03-08 15:04:55 +08:00
xs_courtesy	a46598ac59	add reusable utils for cuda	2024-03-08 14:53:29 +08:00
xs_courtesy	95c21498d4	add silu_and_mul for infer	2024-03-07 16:57:49 +08:00
Hongxin Liu	070df689e6	[devops] fix extention building (#5427 )	2024-03-05 15:35:54 +08:00
FrankLeeeee	0310b76e9d	Merge branch 'main' into sync/main	2024-03-04 10:09:36 +08:00
yuehuayingxueluo	600881a8ea	[Inference]Add CUDA KVCache Kernel (#5406 ) * add cuda KVCache kernel * annotation benchmark_kvcache_copy * add use cuda * fix import path * move benchmark scripts to example/ * rm benchmark codes in test_kv_cache_memcpy.py * rm redundancy codes * rm redundancy codes * pr was modified according to the review	2024-02-28 14:36:50 +08:00
Hongxin Liu	ffffc32dc7	[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347 ) * [checkpointio] fix hybrid parallel optim checkpoint * [extension] fix cuda extension * [checkpointio] fix gemini optimizer checkpoint * polish code	2024-02-01 16:13:06 +08:00

54 Commits (30a94431323d71c5ef06bd4b7f047aced3312fdf)