ColossalAI

Commit Graph

Author	SHA1	Message	Date
Steve Luo	725fbd2ed0	[Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679 )	7 months ago
傅剑寒	ef8e4ffe31	[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680 )	7 months ago
傅剑寒	8ccb6714e7	[Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656 )	7 months ago
傅剑寒	279300dc5f	[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613 ) * refactor compilation mechanism and unified multi hw * fix file path bug * add init.py to make pybind a module to avoid relative path error caused by softlink * delete duplicated micros * fix micros bug in gcc	7 months ago
傅剑寒	a2878e39f4	[Inference] Add Reduce Utils (#5537 ) * add reduce utils * add using to delete namespace prefix	8 months ago
yuehuayingxueluo	934e31afb2	The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519 )	8 months ago
yuehuayingxueluo	87079cffe8	[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461 ) * Support FP16/BF16 Flash Attention 2 * fix bugs in test_kv_cache_memcpy.py * add context_kv_cache_memcpy_kernel.cu * rm typename MT * add tail process * add high_precision * add high_precision to config.py * rm unused code * change the comment for the high_precision parameter * update test_rotary_embdding_unpad.py * fix vector_copy_utils.h * add comment for self.high_precision when using float32	8 months ago
傅剑寒	7ff42cc06d	add vec_type_trait implementation (#5473 )	8 months ago
xs_courtesy	48c4f29b27	refactor vector utils	8 months ago
xs_courtesy	388e043930	add implementation for GetGPULaunchConfig1D	8 months ago
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418 ) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the global memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	8 months ago
Steve Luo	b699f54007	optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441 )	9 months ago
xs_courtesy	5eb5ff1464	refactor code	9 months ago
xs_courtesy	a46598ac59	add reusable utils for cuda	9 months ago

14 Commits (3568df498ab9ab2241ba2968de614bdc070ccbc9)