ColossalAI

Commit Graph

Author	SHA1	Message	Date
yuehuayingxueluo	04aca9e55b	[Inference/Kernel]Add get_cos_and_sin Kernel (#5528 ) * Add get_cos_and_sin kernel * fix code comments * fix code typos * merge common codes of get_cos_and_sin kernel. * Fixed a typo * Changed 'asset allclose' to 'assert equal'.	2024-04-01 13:47:14 +08:00
Runyu Lu	68e9396bc0	[fix] merge conflicts	2024-03-25 14:48:28 +08:00
yuehuayingxueluo	87079cffe8	[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461 ) * Support FP16/BF16 Flash Attention 2 * fix bugs in test_kv_cache_memcpy.py * add context_kv_cache_memcpy_kernel.cu * rm typename MT * add tail process * add high_precision * add high_precision to config.py * rm unused code * change the comment for the high_precision parameter * update test_rotary_embdding_unpad.py * fix vector_copy_utils.h * add comment for self.high_precision when using float32	2024-03-25 13:40:34 +08:00
Runyu Lu	9fe61b4475	[fix]	2024-03-25 11:37:58 +08:00
Runyu Lu	aabc9fb6aa	[feat] add use_cuda_kernel option	2024-03-19 13:24:25 +08:00
Runyu Lu	d02e257abd	Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph	2024-03-14 10:37:05 +08:00
Runyu Lu	ae24b4f025	diverse tests	2024-03-14 10:35:08 +08:00
Runyu Lu	1821a6dab0	[fix] pytest and fix dyn grid bug	2024-03-13 17:28:32 +08:00
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418 ) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the gloabl memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	2024-03-13 17:20:03 +08:00
Steve Luo	ed431de4e4	fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454 )	2024-03-13 16:00:55 +08:00
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417 )	2024-03-08 16:21:12 +08:00
xs_courtesy	95c21498d4	add silu_and_mul for infer	2024-03-07 16:57:49 +08:00
yuehuayingxueluo	0aa27f1961	[Inference]Move benchmark-related code to the example directory. (#5408 ) * move benchmark-related code to the example directory. * fix bugs in test_fused_rotary_embedding.py	2024-02-28 16:46:03 +08:00
yuehuayingxueluo	600881a8ea	[Inference]Add CUDA KVCache Kernel (#5406 ) * add cuda KVCache kernel * annotation benchmark_kvcache_copy * add use cuda * fix import path * move benchmark scripts to example/ * rm benchmark codes in test_kv_cache_memcpy.py * rm redundancy codes * rm redundancy codes * pr was modified according to the review	2024-02-28 14:36:50 +08:00
Yuanheng Zhao	19061188c3	[Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399 ) fix dependency in pytest	2024-02-26 16:17:47 +08:00
yuehuayingxueluo	bc1da87366	[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395 ) * Fix bugs in inference_engine * fix bugs in engine.py * rm CUDA_VISIBLE_DEVICES * add request_ids in generate * fix bug in engine.py * add logger.debug for BatchBucket	2024-02-23 10:51:35 +08:00
yuehuayingxueluo	2a718c8be8	Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390 ) * opt_view_and_memcopy * fix bugs in ci * fix ci bugs * update benchmark scripts * fix ci bugs	2024-02-21 13:23:57 +08:00
Jianghai	730103819d	[Inference]Fused kv copy into rotary calculation (#5383 ) * revise rotary embedding * remove useless print * adapt * fix * add * fix * modeling * fix * fix * fix * fused kv copy * fused copy * colossalai/kernel/triton/no_pad_rotary_embedding.py * del padding llama * del	2024-02-21 11:31:48 +08:00
Yuanheng Zhao	b21aac5bae	[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367 ) * add kvcache manager funcs for batching * add batch bucket for batching * revise RunningList struct in handler * add kvcache/batch funcs for compatibility * use new batching methods * fix indexing bugs * revise abort logic * use cpu seq lengths/block tables * rm unused attr in Sequence * fix type conversion/default arg * add and revise pytests * revise pytests, rm unused tests * rm unused statements * fix pop finished indexing issue * fix: use index in batch when retrieving inputs/update seqs * use dict instead of odict in batch struct * arg type hinting * fix make compress * refine comments * fix: pop_n_seqs to pop the first n seqs * add check in request handler * remove redundant conversion * fix test for request handler * fix pop method in batch bucket * fix prefill adding	2024-02-19 17:18:20 +08:00
Jianghai	1f8c7e7046	[Inference] User Experience: update the logic of default tokenizer and generation config. (#5337 ) * add * fix * fix * pause * fix * fix pytest * align * fix * license * fix * fix * fix readme * fix some bugs * remove tokenizer config	2024-02-07 17:55:48 +08:00
yuehuayingxueluo	6fb4bcbb24	[Inference/opt] Fused KVCahce Memcopy (#5374 ) * fused kv memcopy * add TODO in test_kvcache_copy.py	2024-02-07 17:15:42 +08:00
Frank Lee	58740b5f68	[inference] added inference template (#5375 )	2024-02-07 17:11:43 +08:00
Frank Lee	8106ede07f	Revert "[Inference] Adapt to Fused rotary (#5348 )" (#5373 ) This reverts commit `9f4ab2eb92`.	2024-02-07 14:27:04 +08:00
Jianghai	9f4ab2eb92	[Inference] Adapt to Fused rotary (#5348 ) * revise rotary embedding * remove useless print * adapt * fix * add * fix * modeling * fix * fix * fix	2024-02-07 11:36:04 +08:00
yuehuayingxueluo	631862f339	[Inference]Optimize generation process of inference engine (#5356 ) * opt inference engine * fix run_benchmark.sh * fix generate in engine.py * rollback tesh_inference_engine.py	2024-02-02 15:38:21 +08:00
Frank Lee	e76acbb076	[inference] moved ops tests to test_infer (#5354 )	2024-02-02 13:51:22 +08:00
Frank Lee	db1a763307	[inference] removed redundancy init_batch (#5353 )	2024-02-02 11:44:15 +08:00
Frank Lee	f8e456d202	[inference] simplified config verification (#5346 ) * [inference] simplified config verification * polish * polish	2024-02-01 15:31:01 +08:00
Yuanheng Zhao	5f98a9d68a	[Infer] Optimize Blocked KVCache And Kernels Using It (#5325 ) * revise shape of kvcache (context attn kernel) * revise shape of kvcache (flash decoding kernel) * revise shape of kvcache (kvcache copy) and attn func * init of kvcache in kvcache manager * revise llama modeling * revise block size retrieval * use torch for rms_norm benchmarking * revise block size retrieval	2024-01-30 16:06:09 +08:00
yuehuayingxueluo	4f28cb43c0	[inference]Optimize the usage of the mid tensors space in flash attn (#5304 ) * opt flash attn * opt tmp tensor * fix benchmark_llama * fix code style * fix None logic for output tensor * fix adapted to get_xine_cache * add comment * fix ci bugs * fix some codes * rm duplicated codes * rm duplicated codes * fix code style * add _get_dtype in config.py	2024-01-26 14:00:10 +08:00
Jianghai	9e2342bde2	[Hotfix] Fix bugs in testing continuous batching (#5270 ) * fix bug * fix bugs * fix bugs * fix bugs and add padding * add funcs and fix bugs * fix typos * fix bugs * add func	2024-01-18 16:31:14 +08:00
FrankLeeeee	1ded7e81ef	[git] fixed rebased files	2024-01-11 13:50:45 +00:00
yuehuayingxueluo	fab294c7f4	fix CI bugs	2024-01-11 13:46:14 +00:00
Jianghai	e545a871b8	[Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229 ) * fix accuracy * alignment in attention * fix attention * fix * fix bugs * fix bugs * fix bugs	2024-01-11 13:46:14 +00:00
yuehuayingxueluo	fa4fbdbffb	adapted to pad_context_forward	2024-01-11 13:44:06 +00:00
yuehuayingxueluo	47e53eaa1c	fix bugs in attention.py and request_handler.py	2024-01-11 13:44:06 +00:00
Jianghai	bfd9b1b494	[Inference] Pytorch Attention func, pad&nopad input support (#5219 ) * add attn * add attention test * fix attn forward * fix decoding	2024-01-11 13:44:06 +00:00
yuehuayingxueluo	bbfebfb9fc	fix bugs in sampler	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	02c1bf8b2a	add context_attention_unpadded	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	4df8876fca	Fixed a writing error	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	9489dc64d8	precision alignment	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	62968588d1	fix bugs in request_handler	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	62fd08ee44	Fixed a bug in the inference frame	2024-01-11 13:39:56 +00:00
Jianghai	0e616462a7	[Inference] add logit processor and request handler (#5166 ) * add logit processor and request handler * add * add * add * fix * add search tokens and update func * finish request handler * add running list test * fix test * fix some bug * add * add * fix bugs * fix some bugs * fix bug * fix * fix * add copy fun * del useless attn * fix request status --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com>	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	8daee26989	[Inference] Add the logic of the inference engine (#5173 ) * add infer_struct and infer_config * update codes * change InferConfig * Add hf_model_config to the engine * rm _get_hf_model_config * update codes * made adjustments according to the feedback from the reviewer. * update codes * add ci test for config and struct * Add the logic of the inference engine * update engine and test * Recover cache_manager.py * add logger * fix conflict * update codes * update codes * update model and tokenizer * fix add the logic about shardformer * change kvcache_manager docstring * add policy * fix ci bug in test_kvcache_manager.py * remove codes related o tokenizer and move model_policy * fix code style * add ordered_set to requirements-infer.txt * Delete extra empty lines * add ordered_set to requirements-test.txt	2024-01-11 13:39:56 +00:00
Jianghai	93aeacca34	[Inference]Update inference config and fix test (#5178 ) * unify the config setting * fix test * fix import * fix test * fix * fix * add logger * revise log info --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com>	2024-01-11 13:39:29 +00:00
Yuanheng Zhao	3de2e62299	[Inference] Add CacheBlock and KV-Cache Manager (#5156 ) * [Inference] Add KVCache Manager * function refactored * add test for KVCache Manager * add attr beam width * Revise alloc func in CacheManager * Fix docs and pytests * add tp slicing for head number * optimize shapes of tensors used as physical cache * Apply using InferenceConfig on KVCacheManager * rm duplicate config file * Optimize cache allocation: use contiguous cache * Fix config in pytest (and config)	2024-01-11 13:39:29 +00:00
yuehuayingxueluo	fab9b931d9	[Inference]Add BatchInferState, Sequence and InferConfig (#5149 ) * add infer_struct and infer_config * update codes * change InferConfig * Add hf_model_config to the engine * rm _get_hf_model_config * update codes * made adjustments according to the feedback from the reviewer. * update codes * add ci test for config and struct	2024-01-11 13:39:29 +00:00
Yuanheng Zhao	2bb92243d4	[Inference/NFC] Clean outdated inference tests and deprecated kernels (#5159 ) * [inference/nfc] remove outdated inference tests * remove outdated kernel tests * remove deprecated triton kernels * remove imports from deprecated kernels	2024-01-11 13:39:29 +00:00
Zhongkai Zhao	75af66cd81	[Hotfix] Fix model policy matching strategy in ShardFormer (#5064 ) * hotfix/Fix get model policy strategy in ShardFormer * fix bug in auto policy	2023-11-22 11:19:39 +08:00

70 Commits (ce9401ad52b870012846abcde120f1e87d5da7fe)