yuehuayingxueluo
e8f0642f28
[Inference] Add Nopadding Llama Modeling ( #5327 )
* add nopadding llama modeling
* add nopadding_llama.py
* rm unused codes
* fix bugs in test_xine_copy.py
* fix code style
2024-01-30 10:31:46 +08:00
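For context on the commit above: "no-padding" modeling generally means concatenating a batch of variable-length sequences into one flat token buffer plus a cumulative-lengths index, instead of padding every sequence to the longest one. The sketch below is a minimal pure-Python illustration of that layout idea; the function names (`flatten_batch`, `unflatten_batch`) are hypothetical and not this PR's API.

```python
# Hypothetical sketch of the "no-padding" batch layout: concatenate tokens
# into one flat buffer and track cumulative sequence lengths (cu_seqlens),
# the shape flash-attn-style varlen kernels typically consume.
from itertools import accumulate
from typing import List, Tuple

def flatten_batch(seqs: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate variable-length sequences; return (tokens, cu_seqlens)."""
    tokens = [t for s in seqs for t in s]
    cu_seqlens = [0] + list(accumulate(len(s) for s in seqs))
    return tokens, cu_seqlens

def unflatten_batch(tokens: List[int], cu_seqlens: List[int]) -> List[List[int]]:
    """Recover the original sequences from the flat layout."""
    return [tokens[a:b] for a, b in zip(cu_seqlens, cu_seqlens[1:])]

batch = [[1, 2, 3], [4], [5, 6]]
flat, cu = flatten_batch(batch)
# flat == [1, 2, 3, 4, 5, 6]; cu == [0, 3, 4, 6]
assert unflatten_batch(flat, cu) == batch
```

No padding tokens are stored or attended over; each sequence `i` lives at `flat[cu[i]:cu[i+1]]`.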
yuehuayingxueluo
4f28cb43c0
[Inference] Optimize the usage of the mid tensors space in flash attn ( #5304 )
* opt flash attn
* opt tmp tensor
* fix benchmark_llama
* fix code style
* fix None logic for output tensor
* fix adapted to get_xine_cache
* add comment
* fix ci bugs
* fix some codes
* rm duplicated codes
* rm duplicated codes
* fix code style
* add _get_dtype in config.py
2024-01-26 14:00:10 +08:00
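The commit above reduces memory used by intermediate ("mid") tensors in flash attention. A common form of that optimization is allocating one scratch workspace once and reusing it across layers instead of allocating per layer. The sketch below illustrates that general pattern with a plain `bytearray`; the `Workspace` class and its methods are illustrative assumptions, not the project's actual code.

```python
# Hypothetical illustration of reusing a single preallocated scratch buffer
# for per-layer intermediate tensors, rather than allocating fresh storage
# in every layer.
class Workspace:
    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)  # allocated once, reused every layer

    def view(self, nbytes: int) -> memoryview:
        """Hand out a zero-copy view into the shared scratch space."""
        if nbytes > len(self.buf):
            raise ValueError("workspace too small")
        return memoryview(self.buf)[:nbytes]

ws = Workspace(1024)
for layer in range(4):
    scratch = ws.view(256)            # same backing storage each iteration
    scratch[:4] = bytes([layer] * 4)  # a kernel would write intermediates here
```

Peak scratch memory stays at one buffer's worth regardless of layer count, at the cost that each layer's intermediates are overwritten by the next.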
Jianghai
9e2342bde2
[Hotfix] Fix bugs in testing continuous batching ( #5270 )
* fix bug
* fix bugs
* fix bugs
* fix bugs and add padding
* add funcs and fix bugs
* fix typos
* fix bugs
* add func
2024-01-18 16:31:14 +08:00
yuehuayingxueluo
86b63f720c
[Inference] Adapted to the triton attn kernels ( #5264 )
* adapted to the triton attn kernels
* fix pad input
* adapted to copy_kv_to_blocked_cache
* fix ci test
* update kv memcpy
* remove print
2024-01-17 16:03:10 +08:00
Jianghai
d8db500efc
[Inference] Fix request handler and add recycle logic ( #5260 )
* fix request handler
* fix comment
2024-01-15 17:50:46 +08:00
yuehuayingxueluo
fa4fbdbffb
adapted to pad_context_forward
2024-01-11 13:44:06 +00:00
yuehuayingxueluo
47e53eaa1c
fix bugs in attention.py and request_handler.py
2024-01-11 13:44:06 +00:00
yuehuayingxueluo
9489dc64d8
precision alignment
2024-01-11 13:39:56 +00:00
yuehuayingxueluo
62968588d1
fix bugs in request_handler
2024-01-11 13:39:56 +00:00
yuehuayingxueluo
62fd08ee44
Fixed a bug in the inference framework
2024-01-11 13:39:56 +00:00
yuehuayingxueluo
86853a37d5
Add padding llama model
2024-01-11 13:39:56 +00:00
Jianghai
0e616462a7
[Inference] add logit processor and request handler ( #5166 )
* add logit processor and request handler
* add
* add
* add
* fix
* add search tokens and update func
* finish request handler
* add running list test
* fix test
* fix some bug
* add
* add
* fix bugs
* fix some bugs
* fix bug
* fix
* fix
* add copy fun
* del useless attn
* fix request status
---------
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
2024-01-11 13:39:56 +00:00
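On the logit-processor commit above: a logit processor is typically a small transform applied to the model's output logits before sampling (temperature scaling, top-k masking, and so on), often chained by the request handler. The following is a minimal pure-Python sketch of that idea, with hypothetical function names; real implementations would operate on GPU tensors.

```python
# Hypothetical sketch of a logit-processor chain applied before sampling:
# each processor rewrites the logits, then softmax yields probabilities.
import math
from typing import List

def temperature(logits: List[float], temp: float) -> List[float]:
    """Sharpen (temp < 1) or flatten (temp > 1) the distribution."""
    return [l / temp for l in logits]

def top_k(logits: List[float], k: int) -> List[float]:
    """Mask everything outside the k largest logits."""
    kth = sorted(logits, reverse=True)[k - 1]
    return [l if l >= kth else -math.inf for l in logits]

def softmax(logits: List[float]) -> List[float]:
    m = max(l for l in logits if l != -math.inf)
    exps = [math.exp(l - m) if l != -math.inf else 0.0 for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(top_k(temperature(logits, 0.7), 2))
# only the two largest logits receive nonzero probability
```

Chaining order matters: temperature is applied first so that top-k selects from the rescaled logits, mirroring the usual processor-pipeline design.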
yuehuayingxueluo
8daee26989
[Inference] Add the logic of the inference engine ( #5173 )
* add infer_struct and infer_config
* update codes
* change InferConfig
* Add hf_model_config to the engine
* rm _get_hf_model_config
* update codes
* made adjustments according to the feedback from the reviewer.
* update codes
* add ci test for config and struct
* Add the logic of the inference engine
* update engine and test
* Recover cache_manager.py
* add logger
* fix conflict
* update codes
* update codes
* update model and tokenizer
* fix add the logic about shardformer
* change kvcache_manager docstring
* add policy
* fix ci bug in test_kvcache_manager.py
* remove codes related to tokenizer and move model_policy
* fix code style
* add ordered_set to requirements-infer.txt
* Delete extra empty lines
* add ordered_set to requirements-test.txt
2024-01-11 13:39:56 +00:00
Jianghai
93aeacca34
[Inference] Update inference config and fix test ( #5178 )
* unify the config setting
* fix test
* fix import
* fix test
* fix
* fix
* add logger
* revise log info
---------
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
2024-01-11 13:39:29 +00:00