ColossalAI

Commit Graph

Author	SHA1	Message	Date
yuehuayingxueluo	4f28cb43c0	[inference]Optimize the usage of the mid tensors space in flash attn (#5304) * opt flash attn * opt tmp tensor * fix benchmark_llama * fix code style * fix None logic for output tensor * fix adapted to get_xine_cache * add comment * fix ci bugs * fix some codes * rm duplicated codes * rm duplicated codes * fix code style * add _get_dtype in config.py	10 months ago
Jianghai	9e2342bde2	[Hotfix] Fix bugs in testing continuous batching (#5270) * fix bug * fix bugs * fix bugs * fix bugs and add padding * add funcs and fix bugs * fix typos * fix bugs * add func	10 months ago
yuehuayingxueluo	86b63f720c	[Inference]Adapted to the triton attn kernels (#5264) * adapted to the triton attn kernels * fix pad input * adapted to copy_kv_to_blocked_cache * fix ci test * update kv memcpy * remove print	11 months ago
Jianghai	d8db500efc	[Inference] Fix request handler and add recycle logic (#5260) * fix request handler * fix comment	11 months ago
FrankLeeeee	1ded7e81ef	[git] fixed rebased files	11 months ago
yuehuayingxueluo	d40eb26029	fix bugs in request_handler.py and engine.py	11 months ago
yuehuayingxueluo	10e3c9f923	rm torch.cuda.synchronize	11 months ago
yuehuayingxueluo	fab294c7f4	fix CI bugs	11 months ago
yuehuayingxueluo	fa4fbdbffb	adapted to pad_context_forward	11 months ago
yuehuayingxueluo	47e53eaa1c	fix bugs in attention.py and request_handler.py	11 months ago
yuehuayingxueluo	bbfebfb9fc	fix bugs in sampler	11 months ago
yuehuayingxueluo	02c1bf8b2a	add context_attention_unpadded	11 months ago
yuehuayingxueluo	62968588d1	fix bugs in request_handler	11 months ago
yuehuayingxueluo	62fd08ee44	Fixed a bug in the inference frame	11 months ago
Jianghai	0e616462a7	[Inference] add logit processor and request handler (#5166) * add logit processor and request handler * add * add * add * fix * add search tokens and update func * finish request handler * add running list test * fix test * fix some bug * add * add * fix bugs * fix some bugs * fix bug * fix * fix * add copy fun * del useless attn * fix request status --------- Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	11 months ago
yuehuayingxueluo	8daee26989	[Inference] Add the logic of the inference engine (#5173) * add infer_struct and infer_config * update codes * change InferConfig * Add hf_model_config to the engine * rm _get_hf_model_config * update codes * made adjustments according to the feedback from the reviewer. * update codes * add ci test for config and struct * Add the logic of the inference engine * update engine and test * Recover cache_manager.py * add logger * fix conflict * update codes * update codes * update model and tokenizer * fix add the logic about shardformer * change kvcache_manager docstring * add policy * fix ci bug in test_kvcache_manager.py * remove codes related o tokenizer and move model_policy * fix code style * add ordered_set to requirements-infer.txt * Delete extra empty lines * add ordered_set to requirements-test.txt	11 months ago
Jianghai	56e75eeb06	[Inference] Add readme (roadmap) and fulfill request handler (#5147) * request handler * add readme --------- Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	11 months ago
Jianghai	4cf4682e70	[Inference] First PR for rebuild colossal-infer (#5143) * add engine and scheduler * add dirs --------- Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	11 months ago

18 Commits (249644c23b0402ccf9d0908f13ed15b41b95145f)