ColossalAI

Commit Graph

Author	SHA1	Message	Date
Jianghai	1f8c7e7046	[Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) * add * fix * fix * pause * fix * fix pytest * align * fix * license * fix * fix * fix readme * fix some bugs * remove tokenizer config	2024-02-07 17:55:48 +08:00
Frank Lee	58740b5f68	[inference] added inference template (#5375)	2024-02-07 17:11:43 +08:00
yuehuayingxueluo	631862f339	[Inference]Optimize generation process of inference engine (#5356) * opt inference engine * fix run_benchmark.sh * fix generate in engine.py * rollback tesh_inference_engine.py	2024-02-02 15:38:21 +08:00
Frank Lee	f8e456d202	[inference] simplified config verification (#5346) * [inference] simplified config verification * polish * polish	2024-02-01 15:31:01 +08:00
yuehuayingxueluo	4f28cb43c0	[inference]Optimize the usage of the mid tensors space in flash attn (#5304) * opt flash attn * opt tmp tensor * fix benchmark_llama * fix code style * fix None logic for output tensor * fix adapted to get_xine_cache * add comment * fix ci bugs * fix some codes * rm duplicated codes * rm duplicated codes * fix code style * add _get_dtype in config.py	2024-01-26 14:00:10 +08:00
FrankLeeeee	1ded7e81ef	[git] fixed rebased files	2024-01-11 13:50:45 +00:00
yuehuayingxueluo	fab294c7f4	fix CI bugs	2024-01-11 13:46:14 +00:00
Jianghai	e545a871b8	[Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229) * fix accuracy * alignment in attention * fix attention * fix * fix bugs * fix bugs * fix bugs	2024-01-11 13:46:14 +00:00
yuehuayingxueluo	fa4fbdbffb	adapted to pad_context_forward	2024-01-11 13:44:06 +00:00
yuehuayingxueluo	47e53eaa1c	fix bugs in attention.py and request_handler.py	2024-01-11 13:44:06 +00:00
yuehuayingxueluo	bbfebfb9fc	fix bugs in sampler	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	02c1bf8b2a	add context_attention_unpadded	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	4df8876fca	Fixed a writing error	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	9489dc64d8	precision alignment	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	62968588d1	fix bugs in request_handler	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	62fd08ee44	Fixed a bug in the inference frame	2024-01-11 13:39:56 +00:00
Jianghai	0e616462a7	[Inference] add logit processor and request handler (#5166) * add logit processor and request handler * add * add * add * fix * add search tokens and update func * finish request handler * add running list test * fix test * fix some bug * add * add * fix bugs * fix some bugs * fix bug * fix * fix * add copy fun * del useless attn * fix request status --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com>	2024-01-11 13:39:56 +00:00
yuehuayingxueluo	8daee26989	[Inference] Add the logic of the inference engine (#5173) * add infer_struct and infer_config * update codes * change InferConfig * Add hf_model_config to the engine * rm _get_hf_model_config * update codes * made adjustments according to the feedback from the reviewer. * update codes * add ci test for config and struct * Add the logic of the inference engine * update engine and test * Recover cache_manager.py * add logger * fix conflict * update codes * update codes * update model and tokenizer * fix add the logic about shardformer * change kvcache_manager docstring * add policy * fix ci bug in test_kvcache_manager.py * remove codes related o tokenizer and move model_policy * fix code style * add ordered_set to requirements-infer.txt * Delete extra empty lines * add ordered_set to requirements-test.txt	2024-01-11 13:39:56 +00:00

18 Commits (2a718c8be89918ec70b88f1f059148a7294dbccb)