ColossalAI

Commit Graph

Author	SHA1	Message	Date
Yuanheng Zhao	1dedb57747	[Fix/Infer] Remove unused deps and revise requirements (#5341 ) * remove flash-attn dep * rm padding llama * revise infer requirements * move requirements out of module	2024-02-06 17:27:45 +08:00
yuehuayingxueluo	8daee26989	[Inference] Add the logic of the inference engine (#5173 ) * add infer_struct and infer_config * update codes * change InferConfig * Add hf_model_config to the engine * rm _get_hf_model_config * update codes * made adjustments according to the feedback from the reviewer. * update codes * add ci test for config and struct * Add the logic of the inference engine * update engine and test * Recover cache_manager.py * add logger * fix conflict * update codes * update codes * update model and tokenizer * fix add the logic about shardformer * change kvcache_manager docstring * add policy * fix ci bug in test_kvcache_manager.py * remove codes related o tokenizer and move model_policy * fix code style * add ordered_set to requirements-infer.txt * Delete extra empty lines * add ordered_set to requirements-test.txt	2024-01-11 13:39:56 +00:00
Hongxin Liu	1cd7efc520	[inference] refactor examples and fix schedule (#5077 ) * [setup] refactor infer setup * [hotfix] fix infenrece behavior on 1 1 gpu * [exmaple] refactor inference examples	2023-11-21 10:46:03 +08:00
Xu Kai	fb103cfd6e	[inference] update examples and engine (#5073 ) * update examples and engine * fix choices * update example	2023-11-20 19:44:52 +08:00
Cuiqing Li (李崔卿)	bce919708f	[Kernels]added flash-decoidng of triton (#5063 ) * added flash-decoidng of triton based on lightllm kernel * add req * clean * clean * delete build.sh --------- Co-authored-by: cuiqing.li <lixx336@gmail.com>	2023-11-20 13:58:29 +08:00
Xu Kai	fd6482ad8c	[inference] Refactor inference architecture (#5057 ) * [inference] support only TP (#4998) * support only tp * enable tp * add support for bloom (#5008) * [refactor] refactor gptq and smoothquant llama (#5012) * refactor gptq and smoothquant llama * fix import error * fix linear import torch-int * fix smoothquant llama import error * fix import accelerate error * fix bug * fix import smooth cuda * fix smoothcuda * [Inference Refactor] Merge chatglm2 with pp and tp (#5023) merge chatglm with pp and tp * [Refactor] remove useless inference code (#5022) * remove useless code * fix quant model * fix test import bug * mv original inference legacy * fix chatglm2 * [Refactor] refactor policy search and quant type controlling in inference (#5035) * [Refactor] refactor policy search and quant type controling in inference * [inference] update readme (#5051) * update readme * update readme * fix architecture * fix table * fix table * [inference] udpate example (#5053) * udpate example * fix run.sh * fix rebase bug * fix some errors * update readme * add some features * update interface * update readme * update benchmark * add requirements-infer --------- Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>	2023-11-19 21:05:05 +08:00

6 Commits (5be590b99eb6c58c3aa809d453680139fdd2b9f7)