ColossalAI

Author	SHA1	Message	Date
Yuanheng Zhao	a37f82629d	[Inference/SpecDec] Add Speculative Decoding Implementation (#5423 ) * fix flash decoding mask during verification * add spec-dec * add test for spec-dec * revise drafter init * remove drafter sampling * retire past kv in drafter * (trivial) rename attrs * (trivial) rename arg * revise how we enable/disable spec-dec	8 months ago
Yuanheng Zhao	5a9b05f7b2	[Inference/SpecDec] Add Basic Drafter Model Container (#5405 ) * [Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399) fix dependency in pytest * add drafter model container (basic ver)	8 months ago
Yuanheng Zhao	4bb5d8923a	[Fix/Inference] Remove unused and non-functional functions (#5543 ) * [fix] remove unused func * rm non-functional partial	8 months ago
yuehuayingxueluo	04aca9e55b	[Inference/Kernel]Add get_cos_and_sin Kernel (#5528 ) * Add get_cos_and_sin kernel * fix code comments * fix code typos * merge common codes of get_cos_and_sin kernel. * Fixed a typo * Changed 'asset allclose' to 'assert equal'.	8 months ago
傅剑寒	e6496dd371	[Inference] Optimize request handler of llama (#5512 ) * optimize request_handler * fix ways of writing	8 months ago
Runyu Lu	6251d68dc9	[fix] PR #5354 (#5501 ) * [fix] * [fix] * Update config.py docstring * [fix] docstring align * [fix] docstring align * [fix] docstring align	8 months ago
Runyu Lu	68e9396bc0	[fix] merge conflicts	8 months ago
yuehuayingxueluo	87079cffe8	[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461 ) * Support FP16/BF16 Flash Attention 2 * fix bugs in test_kv_cache_memcpy.py * add context_kv_cache_memcpy_kernel.cu * rm typename MT * add tail process * add high_precision * add high_precision to config.py * rm unused code * change the comment for the high_precision parameter * update test_rotary_embdding_unpad.py * fix vector_copy_utils.h * add comment for self.high_precision when using float32	8 months ago
Runyu Lu	ff4998c6f3	[fix] remove unused comment	8 months ago
Runyu Lu	5b017d6324	[fix]	8 months ago
Runyu Lu	4eafe0c814	[fix] unused option	8 months ago
Runyu Lu	aabc9fb6aa	[feat] add use_cuda_kernel option	8 months ago
Runyu Lu	6e30248683	[fix] tmp for test	9 months ago
Runyu Lu	d02e257abd	Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph	9 months ago
Runyu Lu	ae24b4f025	diverse tests	9 months ago
Runyu Lu	1821a6dab0	[fix] pytest and fix dyn grid bug	9 months ago
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418 ) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the gloabl memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	9 months ago
Runyu Lu	633e95b301	[doc] add doc	9 months ago
Runyu Lu	9dec66fad6	[fix] multi graphs capture error	9 months ago
Runyu Lu	b2c0d9ff2b	[fix] multi graphs capture error	9 months ago
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417 )	9 months ago
Runyu Lu	cefaeb5fdd	[feat] cuda graph support and refactor non-functional api	9 months ago
yuehuayingxueluo	600881a8ea	[Inference]Add CUDA KVCache Kernel (#5406 ) * add cuda KVCache kernel * annotation benchmark_kvcache_copy * add use cuda * fix import path * move benchmark scripts to example/ * rm benchmark codes in test_kv_cache_memcpy.py * rm redundancy codes * rm redundancy codes * pr was modified according to the review	9 months ago
yuehuayingxueluo	bc1da87366	[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395 ) * Fix bugs in inference_engine * fix bugs in engine.py * rm CUDA_VISIBLE_DEVICES * add request_ids in generate * fix bug in engine.py * add logger.debug for BatchBucket	9 months ago
yuehuayingxueluo	2a718c8be8	Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390 ) * opt_view_and_memcopy * fix bugs in ci * fix ci bugs * update benchmark scripts * fix ci bugs	9 months ago
Jianghai	730103819d	[Inference]Fused kv copy into rotary calculation (#5383 ) * revise rotary embedding * remove useless print * adapt * fix * add * fix * modeling * fix * fix * fix * fused kv copy * fused copy * colossalai/kernel/triton/no_pad_rotary_embedding.py * del padding llama * del	9 months ago
Yuanheng Zhao	b21aac5bae	[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367 ) * add kvcache manager funcs for batching * add batch bucket for batching * revise RunningList struct in handler * add kvcache/batch funcs for compatibility * use new batching methods * fix indexing bugs * revise abort logic * use cpu seq lengths/block tables * rm unused attr in Sequence * fix type conversion/default arg * add and revise pytests * revise pytests, rm unused tests * rm unused statements * fix pop finished indexing issue * fix: use index in batch when retrieving inputs/update seqs * use dict instead of odict in batch struct * arg type hinting * fix make compress * refine comments * fix: pop_n_seqs to pop the first n seqs * add check in request handler * remove redundant conversion * fix test for request handler * fix pop method in batch bucket * fix prefill adding	9 months ago
yuehuayingxueluo	8c69debdc7	[Inference]Support vllm testing in benchmark scripts (#5379 ) * add vllm benchmark scripts * fix code style * update run_benchmark.sh * fix code style	10 months ago
Frank Lee	9afa52061f	[inference] refactored config (#5376 )	10 months ago
Jianghai	1f8c7e7046	[Inference] User Experience: update the logic of default tokenizer and generation config. (#5337 ) * add * fix * fix * pause * fix * fix pytest * align * fix * license * fix * fix * fix readme * fix some bugs * remove tokenizer config	10 months ago
yuehuayingxueluo	6fb4bcbb24	[Inference/opt] Fused KVCahce Memcopy (#5374 ) * fused kv memcopy * add TODO in test_kvcache_copy.py	10 months ago
Frank Lee	58740b5f68	[inference] added inference template (#5375 )	10 months ago
Frank Lee	8106ede07f	Revert "[Inference] Adapt to Fused rotary (#5348 )" (#5373 ) This reverts commit `9f4ab2eb92`.	10 months ago
Jianghai	9f4ab2eb92	[Inference] Adapt to Fused rotary (#5348 ) * revise rotary embedding * remove useless print * adapt * fix * add * fix * modeling * fix * fix * fix	10 months ago
yuehuayingxueluo	35382a7fbf	[Inference]Fused the gate and up proj in mlp，and optimized the autograd process. (#5365 ) * fused the gate and up proj in mlp * fix code styles * opt auto_grad * rollback test_inference_engine.py * modifications based on the review feedback. * fix bugs in flash attn * Change reshape to view * fix test_rmsnorm_triton.py	10 months ago
Yuanheng Zhao	1dedb57747	[Fix/Infer] Remove unused deps and revise requirements (#5341 ) * remove flash-attn dep * rm padding llama * revise infer requirements * move requirements out of module	10 months ago
yuehuayingxueluo	631862f339	[Inference]Optimize generation process of inference engine (#5356 ) * opt inference engine * fix run_benchmark.sh * fix generate in engine.py * rollback tesh_inference_engine.py	10 months ago
yuehuayingxueluo	21ad4a27f9	[Inference/opt]Optimize the mid tensor of RMS Norm (#5350 ) * opt rms_norm * fix bugs in rms_layernorm	10 months ago
Frank Lee	027aa1043f	[doc] updated inference readme (#5343 )	10 months ago
Frank Lee	db1a763307	[inference] removed redundancy init_batch (#5353 )	10 months ago
yuehuayingxueluo	249644c23b	[Inference]Repalce Attention layer and MLP layer by shardformer to optimize the weight transpose operation，add fused_qkv and fused linear_add (#5340 ) * add fused qkv * replace attn and mlp by shardformer * fix bugs in mlp * add docstrings * fix test_inference_engine.py * add optimize unbind * add fused_addmm * rm squeeze(1) * refactor codes * fix ci bugs * rename ShardFormerLlamaMLP and ShardFormerLlamaAttention * Removed the dependency on LlamaFlashAttention2 * rollback test_inference_engine.py	10 months ago
Frank Lee	f8e456d202	[inference] simplified config verification (#5346 ) * [inference] simplified config verification * polish * polish	10 months ago
Yuanheng Zhao	5f98a9d68a	[Infer] Optimize Blocked KVCache And Kernels Using It (#5325 ) * revise shape of kvcache (context attn kernel) * revise shape of kvcache (flash decoding kernel) * revise shape of kvcache (kvcache copy) and attn func * init of kvcache in kvcache manager * revise llama modeling * revise block size retrieval * use torch for rms_norm benchmarking * revise block size retrieval	10 months ago
yuehuayingxueluo	e8f0642f28	[Inference]Add Nopadding Llama Modeling (#5327 ) * add nopadding llama modeling * add nopadding_llama.py * rm unused codes * fix bugs in test_xine_copy.py * fix code style	10 months ago
Jianghai	c7c104cb7c	[DOC] Update inference readme (#5280 ) * add readme * add readme * 1 * update engine * finish readme * add readme	10 months ago
yuehuayingxueluo	4f28cb43c0	[inference]Optimize the usage of the mid tensors space in flash attn (#5304 ) * opt flash attn * opt tmp tensor * fix benchmark_llama * fix code style * fix None logic for output tensor * fix adapted to get_xine_cache * add comment * fix ci bugs * fix some codes * rm duplicated codes * rm duplicated codes * fix code style * add _get_dtype in config.py	10 months ago
Yuanheng Zhao	3da9993b0d	[Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301 ) * fix decoding kernel pytest * revise and add triton context attn benchmark	10 months ago
yuehuayingxueluo	cea9c86e45	add utils.py	10 months ago
yuehuayingxueluo	bfff9254ac	[inference] Adapted to Rotary Embedding and RMS Norm (#5283 ) * adapted to rotary_embedding * adapted to nopad rms norm * fix bugs in benchmark * fix flash_decoding.py	10 months ago
Yuanheng Zhao	6e487e7d3c	[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274 ) * prevent re-creating intermediate tensors * add singleton class holding intermediate values * fix triton kernel api * add benchmark in pytest * fix kernel api and add benchmark * revise flash decoding triton kernel in/out shapes * fix calling of triton kernel in modeling * fix pytest: extract to util functions	10 months ago
Jianghai	9e2342bde2	[Hotfix] Fix bugs in testing continuous batching (#5270 ) * fix bug * fix bugs * fix bugs * fix bugs and add padding * add funcs and fix bugs * fix typos * fix bugs * add func	10 months ago
yuehuayingxueluo	86b63f720c	[Inference]Adapted to the triton attn kernels (#5264 ) * adapted to the triton attn kernels * fix pad input * adapted to copy_kv_to_blocked_cache * fix ci test * update kv memcpy * remove print	10 months ago
Jianghai	d8db500efc	[Inference] Fix request handler and add recycle logic (#5260 ) * fix request handler * fix comment	11 months ago
Frank Lee	c597678da4	[doc] updated inference readme (#5269 )	11 months ago
Yuanheng Zhao	fa85e02b3b	[kernel] Add KV cache copy kernel during decoding (#5261 ) * add kv copy triton kernel during decoding stage * add pytest and fix kernel * fix test utilities * revise kernel config * add benchmark for kvcache copy	11 months ago
FrankLeeeee	1ded7e81ef	[git] fixed rebased files	11 months ago
yuehuayingxueluo	d40eb26029	fix bugs in request_handler.py and engine.py	11 months ago
yuehuayingxueluo	10e3c9f923	rm torch.cuda.synchronize	11 months ago
yuehuayingxueluo	fab294c7f4	fix CI bugs	11 months ago
yuehuayingxueluo	2a73e828eb	fix bugs related to processing padding mask	11 months ago
Jianghai	e545a871b8	[Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229 ) * fix accuracy * alignment in attention * fix attention * fix * fix bugs * fix bugs * fix bugs	11 months ago
yuehuayingxueluo	fa4fbdbffb	adapted to pad_context_forward	11 months ago
yuehuayingxueluo	47e53eaa1c	fix bugs in attention.py and request_handler.py	11 months ago
Jianghai	bfd9b1b494	[Inference] Pytorch Attention func, pad&nopad input support (#5219 ) * add attn * add attention test * fix attn forward * fix decoding	11 months ago
yuehuayingxueluo	3ad1f3b78b	fix beam_width	11 months ago
yuehuayingxueluo	b2eb9cd186	Fixed a typo	11 months ago
yuehuayingxueluo	bbfebfb9fc	fix bugs in sampler	11 months ago
yuehuayingxueluo	02c1bf8b2a	add context_attention_unpadded	11 months ago
yuehuayingxueluo	9489dc64d8	precision alignment	11 months ago
yuehuayingxueluo	62968588d1	fix bugs in request_handler	11 months ago
yuehuayingxueluo	62fd08ee44	Fixed a bug in the inference frame	11 months ago
yuehuayingxueluo	86853a37d5	Add padding llama model	11 months ago
Jianghai	0e616462a7	[Inference] add logit processor and request handler (#5166 ) * add logit processor and request handler * add * add * add * fix * add search tokens and update func * finish request handler * add running list test * fix test * fix some bug * add * add * fix bugs * fix some bugs * fix bug * fix * fix * add copy fun * del useless attn * fix request status --------- Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	11 months ago
yuehuayingxueluo	8daee26989	[Inference] Add the logic of the inference engine (#5173 ) * add infer_struct and infer_config * update codes * change InferConfig * Add hf_model_config to the engine * rm _get_hf_model_config * update codes * made adjustments according to the feedback from the reviewer. * update codes * add ci test for config and struct * Add the logic of the inference engine * update engine and test * Recover cache_manager.py * add logger * fix conflict * update codes * update codes * update model and tokenizer * fix add the logic about shardformer * change kvcache_manager docstring * add policy * fix ci bug in test_kvcache_manager.py * remove codes related o tokenizer and move model_policy * fix code style * add ordered_set to requirements-infer.txt * Delete extra empty lines * add ordered_set to requirements-test.txt	11 months ago
Jianghai	93aeacca34	[Inference]Update inference config and fix test (#5178 ) * unify the config setting * fix test * fix import * fix test * fix * fix * add logger * revise log info --------- Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	11 months ago
Yuanheng Zhao	3de2e62299	[Inference] Add CacheBlock and KV-Cache Manager (#5156 ) * [Inference] Add KVCache Manager * function refactored * add test for KVCache Manager * add attr beam width * Revise alloc func in CacheManager * Fix docs and pytests * add tp slicing for head number * optimize shapes of tensors used as physical cache * Apply using InferenceConfig on KVCacheManager * rm duplicate config file * Optimize cache allocation: use contiguous cache * Fix config in pytest (and config)	11 months ago
yuehuayingxueluo	fab9b931d9	[Inference]Add BatchInferState, Sequence and InferConfig (#5149 ) * add infer_struct and infer_config * update codes * change InferConfig * Add hf_model_config to the engine * rm _get_hf_model_config * update codes * made adjustments according to the feedback from the reviewer. * update codes * add ci test for config and struct	11 months ago
Jianghai	56e75eeb06	[Inference] Add readme (roadmap) and fulfill request handler (#5147 ) * request handler * add readme --------- Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	11 months ago
Jianghai	4cf4682e70	[Inference] First PR for rebuild colossal-infer (#5143 ) * add engine and scheduler * add dirs --------- Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	11 months ago
Zhongkai Zhao	75af66cd81	[Hotfix] Fix model policy matching strategy in ShardFormer (#5064 ) * hotfix/Fix get model policy strategy in ShardFormer * fix bug in auto policy	1 year ago
Hongxin Liu	1cd7efc520	[inference] refactor examples and fix schedule (#5077 ) * [setup] refactor infer setup * [hotfix] fix infenrece behavior on 1 1 gpu * [exmaple] refactor inference examples	1 year ago
Xu Kai	fb103cfd6e	[inference] update examples and engine (#5073 ) * update examples and engine * fix choices * update example	1 year ago
Bin Jia	0c7d8bebd5	[hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069 )	1 year ago
Cuiqing Li (李崔卿)	bce919708f	[Kernels]added flash-decoidng of triton (#5063 ) * added flash-decoidng of triton based on lightllm kernel * add req * clean * clean * delete build.sh --------- Co-authored-by: cuiqing.li <lixx336@gmail.com>	1 year ago
Xu Kai	fd6482ad8c	[inference] Refactor inference architecture (#5057 ) * [inference] support only TP (#4998) * support only tp * enable tp * add support for bloom (#5008) * [refactor] refactor gptq and smoothquant llama (#5012) * refactor gptq and smoothquant llama * fix import error * fix linear import torch-int * fix smoothquant llama import error * fix import accelerate error * fix bug * fix import smooth cuda * fix smoothcuda * [Inference Refactor] Merge chatglm2 with pp and tp (#5023) merge chatglm with pp and tp * [Refactor] remove useless inference code (#5022) * remove useless code * fix quant model * fix test import bug * mv original inference legacy * fix chatglm2 * [Refactor] refactor policy search and quant type controlling in inference (#5035) * [Refactor] refactor policy search and quant type controling in inference * [inference] update readme (#5051) * update readme * update readme * fix architecture * fix table * fix table * [inference] udpate example (#5053) * udpate example * fix run.sh * fix rebase bug * fix some errors * update readme * add some features * update interface * update readme * update benchmark * add requirements-infer --------- Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>	1 year ago
Cuiqing Li (李崔卿)	28052a71fb	[Kernels]Update triton kernels into 2.1.0 (#5046 ) * update flash-context-attention * adding kernels * fix * reset * add build script * add building process * add llama2 exmaple * add colossal-llama2 test * clean * fall back test setting * fix test file * clean * clean * clean --------- Co-authored-by: cuiqing.li <lixx336@gmail.com>	1 year ago
Zhongkai Zhao	70885d707d	[hotfix] Suport extra_kwargs in ShardConfig (#5031 ) * [refactor]: replace inference args with extra_kwargs in ShardConfig * modify shardconfig * polish code * fix policy bug in llama * fix bug in auto policy * remove setattr in ShardConfig	1 year ago
Xuanlei Zhao	f71e63b0f3	[moe] support optimizer checkpoint (#5015 ) * Refactor MoE Manager setup method * unshard optim ckpt * optim io * update transformer version * update requirements * update ckpt * update ckpt * update ckpt * fix engine * fix engine	1 year ago
Jianghai	ef4c14a5e2	[Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014 ) * fix bug * fix * fix multiquery * fix multiquery --------- Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	1 year ago
github-actions[bot]	c36e782d80	[format] applied code formatting on changed files in pull request 4926 (#5007 ) Co-authored-by: github-actions <github-actions@github.com>	1 year ago
littsk	1a3315e336	[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926 ) * [hotfix] Add layer norm gradients all-reduce for sequence parallel. (#4915) * Add layer norm gradients all-reduce for sequence parallel. * skip pipeline inference test * [hotfix] fixing polices of sequence parallel (#4922) * Add layer norm gradients all-reduce for sequence parallel. * fix parameter passing when calling get_autopolicy --------- Co-authored-by: littsk <1214689160@qq.com> * Hotfix/add grad all reduce for sequence parallel (#4927) * Add layer norm gradients all-reduce for sequence parallel. * fix parameter passing when calling get_autopolicy * fix bug using wrong variables --------- Co-authored-by: littsk <1214689160@qq.com> * fix policy initialization * fix bloom and chatglm policices * polish code of handling layernorm * fix moe module * polish code of class initializing --------- Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>	1 year ago
Bin Jia	b6696beb04	[Pipeline Inference] Merge pp with tp (#4993 ) * refactor pipeline into new CaiInferEngine * updata llama modeling forward * merge tp with pp * update docstring * optimize test workflow and example * fix typo * add assert and todo	1 year ago
Cuiqing Li (李崔卿)	4f0234f236	[doc]Update doc for colossal-inference (#4989 ) * update doc * Update README.md --------- Co-authored-by: cuiqing.li <lixx336@gmail.com>	1 year ago
Cuiqing Li	459a88c806	[Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965 ) * adding flash-decoding * clean * adding kernel * adding flash-decoding * add integration * add * adding kernel * adding kernel * adding triton 2.1.0 features for inference * update bloom triton kernel * remove useless vllm kernels * clean codes * fix * adding files * fix readme * update llama flash-decoding --------- Co-authored-by: cuiqing.li <lixx336@gmail.com>	1 year ago
Jianghai	cf579ff46d	[Inference] Dynamic Batching Inference, online and offline (#4953 ) * [inference] Dynamic Batching for Single and Multiple GPUs (#4831) * finish batch manager * 1 * first * fix * fix dynamic batching * llama infer * finish test * support different lengths generating * del prints * del prints * fix * fix bug --------- Co-authored-by: CjhHa1 <cjh18671720497@outlook.com> * [inference] Async dynamic batching (#4894) * finish input and output logic * add generate * test forward * 1 * [inference]Re push async dynamic batching (#4901) * adapt to ray server * finish async * finish test * del test --------- Co-authored-by: yuehuayingxueluo <867460659@qq.com> * Revert "[inference]Re push async dynamic batching (#4901)" (#4905) This reverts commit `fbf3c09e67`. * Revert "[inference] Async dynamic batching (#4894)" This reverts commit `fced140250`. * Revert "[inference] Async dynamic batching (#4894)" (#4909) This reverts commit `fced140250`. * Add Ray Distributed Environment Init Scripts * support DynamicBatchManager base function * revert _set_tokenizer version * add driver async generate * add async test * fix bugs in test_ray_dist.py * add get_tokenizer.py * fix code style * fix bugs about No module named 'pydantic' in ci test * fix bugs in ci test * fix bugs in ci test * fix bugs in ci test * [infer]Add Ray Distributed Environment Init Scripts (#4911) * Revert "[inference] Async dynamic batching (#4894)" This reverts commit `fced140250`. * Add Ray Distributed Environment Init Scripts * support DynamicBatchManager base function * revert _set_tokenizer version * add driver async generate * add async test * fix bugs in test_ray_dist.py * add get_tokenizer.py * fix code style * fix bugs about No module named 'pydantic' in ci test * fix bugs in ci test * fix bugs in ci test * fix bugs in ci test * support dynamic batch for bloom model and is_running function * [Inference]Test for new Async engine (#4935) * infer engine * infer engine * test engine * test engine * new manager * change step * add * test * fix * fix * finish test * finish test * finish test * finish test * add license --------- Co-authored-by: yuehuayingxueluo <867460659@qq.com> * add assertion for config (#4947) * [Inference] Finish dynamic batching offline test (#4948) * test * fix test * fix quant * add default * fix * fix some bugs * fix some bugs * fix * fix bug * fix bugs * reset param --------- Co-authored-by: yuehuayingxueluo <867460659@qq.com> Co-authored-by: Cuiqing Li <lixx3527@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	1 year ago
Bin Jia	1db6727678	[Pipeline inference] Combine kvcache with pipeline inference (#4938 ) * merge kvcache with pipeline inference and refactor the code structure * support ppsize > 2 * refactor pipeline code * do pre-commit * modify benchmark * fix bench mark * polish code * add docstring and update readme * refactor the code * fix some logic bug of ppinfer * polish readme * fix typo * skip infer test	1 year ago
Xu Kai	785802e809	[inference] add reference and fix some bugs (#4937 ) * add reference and fix some bugs * update gptq init --------- Co-authored-by: Xu Kai <xukai16@foxamil.com>	1 year ago
Cuiqing Li	3a41e8304e	[Refactor] Integrated some lightllm kernels into token-attention (#4946 ) * add some req for inference * clean codes * add codes * add some lightllm deps * clean codes * hello * delete rms files * add some comments * add comments * add doc * add lightllm deps * add lightllm cahtglm2 kernels * add lightllm cahtglm2 kernels * replace rotary embedding with lightllm kernel * add some commnets * add some comments * add some comments * add * replace fwd kernel att1 * fix a arg * add * add * fix token attention * add some comments * clean codes * modify comments * fix readme * fix bug * fix bug --------- Co-authored-by: cuiqing.li <lixx336@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	1 year ago
digger yu	11009103be	[nfc] fix some typo with colossalai/ docs/ etc. (#4920 )	1 year ago
github-actions[bot]	486d06a2d5	[format] applied code formatting on changed files in pull request 4820 (#4886 ) Co-authored-by: github-actions <github-actions@github.com>	1 year ago

160 Commits (457a0de79fd2d3602eba0ac78e606acb6401fc60)