ColossalAI

Commit Graph

Author	SHA1	Message	Date
yuehuayingxueluo	de4bf3dedf	[Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708) * Adapt repetition_penalty and no_repeat_ngram_size * fix no_repeat_ngram_size_logit_process * remove batch_updated * fix annotation * modified codes based on the review feedback. * rm get_batch_token_ids	7 months ago
傅剑寒	bfad39357b	[Inference/Feat] Add quant kvcache interface (#5700) * add quant kvcache interface * delete unused output * complete args comments	7 months ago
CjhHa1	bc9063adf1	resolve rebase conflicts on Branch feat/online-serving	7 months ago
Jianghai	61a1b2e798	[Inference] Fix bugs and docs for feat/online-server (#5598) * fix test bugs * add do sample test * del useless lines * fix comments * fix tests * delete version tag * delete version tag * add * del test sever * fix test * fix * Revert "add" This reverts commit `b9305fb024`.	7 months ago
CjhHa1	7bbb28e48b	[Inference] resolve rebase conflicts fix	7 months ago
Jianghai	c064032865	[Online Server] Chat Api for streaming and not streaming response (#5470) * fix bugs * fix bugs * fix api server * fix api server * add chat api and test * del request.n	7 months ago
Jianghai	de378cd2ab	[Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) * finish online test and add examples * fix test_contionus_batching * fix some bugs * fix bash * fix * fix inference * finish revision * fix typos * revision	7 months ago
Jianghai	69cd7e069d	[Inference] ADD async and sync Api server using FastAPI (#5396) * add api server * fix * add * add completion service and fix bug * add generation config * revise shardformer * fix bugs * add docstrings and fix some bugs * fix bugs and add choices for prompt template	7 months ago
yuehuayingxueluo	d482922035	[Inference] Support the logic related to ignoring EOS token (#5693) * Adapt temperature processing logic * add ValueError for top_p and top_k * add GQA Test * fix except_msg * support ignore EOS token * change variable's name * fix annotation	7 months ago
yuehuayingxueluo	9c2fe7935f	[Inference]Adapt temperature processing logic (#5689) * Adapt temperature processing logic * add ValueError for top_p and top_k * add GQA Test * fix except_msg	7 months ago
Yuanheng Zhao	55cc7f3df7	[Fix] Fix Inference Example, Tests, and Requirements (#5688) * clean requirements * modify example inference struct * add test ci scripts * mark test_infer as submodule * rm deprecated cls & deps * import of HAS_FLASH_ATTN * prune inference tests to be run * prune triton kernel tests * increment pytest timeout mins * revert import path in openmoe	7 months ago
Yuanheng Zhao	f9afe0addd	[hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695) - Fix key value number assignment in KVCacheManager, as well as method of accessing	7 months ago
Yuanheng Zhao	8754abae24	[Fix] Fix & Update Inference Tests (compatibility w/ main)	7 months ago
yuehuayingxueluo	f79963199c	[inference]Add alibi to flash attn function (#5678) * add alibi to flash attn function * rm redundant modifications	7 months ago
Steve Luo	5cd75ce4c7	[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663) * refactor kvcache manager and rotary_embedding and kvcache_memcpy operator * refactor decode_kv_cache_memcpy * enable alibi in pagedattention	7 months ago
yuehuayingxueluo	5f00002e43	[Inference] Adapt Baichuan2-13B TP (#5659) * adapt to baichuan2 13B * add baichuan2 13B TP * update baichuan tp logic * rm unused code * Fix TP logic * fix alibi slopes tp logic * rm nn.Module * Polished the code. * change BAICHUAN_MODEL_NAME_OR_PATH * Modified the logic for loading Baichuan weights. * fix typos	7 months ago
yuehuayingxueluo	3c91e3f176	[Inference]Adapt to baichuan2 13B (#5614) * adapt to baichuan2 13B * adapt to baichuan2 13B * change BAICHUAN_MODEL_NAME_OR_PATH * fix test_decoding_attn.py * Modifications based on review comments. * change BAICHUAN_MODEL_NAME_OR_PATH * mv attn mask processes to test flash decoding * mv get_alibi_slopes baichuan modeling * fix bugs in test_baichuan.py	7 months ago
Steve Luo	a8fd3b0342	[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643) * optimize flashdecodingattention: refactor code with different key cache layout(from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x]) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
Yuanheng Zhao	04863a9b14	[example] Update Llama Inference example (#5629) * [example] add infernece benchmark llama3 * revise inference config - arg * remove unused args * add llama generation demo script * fix init rope in llama policy * add benchmark-llama3 - cleanup	7 months ago
Yuanheng Zhao	5d4c1fe8f5	[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624) * [fix] GQA calling of flash decoding triton * fix kv cache alloc shape * fix rotary triton - GQA * fix sequence max length assigning * Sequence max length logic * fix scheduling and spec-dec * skip without import error * fix pytest - skip without ImportError --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
Runyu Lu	e37ee2fb65	[Feat]Tensor Model Parallel Support For Inference (#5563) * tensor parallel support naive source * [fix]precision, model load and refactor the framework * add tp unit test * docstring * fix do_sample	7 months ago
Steve Luo	be396ad6cc	[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531) * feat flash decoding for paged attention * refactor flashdecodingattention * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
yuehuayingxueluo	56b222eff8	[inference/model]Adapted to the baichuan2-7B model (#5591) * Adapted to the baichuan2-7B model * modified according to the review comments. * Modified the method of obtaining random weights. * modified according to the review comments. * change mlp layewr 'NOTE'	8 months ago
Yuanheng	f8598e3ec5	[Fix] Llama Modeling Control with Spec-Dec (#5580) - fix ref before asgmt - fall back to use triton kernels when using spec-dec	8 months ago
Yuanheng Zhao	e60d430cf5	[Fix] resolve conflicts of rebasing feat/speculative-decoding (#5557) - resolve conflicts of rebasing feat/speculative-decoding	8 months ago
Yuanheng Zhao	e1acb58423	[doc] Add inference/speculative-decoding README (#5552) * add README for spec-dec * update roadmap	8 months ago
Yuanheng Zhao	d85d91435a	[Inference/SpecDec] Support GLIDE Drafter Model (#5455) * add glide-llama policy and modeling * update glide modeling, compitable with transformers 4.36.2 * revise glide llama modeling/usage * fix issues of glimpsing large kv * revise the way re-loading params for glide drafter * fix drafter and engine tests * enable convert to glide strict=False * revise glide llama modeling * revise vicuna prompt template * revise drafter and tests * apply usage of glide model in engine	8 months ago
Yuanheng Zhao	912e24b2aa	[SpecDec] Fix inputs for speculation and revise past KV trimming (#5449) * fix drafter pastkv and usage of batch bucket	8 months ago
Yuanheng Zhao	a37f82629d	[Inference/SpecDec] Add Speculative Decoding Implementation (#5423) * fix flash decoding mask during verification * add spec-dec * add test for spec-dec * revise drafter init * remove drafter sampling * retire past kv in drafter * (trivial) rename attrs * (trivial) rename arg * revise how we enable/disable spec-dec	8 months ago
Yuanheng Zhao	5a9b05f7b2	[Inference/SpecDec] Add Basic Drafter Model Container (#5405) * [Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399) fix dependency in pytest * add drafter model container (basic ver)	8 months ago
Yuanheng Zhao	4bb5d8923a	[Fix/Inference] Remove unused and non-functional functions (#5543) * [fix] remove unused func * rm non-functional partial	8 months ago
yuehuayingxueluo	04aca9e55b	[Inference/Kernel]Add get_cos_and_sin Kernel (#5528) * Add get_cos_and_sin kernel * fix code comments * fix code typos * merge common codes of get_cos_and_sin kernel. * Fixed a typo * Changed 'asset allclose' to 'assert equal'.	8 months ago
傅剑寒	e6496dd371	[Inference] Optimize request handler of llama (#5512) * optimize request_handler * fix ways of writing	8 months ago
Runyu Lu	6251d68dc9	[fix] PR #5354 (#5501) * [fix] * [fix] * Update config.py docstring * [fix] docstring align * [fix] docstring align * [fix] docstring align	8 months ago
Runyu Lu	68e9396bc0	[fix] merge conflicts	8 months ago
yuehuayingxueluo	87079cffe8	[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461) * Support FP16/BF16 Flash Attention 2 * fix bugs in test_kv_cache_memcpy.py * add context_kv_cache_memcpy_kernel.cu * rm typename MT * add tail process * add high_precision * add high_precision to config.py * rm unused code * change the comment for the high_precision parameter * update test_rotary_embdding_unpad.py * fix vector_copy_utils.h * add comment for self.high_precision when using float32	8 months ago
Runyu Lu	ff4998c6f3	[fix] remove unused comment	8 months ago
Runyu Lu	5b017d6324	[fix]	8 months ago
Runyu Lu	4eafe0c814	[fix] unused option	8 months ago
Runyu Lu	aabc9fb6aa	[feat] add use_cuda_kernel option	8 months ago
Runyu Lu	6e30248683	[fix] tmp for test	9 months ago
Runyu Lu	d02e257abd	Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph	9 months ago
Runyu Lu	ae24b4f025	diverse tests	9 months ago
Runyu Lu	1821a6dab0	[fix] pytest and fix dyn grid bug	9 months ago
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the gloabl memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	9 months ago
Runyu Lu	633e95b301	[doc] add doc	9 months ago
Runyu Lu	9dec66fad6	[fix] multi graphs capture error	9 months ago
Runyu Lu	b2c0d9ff2b	[fix] multi graphs capture error	9 months ago
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417)	9 months ago
Runyu Lu	cefaeb5fdd	[feat] cuda graph support and refactor non-functional api	9 months ago

138 Commits (de4bf3dedf2c7cb7ba6c3044745bab3c3ef6352d)