ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	646b3c5a90	[shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084 ) * [tp] hotfix linear row * [tp] support uneven split for fused linear * [tp] support sp for fused linear * [tp] fix gpt2 mlp policy * [tp] fix gather fused and add fused linear row	2 months ago
Gao, Ruiyuan	e9032fb0b2	[colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020 ) * fix bug in load_state_dict_into_model; format error msg * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update utils.py to support checking missing_keys * Update general_checkpoint_io.py fix bug in missing_keys error message * retrigger tests --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	3 months ago
Runyu Lu	bcf0181ecd	[Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895 ) * Distrifusion Support source * comp comm overlap optimization * sd3 benchmark * pixart distrifusion bug fix * sd3 bug fix and benchmark * generation bug fix * naming fix * add docstring, fix counter and shape error * add reference * readme and requirement	4 months ago
Runyu Lu	66abf1c6e8	[HotFix] CI,import,requirements-test for #5838 (#5892 ) * [Hot Fix] CI,import,requirements-test --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	5 months ago
Runyu Lu	cba20525a8	[Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838 ) * Diffusion Model Inference support * Stable Diffusion 3 Support * pixartalpha support	5 months ago
pre-commit-ci[bot]	7c2f79fa98	[pre-commit.ci] pre-commit autoupdate (#5572 ) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/PyCQA/autoflake: v2.2.1 → v2.3.1](https://github.com/PyCQA/autoflake/compare/v2.2.1...v2.3.1) - [github.com/pycqa/isort: 5.12.0 → 5.13.2](https://github.com/pycqa/isort/compare/5.12.0...5.13.2) - [github.com/psf/black-pre-commit-mirror: 23.9.1 → 24.4.2](https://github.com/psf/black-pre-commit-mirror/compare/23.9.1...24.4.2) - [github.com/pre-commit/mirrors-clang-format: v13.0.1 → v18.1.7](https://github.com/pre-commit/mirrors-clang-format/compare/v13.0.1...v18.1.7) - [github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.6.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.3.0...v4.6.0) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	5 months ago
Runyu Lu	3c7cda0c9a	[Inference]Lazy Init Support (#5785 ) * lazy init support * lazy init llama support * :lazy init support for baichuan * aligh rpc * add note for baichuan --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	5 months ago
Yuanheng Zhao	7b249c76e5	[Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837 ) * fix glide llama model * revise	5 months ago
flybird11111	2ddf624a86	[shardformer] upgrade transformers to 4.39.3 (#5815 ) * [shardformer]upgrade transformers for gpt2/gptj/whisper (#5807) * [shardformer] fix modeling of gpt2 and gptj * [shardformer] fix whisper modeling * [misc] update requirements --------- Co-authored-by: ver217 <lhx0217@gmail.com> * [shardformer]upgrade transformers for mistral (#5808) * upgrade transformers for mistral * fix * fix * [shardformer]upgrade transformers for llama (#5809) * update transformers fix * fix * fix * [inference] upgrade transformers (#5810) * update transformers fix * fix * fix * fix * fix * [gemini] update transformers for gemini (#5814) --------- Co-authored-by: ver217 <lhx0217@gmail.com>	5 months ago
Li Xingjian	8554585a5f	[Inference] Fix flash-attn import and add model test (#5794 ) * Fix torch int32 dtype Signed-off-by: char-1ee <xingjianli59@gmail.com> * Fix flash-attn import Signed-off-by: char-1ee <xingjianli59@gmail.com> * Add generalized model test Signed-off-by: char-1ee <xingjianli59@gmail.com> * Remove exposed path to model Signed-off-by: char-1ee <xingjianli59@gmail.com> * Add default value for use_flash_attn Signed-off-by: char-1ee <xingjianli59@gmail.com> * Rename model test Signed-off-by: char-1ee <xingjianli59@gmail.com> --------- Signed-off-by: char-1ee <xingjianli59@gmail.com>	6 months ago
Runyu Lu	c0948aff97	[Inference]refactor baichuan (#5791 ) * refactor baichuan * remove unused code and add TODO for lazyinit	6 months ago
char-1ee	f5981e808e	Remove flash attention backend Signed-off-by: char-1ee <xingjianli59@gmail.com>	6 months ago
char-1ee	5f398fc000	Pass inference model shard configs for module init Signed-off-by: char-1ee <xingjianli59@gmail.com>	6 months ago
char-1ee	eec77e5702	Fix tests and naming Signed-off-by: char-1ee <xingjianli59@gmail.com>	6 months ago
char-1ee	04386d9eff	Refactor modeling by adding attention backend Signed-off-by: char-1ee <xingjianli59@gmail.com>	6 months ago
yuehuayingxueluo	b45000f839	[Inference]Add Streaming LLM (#5745 ) * Add Streaming LLM * add some parameters to llama_generation.py * verify streamingllm config * add test_streamingllm.py * modified according to the opinions of review * add Citation * change _block_tables tolist	6 months ago
Yuanheng Zhao	406443200f	[Hotfix] Add missing init file in inference.executor (#5774 )	6 months ago
Jianghai	85946d4236	[Inference]Fix readme and example for API server (#5742 ) * fix chatapi readme and example * updating doc * add an api and change the doc * remove * add credits and del 'API' heading * readme * readme	6 months ago
binmakeswell	4647ec28c8	[inference] release (#5747 ) * [inference] release * [inference] release * [inference] release * [inference] release * [inference] release * [inference] release * [inference] release	6 months ago
Yuanheng Zhao	d8b1ea4ac9	[doc] Update Inference Readme (#5736 ) * [doc] update inference readme * add contents * trivial	6 months ago
Yuanheng Zhao	bdf9a001d6	[Fix/Inference] Add unsupported auto-policy error message (#5730 ) * [fix] auto policy error message * trivial	6 months ago
Yuanheng Zhao	283c407a19	[Inference] Fix Inference Generation Config and Sampling (#5710 ) * refactor and add * config default values * fix gen config passing * fix rpc generation config	6 months ago
Yuanheng Zhao	8bcfe360fd	[example] Update Inference Example (#5725 ) * [example] update inference example	6 months ago
Jianghai	f47f2fbb24	[Inference] Fix API server, test and example (#5712 ) * fix api server * fix generation config * fix api server * fix comments * fix infer hanging bug * resolve comments, change backend to free port	6 months ago
Runyu Lu	74c47921fa	[Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717 ) * Fix Llama3 Load error * Omit Checkpoint IO Temporarily	6 months ago
Steve Luo	7806842f2d	add paged-attetionv2: support seq length split across thread block (#5707 )	7 months ago
Runyu Lu	18d67d0e8e	[Feat]Inference RPC Server Support (#5705 ) * rpc support source * kv cache logical/physical disaggregation * sampler refactor * colossalai launch built in * Unitest * Rpyc support --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
yuehuayingxueluo	de4bf3dedf	[Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708 ) * Adapt repetition_penalty and no_repeat_ngram_size * fix no_repeat_ngram_size_logit_process * remove batch_updated * fix annotation * modified codes based on the review feedback. * rm get_batch_token_ids	7 months ago
傅剑寒	bfad39357b	[Inference/Feat] Add quant kvcache interface (#5700 ) * add quant kvcache interface * delete unused output * complete args comments	7 months ago
CjhHa1	bc9063adf1	resolve rebase conflicts on Branch feat/online-serving	7 months ago
Jianghai	61a1b2e798	[Inference] Fix bugs and docs for feat/online-server (#5598 ) * fix test bugs * add do sample test * del useless lines * fix comments * fix tests * delete version tag * delete version tag * add * del test sever * fix test * fix * Revert "add" This reverts commit `b9305fb024`.	7 months ago
CjhHa1	7bbb28e48b	[Inference] resolve rebase conflicts fix	7 months ago
Jianghai	c064032865	[Online Server] Chat Api for streaming and not streaming response (#5470 ) * fix bugs * fix bugs * fix api server * fix api server * add chat api and test * del request.n	7 months ago
Jianghai	de378cd2ab	[Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432 ) * finish online test and add examples * fix test_contionus_batching * fix some bugs * fix bash * fix * fix inference * finish revision * fix typos * revision	7 months ago
Jianghai	69cd7e069d	[Inference] ADD async and sync Api server using FastAPI (#5396 ) * add api server * fix * add * add completion service and fix bug * add generation config * revise shardformer * fix bugs * add docstrings and fix some bugs * fix bugs and add choices for prompt template	7 months ago
yuehuayingxueluo	d482922035	[Inference] Support the logic related to ignoring EOS token (#5693 ) * Adapt temperature processing logic * add ValueError for top_p and top_k * add GQA Test * fix except_msg * support ignore EOS token * change variable's name * fix annotation	7 months ago
yuehuayingxueluo	9c2fe7935f	[Inference]Adapt temperature processing logic (#5689 ) * Adapt temperature processing logic * add ValueError for top_p and top_k * add GQA Test * fix except_msg	7 months ago
Yuanheng Zhao	55cc7f3df7	[Fix] Fix Inference Example, Tests, and Requirements (#5688 ) * clean requirements * modify example inference struct * add test ci scripts * mark test_infer as submodule * rm deprecated cls & deps * import of HAS_FLASH_ATTN * prune inference tests to be run * prune triton kernel tests * increment pytest timeout mins * revert import path in openmoe	7 months ago
Yuanheng Zhao	f9afe0addd	[hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695 ) - Fix key value number assignment in KVCacheManager, as well as method of accessing	7 months ago
Yuanheng Zhao	8754abae24	[Fix] Fix & Update Inference Tests (compatibility w/ main)	7 months ago
yuehuayingxueluo	f79963199c	[inference]Add alibi to flash attn function (#5678 ) * add alibi to flash attn function * rm redundant modifications	7 months ago
Steve Luo	5cd75ce4c7	[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663 ) * refactor kvcache manager and rotary_embedding and kvcache_memcpy operator * refactor decode_kv_cache_memcpy * enable alibi in pagedattention	7 months ago
yuehuayingxueluo	5f00002e43	[Inference] Adapt Baichuan2-13B TP (#5659 ) * adapt to baichuan2 13B * add baichuan2 13B TP * update baichuan tp logic * rm unused code * Fix TP logic * fix alibi slopes tp logic * rm nn.Module * Polished the code. * change BAICHUAN_MODEL_NAME_OR_PATH * Modified the logic for loading Baichuan weights. * fix typos	7 months ago
yuehuayingxueluo	3c91e3f176	[Inference]Adapt to baichuan2 13B (#5614 ) * adapt to baichuan2 13B * adapt to baichuan2 13B * change BAICHUAN_MODEL_NAME_OR_PATH * fix test_decoding_attn.py * Modifications based on review comments. * change BAICHUAN_MODEL_NAME_OR_PATH * mv attn mask processes to test flash decoding * mv get_alibi_slopes baichuan modeling * fix bugs in test_baichuan.py	7 months ago
Steve Luo	a8fd3b0342	[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643 ) * optimize flashdecodingattention: refactor code with different key cache layout(from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x]) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
Yuanheng Zhao	04863a9b14	[example] Update Llama Inference example (#5629 ) * [example] add infernece benchmark llama3 * revise inference config - arg * remove unused args * add llama generation demo script * fix init rope in llama policy * add benchmark-llama3 - cleanup	7 months ago
Yuanheng Zhao	5d4c1fe8f5	[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624 ) * [fix] GQA calling of flash decoding triton * fix kv cache alloc shape * fix rotary triton - GQA * fix sequence max length assigning * Sequence max length logic * fix scheduling and spec-dec * skip without import error * fix pytest - skip without ImportError --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
Runyu Lu	e37ee2fb65	[Feat]Tensor Model Parallel Support For Inference (#5563 ) * tensor parallel support naive source * [fix]precision, model load and refactor the framework * add tp unit test * docstring * fix do_sample	7 months ago
Steve Luo	be396ad6cc	[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531 ) * feat flash decoding for paged attention * refactor flashdecodingattention * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
yuehuayingxueluo	56b222eff8	[inference/model]Adapted to the baichuan2-7B model (#5591 ) * Adapted to the baichuan2-7B model * modified according to the review comments. * Modified the method of obtaining random weights. * modified according to the review comments. * change mlp layewr 'NOTE'	7 months ago

165 Commits (30a94431323d71c5ef06bd4b7f047aced3312fdf)