ColossalAI

Commit Graph

Author	SHA1	Message	Date
Runyu Lu	18d67d0e8e	[Feat]Inference RPC Server Support (#5705 ) * rpc support source * kv cache logical/physical disaggregation * sampler refactor * colossalai launch built in * Unitest * Rpyc support --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	6 months ago
hugo-syn	393c8f5b7f	[hotfix] fix inference typo (#5438 )	7 months ago
Edenzzzz	785cd9a9c9	[misc] Update PyTorch version in docs (#5711 ) Co-authored-by: Edenzzzz <wtan45@wisc.edu>	7 months ago
yuehuayingxueluo	de4bf3dedf	[Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708 ) * Adapt repetition_penalty and no_repeat_ngram_size * fix no_repeat_ngram_size_logit_process * remove batch_updated * fix annotation * modified codes based on the review feedback. * rm get_batch_token_ids	7 months ago
傅剑寒	50104ab340	[Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706 ) * add convert_fp8 op for fp8 test in the future * rerun ci	7 months ago
Wang Binluo	537f6a3855	[Shardformer]fix the num_heads assert for llama model and qwen model (#5704 ) * fix the num_heads assert * fix the transformers import * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the import --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
Wang Binluo	a3cc68ca93	[Shardformer] Support the Qwen2 model (#5699 ) * feat: support qwen2 model * fix: modify model config and add Qwen2RMSNorm * fix qwen2 model conflicts * test: add qwen2 shard test * to: add qwen2 auto policy * support qwen model * fix the conflicts * add try catch * add transformers version for qwen2 * add the ColoAttention for the qwen2 model * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add the unit test version check * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the test input bug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the version check * fix the version check --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
傅剑寒	bfad39357b	[Inference/Feat] Add quant kvcache interface (#5700 ) * add quant kvcache interface * delete unused output * complete args comments	7 months ago
Jianghai	492520dbdb	Merge pull request #5588 from hpcaitech/feat/online-serving [Feature]Online Serving	7 months ago
CjhHa1	5d9a49483d	[Inference] Add example test_ci script	7 months ago
flybird11111	d4c5ef441e	[gemini]remove registered gradients hooks (#5696 ) * fix gemini fix gemini * fix fix	7 months ago
CjhHa1	bc9063adf1	resolve rebase conflicts on Branch feat/online-serving	7 months ago
Jianghai	61a1b2e798	[Inference] Fix bugs and docs for feat/online-server (#5598 ) * fix test bugs * add do sample test * del useless lines * fix comments * fix tests * delete version tag * delete version tag * add * del test sever * fix test * fix * Revert "add" This reverts commit `b9305fb024`.	7 months ago
CjhHa1	7bbb28e48b	[Inference] resolve rebase conflicts fix	7 months ago
Jianghai	c064032865	[Online Server] Chat Api for streaming and not streaming response (#5470 ) * fix bugs * fix bugs * fix api server * fix api server * add chat api and test * del request.n	7 months ago
Jianghai	de378cd2ab	[Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432 ) * finish online test and add examples * fix test_contionus_batching * fix some bugs * fix bash * fix * fix inference * finish revision * fix typos * revision	7 months ago
Jianghai	69cd7e069d	[Inference] ADD async and sync Api server using FastAPI (#5396 ) * add api server * fix * add * add completion service and fix bug * add generation config * revise shardformer * fix bugs * add docstrings and fix some bugs * fix bugs and add choices for prompt template	7 months ago
yuehuayingxueluo	d482922035	[Inference] Support the logic related to ignoring EOS token (#5693 ) * Adapt temperature processing logic * add ValueError for top_p and top_k * add GQA Test * fix except_msg * support ignore EOS token * change variable's name * fix annotation	7 months ago
yuehuayingxueluo	9c2fe7935f	[Inference]Adapt temperature processing logic (#5689 ) * Adapt temperature processing logic * add ValueError for top_p and top_k * add GQA Test * fix except_msg	7 months ago
Yuanheng Zhao	12e7c28d5e	[hotfix] fix OpenMOE example import path (#5697 )	7 months ago
Wang Binluo	22297789ab	Merge pull request #5684 from wangbluo/parallel_output [Shardformer] Add Parallel output for shardformer models	7 months ago
Yuanheng Zhao	55cc7f3df7	[Fix] Fix Inference Example, Tests, and Requirements (#5688 ) * clean requirements * modify example inference struct * add test ci scripts * mark test_infer as submodule * rm deprecated cls & deps * import of HAS_FLASH_ATTN * prune inference tests to be run * prune triton kernel tests * increment pytest timeout mins * revert import path in openmoe	7 months ago
Yuanheng Zhao	f9afe0addd	[hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695 ) - Fix key value number assignment in KVCacheManager, as well as method of accessing	7 months ago
wangbluo	4e50cce26b	fix the mistral model	7 months ago
wangbluo	a8408b4d31	remove comment code	7 months ago
pre-commit-ci[bot]	ca56b93d83	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	7 months ago
wangbluo	108ddfb795	add parallel_output for the opt model	7 months ago
pre-commit-ci[bot]	88f057ce7c	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	7 months ago
Edenzzzz	58954b2986	[misc] Add an existing issue checkbox in bug report (#5691 ) Co-authored-by: Wenxuan(Eden) Tan <wtan45@wisc.edu>	7 months ago
flybird11111	77ec773388	[zero]remove registered gradients hooks (#5687 ) * remove registered hooks fix fix fix zero fix fix fix fix fix zero fix zero fix fix fix * fix fix fix	7 months ago
Edenzzzz	c25f83c85f	fix missing pad token (#5690 ) Co-authored-by: Edenzzzz <wtan45@wisc.edu>	7 months ago
傅剑寒	1ace1065e6	[Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686 )	7 months ago
Yuanheng Zhao	db7b3051f4	[Sync] Update from main to feature/colossal-infer (Merge pull request #5685 ) [Sync] Update from main to feature/colossal-infer - Merge pull request #5685 from yuanheng-zhao/inference/merge/main	7 months ago
Steve Luo	725fbd2ed0	[Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679 )	7 months ago
Yuanheng Zhao	8754abae24	[Fix] Fix & Update Inference Tests (compatibility w/ main)	7 months ago
Yuanheng Zhao	56ed09aba5	[sync] resolve conflicts of merging main	7 months ago
Yuanheng Zhao	537a3cbc4d	[kernel] Support New KCache Layout - Triton Kernel (#5677 ) * kvmemcpy triton for new kcache layout * revise tests for new kcache layout * naive triton flash decoding - new kcache layout * rotary triton kernel - new kcache layout * remove redundancy - triton decoding * remove redundancy - triton kvcache copy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
wangbluo	2632916329	remove useless code	7 months ago
傅剑寒	9df016fc45	[Inference] Fix quant bits order (#5681 )	7 months ago
yuehuayingxueluo	f79963199c	[inference]Add alibi to flash attn function (#5678 ) * add alibi to flash attn function * rm redundant modifications	7 months ago
傅剑寒	ef8e4ffe31	[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680 )	7 months ago
wangbluo	9efc79ef24	add parallel output for mistral model	7 months ago
Steve Luo	5cd75ce4c7	[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663 ) * refactor kvcache manager and rotary_embedding and kvcache_memcpy operator * refactor decode_kv_cache_memcpy * enable alibi in pagedattention	7 months ago
yuehuayingxueluo	5f00002e43	[Inference] Adapt Baichuan2-13B TP (#5659 ) * adapt to baichuan2 13B * add baichuan2 13B TP * update baichuan tp logic * rm unused code * Fix TP logic * fix alibi slopes tp logic * rm nn.Module * Polished the code. * change BAICHUAN_MODEL_NAME_OR_PATH * Modified the logic for loading Baichuan weights. * fix typos	7 months ago
傅剑寒	808ee6e4ad	[Inference/Feat] Feat quant kvcache step2 (#5674 )	7 months ago
Wang Binluo	d3f34ee8cc	[Shardformer] add assert for num of attention heads divisible by tp_size (#5670 ) * add assert for num of attention heads divisible by tp_size * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
flybird11111	6af6d6fc9f	[shardformer] support bias_gelu_jit_fused for models (#5647 ) * support gelu_bias_fused for gpt2 * support gelu_bias_fused for gpt2 fix fix fix * fix fix * fix	7 months ago
Hongxin Liu	7f8b16635b	[misc] refactor launch API and tensor constructor (#5666 ) * [misc] remove config arg from initialize * [misc] remove old tensor contrusctor * [plugin] add npu support for ddp * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [devops] fix doc test ci * [test] fix test launch * [doc] update launch doc --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
linsj20	91fa553775	[Feature] qlora support (#5586 ) * [feature] qlora support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * qlora follow commit * migrate qutization folder to colossalai/ * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
flybird11111	8954a0c2e2	[LowLevelZero] low level zero support lora (#5153 ) * low level zero support lora low level zero support lora * add checkpoint test * add checkpoint test * fix * fix * fix * fix fix fix fix * fix * fix fix fix fix fix fix fix * fix * fix fix fix fix fix fix fix * fix * test ci * git # This is a combination of 3 commits. Update low_level_zero_plugin.py Update low_level_zero_plugin.py fix fix fix * fix naming fix naming fix naming fix	7 months ago

3578 Commits (ColossalChat)