Yuanheng Zhao
bdf9a001d6
[Fix/Inference] Add unsupported auto-policy error message ( #5730 )
* [fix] auto policy error message
* trivial
6 months ago
Yuanheng Zhao
283c407a19
[Inference] Fix Inference Generation Config and Sampling ( #5710 )
* refactor and add
* config default values
* fix gen config passing
* fix rpc generation config
6 months ago
Yuanheng Zhao
8bcfe360fd
[example] Update Inference Example ( #5725 )
* [example] update inference example
6 months ago
傅剑寒
a8d459f99a
[Inference] Delete duplicated package ( #5723 )
6 months ago
Jianghai
f47f2fbb24
[Inference] Fix API server, test and example ( #5712 )
* fix api server
* fix generation config
* fix api server
* fix comments
* fix infer hanging bug
* resolve comments, change backend to free port
6 months ago
Runyu Lu
74c47921fa
[Fix] Llama3 Load/Omit CheckpointIO Temporarily ( #5717 )
* Fix Llama3 Load error
* Omit Checkpoint IO Temporarily
7 months ago
Yuanheng Zhao
5bbab1533a
[ci] Fix example tests ( #5714 )
* [fix] revise timeout value on example CI
* trivial
7 months ago
傅剑寒
121d7ad629
[Inference] Delete duplicated copy_vector ( #5716 )
7 months ago
Steve Luo
7806842f2d
add paged-attention v2: support seq length split across thread blocks ( #5707 )
7 months ago
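The commit title only says that paged-attention v2 splits the sequence length across thread blocks. The following is a minimal pure-Python sketch of the general split-and-merge idea, assuming each chunk of keys/values is attended independently and the partial results are combined with a log-sum-exp rescale. It is not the CUDA kernel from #5707; `partial_attention` and `split_attention` are hypothetical names.

```python
import math
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def partial_attention(q, keys, values) -> Tuple[float, float, List[float]]:
    """Attend over one chunk; return (chunk max score, chunk sum of exp, unnormalised output)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return m, sum(weights), out


def split_attention(q, keys, values, chunk_size: int) -> List[float]:
    """Each chunk stands in for the slice of the sequence one thread block would handle."""
    partials = [
        partial_attention(q, keys[i:i + chunk_size], values[i:i + chunk_size])
        for i in range(0, len(keys), chunk_size)
    ]
    # Rescale every chunk to a shared maximum so the softmax normaliser is exact.
    global_max = max(m for m, _, _ in partials)
    total = sum(l * math.exp(m - global_max) for m, l, _ in partials)
    merged = [0.0] * len(values[0])
    for m, _, out in partials:
        factor = math.exp(m - global_max)
        for d in range(len(merged)):
            merged[d] += out[d] * factor
    return [x / total for x in merged]
```

Because the merge is exact, any `chunk_size` gives the same result as attending over the whole sequence at once; splitting only changes how much parallel work is available for long sequences.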
Runyu Lu
18d67d0e8e
[Feat] Inference RPC Server Support ( #5705 )
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unit test
* Rpyc support
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
7 months ago
yuehuayingxueluo
de4bf3dedf
[Inference] Adapt repetition_penalty and no_repeat_ngram_size ( #5708 )
* Adapt repetition_penalty and no_repeat_ngram_size
* fix no_repeat_ngram_size_logit_process
* remove batch_updated
* fix annotation
* modify code based on review feedback
* rm get_batch_token_ids
7 months ago
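The commit adapts `repetition_penalty` and `no_repeat_ngram_size`. Below is a minimal sketch of the usual semantics of these two parameters (as popularised by Hugging Face `generate`), assuming a single sequence held in plain Python lists. It illustrates the technique only and is not the ColossalAI logit-processor code; the helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def apply_repetition_penalty(logits: List[float], generated: List[int], penalty: float) -> List[float]:
    """Shrink the logit of every token that already appeared (divide if positive, multiply if negative)."""
    out = list(logits)
    for tok in set(generated):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_ngram_tokens(generated: List[int], n: int) -> List[int]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if n <= 0 or len(generated) < n:
        return []
    seen: Dict[Tuple[int, ...], List[int]] = {}
    for i in range(len(generated) - n + 1):
        prefix = tuple(generated[i:i + n - 1])
        seen.setdefault(prefix, []).append(generated[i + n - 1])
    # The last n-1 tokens are the prefix of the n-gram the next token would complete.
    current_prefix = tuple(generated[len(generated) - n + 1:])
    return seen.get(current_prefix, [])


def apply_no_repeat_ngram(logits: List[float], generated: List[int], n: int) -> List[float]:
    out = list(logits)
    for tok in banned_ngram_tokens(generated, n):
        out[tok] = -math.inf  # make the banned token impossible to sample
    return out
```

For example, after generating `[1, 2, 3, 1, 2]` with `n = 3`, token `3` is banned because `(1, 2, 3)` already occurred.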
傅剑寒
50104ab340
[Inference/Feat] Add convert_fp8 op for fp8 test in the future ( #5706 )
* add convert_fp8 op for fp8 test in the future
* rerun ci
7 months ago
傅剑寒
bfad39357b
[Inference/Feat] Add quant kvcache interface ( #5700 )
* add quant kvcache interface
* delete unused output
* complete args comments
7 months ago
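These commits name a quantized KV-cache interface (and elsewhere an fp8 conversion op) without describing the format. As a generic illustration of what KV-cache quantization means, here is a sketch of symmetric int8 absmax quantization with one scale per vector (for example, one head of one token). The int8 format, the per-vector scaling, and the function names are assumptions, not necessarily what ColossalAI implements.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric absmax quantisation: map [-absmax, absmax] onto [-127, 127]."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127.0 if absmax > 0 else 1.0  # avoid dividing by zero for all-zero vectors
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate values; the error per element is at most scale / 2."""
    return [x * scale for x in q]
```

Storing one byte per element plus a single scale per vector roughly halves KV-cache memory compared with fp16, at the cost of the rounding error noted in the docstring.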
Jianghai
492520dbdb
Merge pull request #5588 from hpcaitech/feat/online-serving
[Feature]Online Serving
7 months ago
CjhHa1
5d9a49483d
[Inference] Add example test_ci script
7 months ago
CjhHa1
bc9063adf1
resolve rebase conflicts on Branch feat/online-serving
7 months ago
Jianghai
61a1b2e798
[Inference] Fix bugs and docs for feat/online-server ( #5598 )
* fix test bugs
* add do sample test
* del useless lines
* fix comments
* fix tests
* delete version tag
* delete version tag
* add
* del test server
* fix test
* fix
* Revert "add"
This reverts commit b9305fb024.
7 months ago
CjhHa1
7bbb28e48b
[Inference] resolve rebase conflicts
fix
7 months ago
Jianghai
c064032865
[Online Server] Chat Api for streaming and not streaming response ( #5470 )
* fix bugs
* fix bugs
* fix api server
* fix api server
* add chat api and test
* del request.n
7 months ago
Jianghai
de378cd2ab
[Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example ( #5432 )
* finish online test and add examples
* fix test_contionus_batching
* fix some bugs
* fix bash
* fix
* fix inference
* finish revision
* fix typos
* revision
7 months ago
Jianghai
69cd7e069d
[Inference] Add async and sync API server using FastAPI ( #5396 )
* add api server
* fix
* add
* add completion service and fix bug
* add generation config
* revise shardformer
* fix bugs
* add docstrings and fix some bugs
* fix bugs and add choices for prompt template
7 months ago
yuehuayingxueluo
d482922035
[Inference] Support the logic related to ignoring EOS token ( #5693 )
* Adapt temperature processing logic
* add ValueError for top_p and top_k
* add GQA Test
* fix except_msg
* support ignore EOS token
* change variable's name
* fix annotation
7 months ago
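A hypothetical sketch of what "ignoring the EOS token" usually means in a generation loop: decoding keeps going until `max_new_tokens` even after EOS is produced, which is often used when benchmarking fixed-length outputs. The `step` callback, the `ignore_eos` flag, and the greedy loop are assumptions for illustration, not the ColossalAI engine.

```python
from typing import Callable, List


def generate(step: Callable[[List[int]], int], prompt: List[int], eos_token_id: int,
             max_new_tokens: int, ignore_eos: bool = False) -> List[int]:
    """Append one token per step; stop at EOS unless ignore_eos is set."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_tok = step(tokens)  # hypothetical model call returning the next token id
        tokens.append(next_tok)
        if next_tok == eos_token_id and not ignore_eos:
            break
    return tokens[len(prompt):]  # only the newly generated tokens
```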
yuehuayingxueluo
9c2fe7935f
[Inference] Adapt temperature processing logic ( #5689 )
* Adapt temperature processing logic
* add ValueError for top_p and top_k
* add GQA Test
* fix except_msg
7 months ago
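The commit adapts temperature handling and adds a `ValueError` for invalid `top_p` and `top_k`. Below is a minimal sketch of temperature scaling followed by top-k and top-p (nucleus) filtering, with argument checks of the kind the commit message describes. The exact valid ranges and the names `validate` and `sampling_probs` are assumptions, not the ColossalAI code.

```python
import math
from typing import List


def validate(temperature: float, top_k: int, top_p: float) -> None:
    if temperature <= 0:
        raise ValueError(f"temperature must be positive, got {temperature}")
    if top_k < 0:
        raise ValueError(f"top_k must be >= 0, got {top_k}")
    if not 0.0 < top_p <= 1.0:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}")


def sampling_probs(logits: List[float], temperature: float = 1.0,
                   top_k: int = 0, top_p: float = 1.0) -> List[float]:
    validate(temperature, top_k, top_p)
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exp = [math.exp(s - m) for s in scaled]
    z = sum(exp)
    probs = [e / z for e in exp]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:top_k]) if top_k > 0 else set(order)  # top_k = 0 disables top-k
    # Top-p: smallest prefix of the sorted tokens whose cumulative mass reaches top_p.
    cum, nucleus = 0.0, set()
    for i in order:
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    keep &= nucleus
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]
```

Applying both filters and then renormalising means a token survives only if it passes top-k and lies inside the nucleus; the most likely token always survives.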
Yuanheng Zhao
12e7c28d5e
[hotfix] fix OpenMOE example import path ( #5697 )
7 months ago
Yuanheng Zhao
55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements ( #5688 )
* clean requirements
* modify example inference struct
* add test ci scripts
* mark test_infer as submodule
* rm deprecated cls & deps
* import of HAS_FLASH_ATTN
* prune inference tests to be run
* prune triton kernel tests
* increment pytest timeout mins
* revert import path in openmoe
7 months ago
Yuanheng Zhao
f9afe0addd
[hotfix] Fix KV Heads Number Assignment in KVCacheManager ( #5695 )
- Fix key value number assignment in KVCacheManager, as well as method of accessing
7 months ago
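The hotfix concerns how the number of key/value heads is assigned in `KVCacheManager`. As a hedged illustration of the usual rule under tensor parallelism: grouped-query attention models expose `num_key_value_heads`, multi-head models may not, and each rank holds an equal share of the KV heads. `ModelConfig` and `kv_heads_per_rank` are hypothetical, and this sketch simply rejects uneven splits rather than replicating heads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:  # hypothetical stand-in for a Hugging Face-style model config
    num_attention_heads: int
    num_key_value_heads: Optional[int] = None


def kv_heads_per_rank(config: ModelConfig, tp_size: int) -> int:
    # MHA models may leave num_key_value_heads unset; fall back to the attention head count.
    num_kv_heads = config.num_key_value_heads or config.num_attention_heads
    if num_kv_heads % tp_size != 0:
        raise ValueError(f"{num_kv_heads} KV heads cannot be split evenly across tp_size={tp_size}")
    return num_kv_heads // tp_size
```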
傅剑寒
1ace1065e6
[Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy ( #5686 )
7 months ago
Yuanheng Zhao
db7b3051f4
[Sync] Update from main to feature/colossal-infer (Merge pull request #5685 )
[Sync] Update from main to feature/colossal-infer
- Merge pull request #5685 from yuanheng-zhao/inference/merge/main
7 months ago
Steve Luo
725fbd2ed0
[Inference] Remove unnecessary float4_ and rename float8_ to float8 ( #5679 )
7 months ago
Yuanheng Zhao
8754abae24
[Fix] Fix & Update Inference Tests (compatibility w/ main)
7 months ago
Yuanheng Zhao
56ed09aba5
[sync] resolve conflicts of merging main
7 months ago
Yuanheng Zhao
537a3cbc4d
[kernel] Support New KCache Layout - Triton Kernel ( #5677 )
* kvmemcpy triton for new kcache layout
* revise tests for new kcache layout
* naive triton flash decoding - new kcache layout
* rotary triton kernel - new kcache layout
* remove redundancy - triton decoding
* remove redundancy - triton kvcache copy
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
7 months ago
傅剑寒
9df016fc45
[Inference] Fix quant bits order ( #5681 )
7 months ago
yuehuayingxueluo
f79963199c
[inference] Add alibi to flash attn function ( #5678 )
* add alibi to flash attn function
* rm redundant modifications
7 months ago
傅剑寒
ef8e4ffe31
[Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy ( #5680 )
7 months ago
Steve Luo
5cd75ce4c7
[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… ( #5663 )
* refactor kvcache manager and rotary_embedding and kvcache_memcpy operator
* refactor decode_kv_cache_memcpy
* enable alibi in pagedattention
7 months ago
yuehuayingxueluo
5f00002e43
[Inference] Adapt Baichuan2-13B TP ( #5659 )
* adapt to baichuan2 13B
* add baichuan2 13B TP
* update baichuan tp logic
* rm unused code
* Fix TP logic
* fix alibi slopes tp logic
* rm nn.Module
* Polished the code.
* change BAICHUAN_MODEL_NAME_OR_PATH
* Modified the logic for loading Baichuan weights.
* fix typos
7 months ago
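Commit #5659 fixes the "alibi slopes tp logic" for Baichuan2-13B. A common approach, sketched here as an assumption rather than the ColossalAI code, is to compute the slopes for all heads with the reference ALiBi recipe (including its interleaving rule for head counts that are not a power of two) and hand each tensor-parallel rank the contiguous slice matching its heads. Function names are hypothetical.

```python
import math
from typing import List


def alibi_slopes(n_heads: int) -> List[float]:
    """ALiBi slopes following the reference recipe from the ALiBi authors' code."""
    def power_of_2_slopes(n: int) -> List[float]:
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))  # e.g. 0.5 for 8 heads
        return [start ** (i + 1) for i in range(n)]

    if math.log2(n_heads).is_integer():
        return power_of_2_slopes(n_heads)
    # Not a power of two: take the closest lower power, then fill the rest with
    # every other slope from the next power of two.
    closest = 2 ** math.floor(math.log2(n_heads))
    extra = alibi_slopes(2 * closest)[0::2][: n_heads - closest]
    return power_of_2_slopes(closest) + extra


def alibi_slopes_for_rank(n_heads: int, tp_size: int, rank: int) -> List[float]:
    # Slice the global slope list so a sharded model sees the same per-head slopes
    # as the unsharded one (computing slopes from the local head count would not).
    assert n_heads % tp_size == 0, "heads must divide evenly across ranks"
    per_rank = n_heads // tp_size
    return alibi_slopes(n_heads)[rank * per_rank:(rank + 1) * per_rank]
```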
傅剑寒
808ee6e4ad
[Inference/Feat] Feat quant kvcache step2 ( #5674 )
7 months ago
Wang Binluo
d3f34ee8cc
[Shardformer] add assert for num of attention heads divisible by tp_size ( #5670 )
* add assert for num of attention heads divisible by tp_size
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
7 months ago
flybird11111
6af6d6fc9f
[shardformer] support bias_gelu_jit_fused for models ( #5647 )
* support gelu_bias_fused for gpt2
* support gelu_bias_fused for gpt2
fix
fix
fix
* fix
fix
* fix
7 months ago
Hongxin Liu
7f8b16635b
[misc] refactor launch API and tensor constructor ( #5666 )
* [misc] remove config arg from initialize
* [misc] remove old tensor constructor
* [plugin] add npu support for ddp
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [devops] fix doc test ci
* [test] fix test launch
* [doc] update launch doc
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
7 months ago
linsj20
91fa553775
[Feature] qlora support ( #5586 )
* [feature] qlora support
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* qlora follow commit
* migrate quantization folder to colossalai/
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* minor fixes
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
7 months ago
flybird11111
8954a0c2e2
[LowLevelZero] low level zero support lora ( #5153 )
* low level zero support lora
low level zero support lora
* add checkpoint test
* add checkpoint test
* fix
* fix
* fix
* fix
fix
fix
fix
* fix
* fix
fix
fix
fix
fix
fix
fix
* fix
* fix
fix
fix
fix
fix
fix
fix
* fix
* test ci
* git # This is a combination of 3 commits.
Update low_level_zero_plugin.py
Update low_level_zero_plugin.py
fix
fix
fix
* fix naming
fix naming
fix naming
fix
7 months ago
Baizhou Zhang
14b0d4c7e5
[lora] add lora APIs for booster, support lora for TorchDDP ( #4981 )
* add apis and peft requirement
* add license and implement APIs
* add checkpointio apis
* add torchddp fwd_bwd test
* add support_lora methods
* add checkpointio test and debug
* delete unneeded codes
* remove peft from LICENSE
* add concrete methods for enable_lora
* simplify enable_lora api
* fix requirements
7 months ago
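The LoRA work in this range (booster APIs and TorchDDP support in #4981, LowLevelZero support in #5153, QLoRA in #5586) builds on the standard low-rank update. Below is a minimal sketch of that update: a frozen weight plus a scaled low-rank product, using the `alpha / r` scaling convention from the LoRA paper. The function names are hypothetical and this is not the ColossalAI booster API.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: List[float], alpha: float) -> List[float]:
    """y = W x + (alpha / r) * B (A x); W is frozen, only A (r x d_in) and B (d_out x r) train."""
    r = len(a)  # LoRA rank = number of rows in A
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + (alpha / r) * d for y, d in zip(base, delta)]
```

With `B` initialised to zeros the adapter is a no-op, so training starts exactly from the pretrained model; only the `r * (d_in + d_out)` adapter parameters need gradients and optimizer state.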
Hongxin Liu
c1594e4bad
[devops] fix release docker ci ( #5665 )
7 months ago
Hongxin Liu
4cfbf30a5e
[release] update version ( #5654 )
7 months ago
Tong Li
68ec99e946
[hotfix] add soft link to support required files ( #5661 )
7 months ago
傅剑寒
8ccb6714e7
[Inference/Feat] Add kvcache quantization support for FlashDecoding ( #5656 )
7 months ago
Yuanheng Zhao
5be590b99e
[kernel] Support new KCache Layout - Context Attention Triton Kernel ( #5658 )
* add context attn triton kernel - new kcache layout
* add benchmark triton
* tiny revise
* trivial - code style, comment
7 months ago
binmakeswell
b8a711aa2d
[news] llama3 and open-sora v1.1 ( #5655 )
* [news] llama3 and open-sora v1.1
* [news] llama3 and open-sora v1.1
7 months ago