Yuanheng Zhao
|
283c407a19
|
[Inference] Fix Inference Generation Config and Sampling (#5710)
* refactor and add
* config default values
* fix gen config passing
* fix rpc generation config
|
6 months ago |
Runyu Lu
|
18d67d0e8e
|
[Feat]Inference RPC Server Support (#5705)
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unit test
* Rpyc support
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
6 months ago |
yuehuayingxueluo
|
de4bf3dedf
|
[Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708)
* Adapt repetition_penalty and no_repeat_ngram_size
* fix no_repeat_ngram_size_logit_process
* remove batch_updated
* fix annotation
* modify code based on review feedback
* rm get_batch_token_ids
|
7 months ago |
yuehuayingxueluo
|
9c2fe7935f
|
[Inference]Adapt temperature processing logic (#5689)
* Adapt temperature processing logic
* add ValueError for top_p and top_k
* add GQA Test
* fix except_msg
|
7 months ago |
傅剑寒
|
e6496dd371
|
[Inference] Optimize request handler of llama (#5512)
* optimize request_handler
* fix ways of writing
|
8 months ago |
Jianghai
|
0e616462a7
|
[Inference] add logit processor and request handler (#5166)
* add logit processor and request handler
* add
* add
* add
* fix
* add search tokens and update func
* finish request handler
* add running list test
* fix test
* fix some bug
* add
* add
* fix bugs
* fix some bugs
* fix bug
* fix
* fix
* add copy fun
* del useless attn
* fix request status
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
|
11 months ago |