Runyu Lu
|
18d67d0e8e
|
[Feat]Inference RPC Server Support (#5705)
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unit test
* RPyC support
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2024-05-14 10:00:55 +08:00 |
yuehuayingxueluo
|
de4bf3dedf
|
[Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708)
* Adapt repetition_penalty and no_repeat_ngram_size
* fix no_repeat_ngram_size_logit_process
* remove batch_updated
* fix annotation
* modify code based on review feedback
* rm get_batch_token_ids
|
2024-05-11 15:13:25 +08:00 |
yuehuayingxueluo
|
9c2fe7935f
|
[Inference]Adapt temperature processing logic (#5689)
* Adapt temperature processing logic
* add ValueError for top_p and top_k
* add GQA Test
* fix except_msg
|
2024-05-08 17:58:29 +08:00 |
傅剑寒
|
e6496dd371
|
[Inference] Optimize request handler of llama (#5512)
* optimize request_handler
* fix ways of writing
|
2024-03-26 16:37:14 +08:00 |
Jianghai
|
0e616462a7
|
[Inference] add logit processor and request handler (#5166)
* add logit processor and request handler
* add
* add
* add
* fix
* add search tokens and update func
* finish request handler
* add running list test
* fix test
* fix some bugs
* add
* add
* fix bugs
* fix some bugs
* fix bug
* fix
* fix
* add copy func
* del useless attn
* fix request status
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
|
2024-01-11 13:39:56 +00:00 |