Commit Graph

9 Commits (pre-commit-ci-update-config)

Author SHA1 Message Date
Yuanheng Zhao 283c407a19
[Inference] Fix Inference Generation Config and Sampling (#5710)
* refactor and add

* config default values

* fix gen config passing

* fix rpc generation config
2024-05-19 15:08:42 +08:00
Runyu Lu 18d67d0e8e
[Feat]Inference RPC Server Support (#5705)
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unitest
* Rpyc support

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-14 10:00:55 +08:00
yuehuayingxueluo 35382a7fbf
[Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365)
* fused the gate and up proj in mlp

* fix code styles

* opt auto_grad

* rollback test_inference_engine.py

* modifications based on the review feedback.

* fix bugs in flash attn

* Change reshape to view

* fix test_rmsnorm_triton.py
2024-02-06 19:38:25 +08:00
yuehuayingxueluo fa4fbdbffb adapted to pad_context_forward 2024-01-11 13:44:06 +00:00
yuehuayingxueluo 3ad1f3b78b fix beam_width 2024-01-11 13:39:56 +00:00
yuehuayingxueluo bbfebfb9fc fix bugs in sampler 2024-01-11 13:39:56 +00:00
yuehuayingxueluo 02c1bf8b2a add context_attention_unpadded 2024-01-11 13:39:56 +00:00
yuehuayingxueluo 9489dc64d8 precision alignment 2024-01-11 13:39:56 +00:00
Jianghai 0e616462a7 [Inference] add logit processor and request handler (#5166)
* add logit processor and request handler

* add

* add

* add

* fix

* add search tokens and update func

* finish request handler

* add running list test

* fix test

* fix some bug

* add

* add

* fix bugs

* fix some bugs

* fix bug

* fix

* fix

* add copy fun

* del useless attn

* fix request status

---------

Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
2024-01-11 13:39:56 +00:00