Commit Graph

6 Commits (c0948aff9777dd18ed5d6adf10671e03e438f6eb)

Author SHA1 Message Date
char-1ee 5f398fc000 Pass inference model shard configs for module init
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-07 08:33:52 +00:00
char-1ee 04386d9eff Refactor modeling by adding attention backend
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-07 08:33:47 +00:00
Runyu Lu 18d67d0e8e
[Feat]Inference RPC Server Support (#5705)
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unitest
* Rpyc support

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-14 10:00:55 +08:00
Runyu Lu e37ee2fb65
[Feat]Tensor Model Parallel Support For Inference (#5563)
* tensor parallel support naive source

* [fix]precision, model load and refactor the framework

* add tp unit test

* docstring

* fix do_sample
2024-04-18 16:56:46 +08:00
yuehuayingxueluo f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)
* add rotary embedding kernel

* add rotary_embedding_kernel

* add fused rotary_emb and kvcache memcopy

* add fused_rotary_emb_and_cache_kernel.cu

* add fused_rotary_emb_and_memcopy

* fix bugs in fused_rotary_emb_and_cache_kernel.cu

* fix ci bugs

* use vec memcopy and opt the  gloabl memory access

* fix code style

* fix test_rotary_embdding_unpad.py

* codes revised based on the review comments

* fix bugs about include path

* rm inline
2024-03-13 17:20:03 +08:00
yuehuayingxueluo cea9c86e45 add utils.py 2024-01-22 16:06:27 +08:00