char-1ee
|
5f398fc000
|
Pass inference model shard configs for module init
Signed-off-by: char-1ee <xingjianli59@gmail.com>
|
2024-06-07 08:33:52 +00:00 |
char-1ee
|
04386d9eff
|
Refactor modeling by adding attention backend
Signed-off-by: char-1ee <xingjianli59@gmail.com>
|
2024-06-07 08:33:47 +00:00 |
Runyu Lu
|
18d67d0e8e
|
[Feat]Inference RPC Server Support (#5705)
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unit test
* RPyC support
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2024-05-14 10:00:55 +08:00 |
Runyu Lu
|
e37ee2fb65
|
[Feat]Tensor Model Parallel Support For Inference (#5563)
* tensor parallel support naive source
* [fix]precision, model load and refactor the framework
* add tp unit test
* docstring
* fix do_sample
|
2024-04-18 16:56:46 +08:00 |
yuehuayingxueluo
|
f366a5ea1f
|
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)
* add rotary embedding kernel
* add rotary_embedding_kernel
* add fused rotary_emb and kvcache memcopy
* add fused_rotary_emb_and_cache_kernel.cu
* add fused_rotary_emb_and_memcopy
* fix bugs in fused_rotary_emb_and_cache_kernel.cu
* fix ci bugs
* use vec memcopy and optimize the global memory access
* fix code style
* fix test_rotary_embdding_unpad.py
* codes revised based on the review comments
* fix bugs about include path
* rm inline
|
2024-03-13 17:20:03 +08:00 |
yuehuayingxueluo
|
cea9c86e45
|
add utils.py
|
2024-01-22 16:06:27 +08:00 |