Runyu Lu
|
66abf1c6e8
|
[HotFix] CI,import,requirements-test for #5838 (#5892)
* [Hot Fix] CI,import,requirements-test
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
5 months ago |
Runyu Lu
|
cba20525a8
|
[Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838)
* Diffusion Model Inference support
* Stable Diffusion 3 Support
* pixartalpha support
|
5 months ago |
pre-commit-ci[bot]
|
7c2f79fa98
|
[pre-commit.ci] pre-commit autoupdate (#5572)
* [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/PyCQA/autoflake: v2.2.1 → v2.3.1](https://github.com/PyCQA/autoflake/compare/v2.2.1...v2.3.1)
- [github.com/pycqa/isort: 5.12.0 → 5.13.2](https://github.com/pycqa/isort/compare/5.12.0...5.13.2)
- [github.com/psf/black-pre-commit-mirror: 23.9.1 → 24.4.2](https://github.com/psf/black-pre-commit-mirror/compare/23.9.1...24.4.2)
- [github.com/pre-commit/mirrors-clang-format: v13.0.1 → v18.1.7](https://github.com/pre-commit/mirrors-clang-format/compare/v13.0.1...v18.1.7)
- [github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.6.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.3.0...v4.6.0)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
5 months ago |
Li Xingjian
|
8554585a5f
|
[Inference] Fix flash-attn import and add model test (#5794)
* Fix torch int32 dtype
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Fix flash-attn import
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Add generalized model test
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Remove exposed path to model
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Add default value for use_flash_attn
Signed-off-by: char-1ee <xingjianli59@gmail.com>
* Rename model test
Signed-off-by: char-1ee <xingjianli59@gmail.com>
---------
Signed-off-by: char-1ee <xingjianli59@gmail.com>
|
6 months ago |
char-1ee
|
5f398fc000
|
Pass inference model shard configs for module init
Signed-off-by: char-1ee <xingjianli59@gmail.com>
|
6 months ago |
char-1ee
|
04386d9eff
|
Refactor modeling by adding attention backend
Signed-off-by: char-1ee <xingjianli59@gmail.com>
|
6 months ago |
Runyu Lu
|
18d67d0e8e
|
[Feat]Inference RPC Server Support (#5705)
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unit test
* Rpyc support
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
7 months ago |
Runyu Lu
|
e37ee2fb65
|
[Feat]Tensor Model Parallel Support For Inference (#5563)
* tensor parallel support naive source
* [fix]precision, model load and refactor the framework
* add tp unit test
* docstring
* fix do_sample
|
7 months ago |
yuehuayingxueluo
|
f366a5ea1f
|
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)
* add rotary embedding kernel
* add rotary_embedding_kernel
* add fused rotary_emb and kvcache memcopy
* add fused_rotary_emb_and_cache_kernel.cu
* add fused_rotary_emb_and_memcopy
* fix bugs in fused_rotary_emb_and_cache_kernel.cu
* fix ci bugs
* use vec memcopy and opt the global memory access
* fix code style
* fix test_rotary_embdding_unpad.py
* codes revised based on the review comments
* fix bugs about include path
* rm inline
|
9 months ago |
yuehuayingxueluo
|
cea9c86e45
|
add utils.py
|
10 months ago |