ColossalAI/colossalai
Jianghai cf579ff46d
[Inference] Dynamic Batching Inference, online and offline (#4953)
* [inference] Dynamic Batching for Single and Multiple GPUs (#4831)

* finish batch manager

* 1

* first

* fix

* fix dynamic batching

* llama infer

* finish test

* support different lengths generating

* del prints

* del prints

* fix

* fix bug

---------

Co-authored-by: CjhHa1 <cjh18671720497outlook.com>

* [inference] Async dynamic batching  (#4894)

* finish input and output logic

* add generate

* test forward

* 1

* [inference]Re push async dynamic batching (#4901)

* adapt to ray server

* finish async

* finish test

* del test

---------

Co-authored-by: yuehuayingxueluo <867460659@qq.com>

* Revert "[inference]Re push async dynamic batching (#4901)" (#4905)

This reverts commit fbf3c09e67.

* Revert "[inference] Async dynamic batching  (#4894)"

This reverts commit fced140250.

* Revert "[inference] Async dynamic batching  (#4894)" (#4909)

This reverts commit fced140250.

* Add Ray Distributed Environment Init Scripts

* support DynamicBatchManager base function

* revert _set_tokenizer version

* add driver async generate

* add async test

* fix bugs in test_ray_dist.py

* add get_tokenizer.py

* fix code style

* fix bugs about No module named 'pydantic' in ci test

* fix bugs in ci test

* fix bugs in ci test

* fix bugs in ci test

* [infer]Add Ray Distributed Environment Init Scripts (#4911)

* Revert "[inference] Async dynamic batching  (#4894)"

This reverts commit fced140250.

* Add Ray Distributed Environment Init Scripts

* support DynamicBatchManager base function

* revert _set_tokenizer version

* add driver async generate

* add async test

* fix bugs in test_ray_dist.py

* add get_tokenizer.py

* fix code style

* fix bugs about No module named 'pydantic' in ci test

* fix bugs in ci test

* fix bugs in ci test

* fix bugs in ci test

* support dynamic batch for bloom model and is_running function

* [Inference]Test for new Async engine (#4935)

* infer engine

* infer engine

* test engine

* test engine

* new manager

* change step

* add

* test

* fix

* fix

* finish test

* finish test

* finish test

* finish test

* add license

---------

Co-authored-by: yuehuayingxueluo <867460659@qq.com>

* add assertion for config (#4947)

* [Inference] Finish dynamic batching offline test (#4948)

* test

* fix test

* fix quant

* add default

* fix

* fix some bugs

* fix some bugs

* fix

* fix bug

* fix bugs

* reset param

---------

Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
2023-10-30 10:52:19 +08:00
..
_C [setup] support pre-build and jit-build of cuda kernels (#2374) 2023-01-06 20:50:26 +08:00
_analyzer [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
amp [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) 2023-10-12 11:32:37 +08:00
auto_parallel [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
autochunk [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
booster [gemini] support gradient accumulation (#4869) 2023-10-17 14:07:21 +08:00
checkpoint_io [checkpointio] hotfix torch 2.0 compatibility (#4824) 2023-10-07 10:45:52 +08:00
cli [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) 2023-09-22 10:50:47 +08:00
cluster [doc] polish shardformer doc (#4779) 2023-09-26 10:57:47 +08:00
context [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
device [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
fx [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
inference [Inference] Dynamic Batching Inference, online and offline (#4953) 2023-10-30 10:52:19 +08:00
interface [lazy] support from_pretrained (#4801) 2023-09-26 11:04:11 +08:00
kernel [Inference] Dynamic Batching Inference, online and offline (#4953) 2023-10-30 10:52:19 +08:00
lazy [doc] add lazy init docs (#4808) 2023-09-27 10:24:04 +08:00
legacy [hotfix] fix torch 2.0 compatibility (#4936) 2023-10-18 11:05:25 +08:00
logging [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
nn [test] add no master test for low level zero plugin (#4934) 2023-10-18 11:41:23 +08:00
pipeline [Pipeline inference] Combine kvcache with pipeline inference (#4938) 2023-10-27 16:19:54 +08:00
shardformer [Pipeline inference] Combine kvcache with pipeline inference (#4938) 2023-10-27 16:19:54 +08:00
tensor [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
testing [test] merge old components to test to model zoo (#4945) 2023-10-20 10:35:08 +08:00
utils [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
zero [gemini] support gradient accumulation (#4869) 2023-10-17 14:07:21 +08:00
__init__.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00
initialize.py [misc] update pre-commit and run all files (#4752) 2023-09-19 14:20:26 +08:00