Commit Graph

19 Commits (df5e9c53cf23d44656470cc319ee0b470c40712f)

Author SHA1 Message Date
Jianghai cf579ff46d
[Inference] Dynamic Batching Inference, online and offline (#4953)
* [inference] Dynamic Batching for Single and Multiple GPUs (#4831)

* finish batch manager

* 1

* first

* fix

* fix dynamic batching

* llama infer

* finish test

* support different lengths generating

* del prints

* del prints

* fix

* fix bug

---------

Co-authored-by: CjhHa1 <cjh18671720497outlook.com>

* [inference] Async dynamic batching  (#4894)

* finish input and output logic

* add generate

* test forward

* 1

* [inference]Re push async dynamic batching (#4901)

* adapt to ray server

* finish async

* finish test

* del test

---------

Co-authored-by: yuehuayingxueluo <867460659@qq.com>

* Revert "[inference]Re push async dynamic batching (#4901)" (#4905)

This reverts commit fbf3c09e67.

* Revert "[inference] Async dynamic batching  (#4894)"

This reverts commit fced140250.

* Revert "[inference] Async dynamic batching  (#4894)" (#4909)

This reverts commit fced140250.

* Add Ray Distributed Environment Init Scripts

* support DynamicBatchManager base function

* revert _set_tokenizer version

* add driver async generate

* add async test

* fix bugs in test_ray_dist.py

* add get_tokenizer.py

* fix code style

* fix bugs about No module named 'pydantic' in ci test

* fix bugs in ci test

* fix bugs in ci test

* fix bugs in ci test

* [infer]Add Ray Distributed Environment Init Scripts (#4911)

* Revert "[inference] Async dynamic batching  (#4894)"

This reverts commit fced140250.

* Add Ray Distributed Environment Init Scripts

* support DynamicBatchManager base function

* revert _set_tokenizer version

* add driver async generate

* add async test

* fix bugs in test_ray_dist.py

* add get_tokenizer.py

* fix code style

* fix bugs about No module named 'pydantic' in ci test

* fix bugs in ci test

* fix bugs in ci test

* fix bugs in ci test

* support dynamic batch for bloom model and is_running function

* [Inference]Test for new Async engine (#4935)

* infer engine

* infer engine

* test engine

* test engine

* new manager

* change step

* add

* test

* fix

* fix

* finish test

* finish test

* finish test

* finish test

* add license

---------

Co-authored-by: yuehuayingxueluo <867460659@qq.com>

* add assertion for config (#4947)

* [Inference] Finish dynamic batching offline test (#4948)

* test

* fix test

* fix quant

* add default

* fix

* fix some bugs

* fix some bugs

* fix

* fix bug

* fix bugs

* reset param

---------

Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
2023-10-30 10:52:19 +08:00
Cuiqing Li 3a41e8304e
[Refactor] Integrated some lightllm kernels into token-attention (#4946)
* add some req for inference

* clean codes

* add codes

* add some lightllm deps

* clean codes

* hello

* delete rms files

* add some comments

* add comments

* add doc

* add lightllm deps

* add lightllm cahtglm2 kernels

* add lightllm cahtglm2 kernels

* replace rotary embedding with lightllm kernel

* add some commnets

* add some comments

* add some comments

* add

* replace fwd kernel att1

* fix a arg

* add

* add

* fix token attention

* add some comments

* clean codes

* modify comments

* fix readme

* fix bug

* fix bug

---------

Co-authored-by: cuiqing.li <lixx336@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
2023-10-19 22:22:47 +08:00
ver217 922302263b [misc] update requirements 2023-08-15 23:25:14 +08:00
flybird1111 458ae331ad
[kernel] updated unittests for coloattention (#4389)
Updated coloattention tests of checking outputs and gradients
2023-08-09 14:24:45 +08:00
binmakeswell 089c365fa0
[doc] add Series A Funding and NeurIPS news (#4377)
* [doc] add Series A Funding and NeurIPS news

* [kernal] fix mha kernal

* [CI] skip moe

* [CI] fix requirements
2023-08-04 17:42:07 +08:00
Frank Lee 1beb85cc25
[checkpoint] refactored the API and added safetensors support (#3427)
* [checkpoint] refactored the API and added safetensors support

* polish code
2023-04-04 15:23:01 +08:00
アマデウス e78a1e949a
fix torch 2.0 compatibility (#3346) 2023-03-30 15:25:24 +08:00
CsRic 052b03e83f
limit torch version (#3213)
Co-authored-by: csric <richcsr256@gmail.com>
2023-03-24 13:36:16 +08:00
Frank Lee 93fdd35b5e
[build] fixed the doc build process (#2618) 2023-02-07 14:36:34 +08:00
Jiarui Fang bc0e271e71
[buider] use builder() for cpu adam and fused optim in setup.py (#2187) 2022-12-23 16:05:13 +08:00
Frank Lee 81e0da7fa8
[setup] supported conda-installed torch (#2048)
* [setup] supported conda-installed torch

* polish code
2022-11-30 16:45:15 +08:00
Jiarui Fang 504419d261
[FAW] add cache manager for the cached embedding (#1419) 2022-08-09 15:17:17 +08:00
Frank Lee cf6d1c9284
[CLI] refactored the launch CLI and fixed bugs in multi-node launching (#844)
* [cli] fixed multi-node job launching

* [cli] fixed a bug in version comparison

* [cli] support launching with env var

* [cli] fixed multi-node job launching

* [cli] fixed a bug in version comparison

* [cli] support launching with env var

* added docstring

* [cli] added extra launch arguments

* [cli] added default launch rdzv args

* [cli] fixed version comparison

* [cli] added docstring examples and requierment

* polish docstring

* polish code

* polish code
2022-04-24 13:26:26 +08:00
Frank Lee 01e9f834f5
[dependency] removed torchvision (#833)
* [dependency] removed torchvision

* fixed transforms
2022-04-22 15:24:35 +08:00
Frank Lee 05d9ae5999
[cli] add missing requirement (#805) 2022-04-19 13:56:59 +08:00
Jiarui Fang 54229cd33e
[log] better logging display with rich (#426)
* better logger using rich

* remove deepspeed in zero requirements
2022-03-16 09:51:15 +08:00
BoxiangW a2f1565672
Update GitHub action and pre-commit settings (#196)
* Update GitHub action and pre-commit settings

* Update GitHub action and pre-commit settings (#198)
2022-01-28 16:59:53 +08:00
Frank Lee 3defa32aee
Support TP-compatible Torch AMP and Update trainer API (#27)
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
2021-11-18 19:45:06 +08:00
zbian 404ecbdcc6 Migrated project 2021-10-28 18:21:23 +02:00