Commit Graph

2161 Commits (162251ab7844e4116a36d6e0fec2ac7ccd03f74d)

Author SHA1 Message Date
botbw 13b48ac0aa [zero] solve hang 2024-08-01 10:06:59 +08:00
botbw b5bfeb2efd [moe] implement transit between non moe tp and ep 2024-08-01 10:06:59 +08:00
botbw 37443cc7e4 [test] pass mixtral shardformer test 2024-08-01 10:06:59 +08:00
hxwang 46c069b0db [zero] solve hang 2024-08-01 10:06:59 +08:00
hxwang 0fad23c691 [chore] handle non member group 2024-08-01 10:06:59 +08:00
hxwang a249e71946 [test] mixtral pp shard test 2024-08-01 10:06:59 +08:00
hxwang 8ae8525bdf [moe] fix plugin 2024-08-01 10:06:59 +08:00
hxwang 0b76b57cd6 [test] add mixtral transformer test 2024-08-01 10:06:59 +08:00
hxwang f9b6fcf81f [test] add mixtral for sequence classification 2024-08-01 10:06:59 +08:00
Hongxin Liu 060892162a
[zero] hotfix update master params (#5951) 2024-07-30 13:36:00 +08:00
Runyu Lu bcf0181ecd
[Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895)
* Distrifusion Support source

* comp comm overlap optimization

* sd3 benchmark

* pixart distrifusion bug fix

* sd3 bug fix and benchmark

* generation bug fix

* naming fix

* add docstring, fix counter and shape error

* add reference

* readme and requirement
2024-07-30 10:43:26 +08:00
Hongxin Liu 7b38964e3a
[shardformer] hotfix attn mask (#5947) 2024-07-29 19:10:06 +08:00
Hongxin Liu 9664b1bc19
[shardformer] hotfix attn mask (#5945) 2024-07-29 13:58:27 +08:00
Edenzzzz 2069472e96
[Hotfix] Fix ZeRO typo #5936
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-07-25 09:59:58 +08:00
Hongxin Liu 5fd0592767
[fp8] support all-gather flat tensor (#5932) 2024-07-24 16:55:20 +08:00
Gao, Ruiyuan 5fb958cc83
[FIX BUG] convert env param to int in (#5934) 2024-07-24 10:30:40 +08:00
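Since environment variables always arrive as strings, a fix like the one above usually reduces to an explicit conversion before use. A minimal sketch; the variable name "WORLD_SIZE" is illustrative, not necessarily the one the PR touches:

```python
import os

# Environment variables are strings; numeric settings must be converted
# explicitly before arithmetic or comparisons. Name is hypothetical.
world_size = int(os.environ.get("WORLD_SIZE", "1"))
```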
Insu Jang a521ffc9f8
Add n_fused as an input from native_module (#5894) 2024-07-23 23:15:39 +08:00
Hongxin Liu e86127925a
[plugin] support all-gather overlap for hybrid parallel (#5919)
* [plugin] fixed all-gather overlap support for hybrid parallel
2024-07-18 15:33:03 +08:00
GuangyaoZhang 5b969fd831 fix shardformer fp8 communication training degradation 2024-07-18 07:16:36 +00:00
GuangyaoZhang 6a20f07b80 remove all to all 2024-07-17 07:14:55 +00:00
GuangyaoZhang 5a310b9ee1 fix rebase 2024-07-17 03:43:23 +00:00
GuangyaoZhang 457a0de79f shardformer fp8 2024-07-16 06:56:51 +00:00
アマデウス 530283dba0 fix object_to_tensor usage when torch>=2.3.0 (#5820) 2024-07-16 13:59:25 +08:00
Guangyao Zhang 2e28c793ce [compatibility] support torch 2.2 (#5875)
* Support Pytorch 2.2.2

* keep build_on_pr file and update .compatibility
2024-07-16 13:59:25 +08:00
Guangyao Zhang 1c961b20f3
[ShardFormer] fix qwen2 sp (#5903) 2024-07-15 13:58:06 +08:00
Stephan Kö 45c49dde96
[Auto Parallel]: Speed up intra-op plan generation by 44% (#5446)
* Remove unnecessary calls to deepcopy

* Build DimSpec's difference dict only once

This change considerably speeds up construction speed of DimSpec objects. The difference_dict is the same for each DimSpec object, so a single copy of it is enough.

* Fix documentation of DimSpec's difference method
2024-07-15 12:05:06 +08:00
pre-commit-ci[bot] 51f916b11d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-07-12 07:33:45 +00:00
BurkeHulk 1f1b856354 Merge remote-tracking branch 'origin/feature/fp8_comm' into feature/fp8_comm
# Conflicts:
#	colossalai/quantization/fp8.py
2024-07-12 15:29:41 +08:00
BurkeHulk e88190184a support fp8 communication in pipeline parallelism 2024-07-12 15:25:25 +08:00
BurkeHulk 1e1959467e fix scaling algorithm in FP8 casting 2024-07-12 15:23:37 +08:00
Hongxin Liu c068ef0fa0
[zero] support all-gather overlap (#5898)
* [zero] support all-gather overlap

* [zero] add overlap all-gather flag

* [misc] fix typo

* [zero] update api
2024-07-11 18:59:59 +08:00
GuangyaoZhang dbfa7d39fc fix typo 2024-07-10 08:13:26 +00:00
Guangyao Zhang 669849d74b
[ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) 2024-07-10 11:34:25 +08:00
Edenzzzz fbf33ecd01
[Feature] Enable PP + SP for llama (#5868)
* fix cross-PP-stage position id length diff bug

* fix typo

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use a one cross entropy func for all shardformer models

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-07-09 18:05:20 +08:00
Runyu Lu 66abf1c6e8
[HotFix] CI,import,requirements-test for #5838 (#5892)
* [Hot Fix] CI,import,requirements-test

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-07-08 22:32:06 +08:00
Runyu Lu cba20525a8
[Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838)
* Diffusion Model Inference support

* Stable Diffusion 3 Support

* pixartalpha support
2024-07-08 16:02:07 +08:00
Edenzzzz 8ec24b6a4d
[Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-07-05 20:02:36 +08:00
Haze188 3420921101
[shardformer] DeepseekMoE support (#5871)
* [Feature] deepseek moe expert parallel implement

* [misc] fix typo, remove redundant file (#5867)

* [misc] fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Feature] deepseek support & unit test

* [misc] remove debug code & useless print

* [misc] fix typos (#5872)

* [Feature] remove modeling file, use auto config. (#5884)

* [misc] fix typos

* [Feature] deepseek support via auto model, remove modeling file

* [misc] delete useless file

* [misc] fix typos

* [Deepseek] remove redundant code (#5888)

* [misc] fix typos

* [Feature] deepseek support via auto model, remove modeling file

* [misc] delete useless file

* [misc] fix typos

* [misc] remove redundant code

* [Feature/deepseek] resolve comment. (#5889)

* [misc] fix typos

* [Feature] deepseek support via auto model, remove modeling file

* [misc] delete useless file

* [misc] fix typos

* [misc] remove redundant code

* [misc] mv module replacement into if branch

* [misc] add some warning message and modify some code in unit test

* [misc] fix typos

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-07-05 16:13:58 +08:00
pre-commit-ci[bot] e17f835df7 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-07-04 12:47:17 +00:00
Hanks 6991819a97
Merge branch 'hpcaitech:main' into feature/fp8_comm 2024-07-04 20:34:41 +08:00
Hongxin Liu 7afbc81d62
[quant] fix bitsandbytes version check (#5882)
* [quant] fix bitsandbytes version check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-07-04 11:33:23 +08:00
Wang Binluo 6cd4c32be4
[shardformer] fix the moe (#5883) 2024-07-03 20:02:19 +08:00
Edenzzzz eb24fcd914
[Hotfix] Fix OPT gradient checkpointing forward
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-07-03 14:57:57 +08:00
Haze188 ea94c07b95
[hotfix] fix the bug that a large tensor exceeds the maximum capacity of TensorBucket (#5879) 2024-07-02 12:42:02 +08:00
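For context, a bucket fix like this one typically needs a guard so a tensor larger than the remaining capacity triggers a flush. A minimal, hypothetical sketch; this is not the repository's actual TensorBucket class:

```python
# Hypothetical sketch: flush the bucket before adding a tensor that would
# overflow it, so oversized tensors are emitted in their own batch.
class TensorBucketSketch:
    def __init__(self, max_numel: int):
        self.max_numel = max_numel
        self.tensors = []
        self.numel = 0

    def add(self, t) -> None:
        if self.numel + t.numel() > self.max_numel:
            self.flush()  # make room; an oversized tensor then goes out alone
        self.tensors.append(t)
        self.numel += t.numel()

    def flush(self) -> None:
        # real code would launch the fused/bucketed op here before clearing
        self.tensors.clear()
        self.numel = 0
```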
pre-commit-ci[bot] 7c2f79fa98
[pre-commit.ci] pre-commit autoupdate (#5572)
* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/PyCQA/autoflake: v2.2.1 → v2.3.1](https://github.com/PyCQA/autoflake/compare/v2.2.1...v2.3.1)
- [github.com/pycqa/isort: 5.12.0 → 5.13.2](https://github.com/pycqa/isort/compare/5.12.0...5.13.2)
- [github.com/psf/black-pre-commit-mirror: 23.9.1 → 24.4.2](https://github.com/psf/black-pre-commit-mirror/compare/23.9.1...24.4.2)
- [github.com/pre-commit/mirrors-clang-format: v13.0.1 → v18.1.7](https://github.com/pre-commit/mirrors-clang-format/compare/v13.0.1...v18.1.7)
- [github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.6.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.3.0...v4.6.0)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-07-01 17:16:41 +08:00
Jianghai 8ab46b4000
[Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) 2024-07-01 16:45:09 +08:00
HangXu f5a52e1600
fp8 operators for compressed communication
cast_to_fp8, cast_from_fp8, all_reduce_fp8
2024-07-01 13:44:21 +08:00
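The operator names above suggest the usual scaled-cast pattern for fp8 communication. A minimal sketch under the assumption of PyTorch >= 2.1 (for torch.float8_e4m3fn); this is not the repository's actual implementation:

```python
import torch

def cast_to_fp8_sketch(x: torch.Tensor):
    # per-tensor scale so the value range fits into fp8 e4m3
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = x.abs().max().clamp(min=1e-12) / fp8_max
    return (x / scale).to(torch.float8_e4m3fn), scale

def cast_from_fp8_sketch(x_fp8: torch.Tensor, scale: torch.Tensor,
                         dtype: torch.dtype = torch.float32) -> torch.Tensor:
    # undo the scaling after the (e.g. all-reduce) communication
    return x_fp8.to(dtype) * scale
```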
Haze188 416580b314
[MoE/ZeRO] Moe refactor with zero refactor (#5821)
* [moe] removed openmoe-coupled code and rectify mixstral code (#5471)

* [Feature] MoE refactor; Integration with Mixtral (#5682)

* cherry pick from refractor-moe branch

* tests passed

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support ep + zero

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add mixtral auto policy & move pipeline forward code to modeling folder

* [moe refactor] modify kernel test without Route Class

* [moe refactor] add moe tensor test path environment variable to github workflow

* fix typos

* fix moe test bug due to the code rebase

* [moe refactor] fix moe zero test, and little bug in low level zero

* fix typo

* add moe tensor path to github workflow

* remove some useless code

* fix typo & unify global variable XX_AXIS logic without using -1

* fix typo & prettify the code

* remove print code & support zero 2 test

* remove useless code

* rename function

* fix typo

* fix typo

* Further improve the test code

* remove print code

* [moe refactor] change test model from fake moe model to mixtral moe layer and remove useless test

* [moe refactor] skip some unit test which will be refactored later

* [moe refactor] fix unit import error

* [moe refactor] fix circular import issues

* [moe refactor] remove debug code

* [moe refactor] update github workflow

* [moe/zero] refactor low level optimizer (#5767)

* [zero] refactor low level optimizer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Feature] MoE refactor with newest version of ZeRO (#5801)

* [zero] remove redundant members in BucketStore (#5802)

* [zero] align api with previous version

* [Moe/Zero] Update MoeHybridParallelPlugin with refactored ZeRO and Fix Zero bug (#5819)

* [moe refactor] update unit test with the refactored ZeRO and remove useless test

* move moe checkpoint to checkpoint folder and exchange global axis to class member

* update moe hybrid parallel plugin with newest version of zero & fix zero working/master params bug

* fix zero unit test

* Add an assertion to prevent users from using it incorrectly

* [hotfix]Solve the compatibility issue of zero refactor (#5823)

* [moe refactor] update unit test with the refactored ZeRO and remove useless test

* move moe checkpoint to checkpoint folder and exchange global axis to class member

* update moe hybrid parallel plugin with newest version of zero & fix zero working/master params bug

* fix zero unit test

* Add an assertion to prevent users from using it incorrectly

* Modify function parameter names to resolve compatibility issues

* [zero] fix missing hook removal (#5824)

* [MoE] Resolve .github conflict (#5829)

* [Fix/Example] Fix Llama Inference Loading Data Type (#5763)

* [fix/example] fix llama inference loading dtype

* revise loading dtype of benchmark llama3

* [release] update version (#5752)

* [release] update version

* [devops] update compatibility test

* [devops] update compatibility test

* [devops] update compatibility test

* [devops] update compatibility test

* [test] fix ddp plugin test

* [test] fix gptj and rpc test

* [devops] fix cuda ext compatibility

* [inference] fix flash decoding test

* [inference] fix flash decoding test

* fix (#5765)

* [test] Fix/fix testcase (#5770)

* [fix] branch for fix testcase;

* [fix] fix test_analyzer & test_auto_parallel;

* [fix] remove local change about moe;

* [fix] rm local change moe;

* [Hotfix] Add missing init file in inference.executor (#5774)

* [CI/tests] simplify some test case to reduce testing time (#5755)

* [ci/tests] simplify some test case to reduce testing time

* [ci/tests] continue to remove test case to reduce ci time cost

* restore some test config

* [ci/tests] continue to reduce ci time cost

* [misc] update dockerfile (#5776)

* [misc] update dockerfile

* [misc] update dockerfile

* [devops] fix docker ci (#5780)

* [Inference]Add Streaming LLM (#5745)

* Add Streaming LLM

* add some parameters to llama_generation.py

* verify streamingllm config

* add test_streamingllm.py

* modified according to review feedback

* add Citation

* change _block_tables tolist

* [hotfix] fix llama flash attention forward (#5777)

* [misc] Accelerate CI for zero and dist optim (#5758)

* remove fp16 from lamb

* remove d2h copy in checking states

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

* [Test/CI] remove test cases to reduce CI duration (#5753)

* [test] smaller gpt2 test case

* [test] reduce test cases: tests/test_zero/test_gemini/test_zeroddp_state_dict.py

* [test] reduce test cases: tests/test_zero/test_gemini/test_grad_accum.py

* [test] reduce test cases tests/test_zero/test_gemini/test_optim.py

* Revert "[test] smaller gpt2 test case"

Some tests might depend on the size of model (num of chunks)

This reverts commit df705a5210.

* [test] reduce test cases: tests/test_checkpoint_io/test_gemini_checkpoint_io.py

* [CI] smaller test model for the two modified cases

* [CI] hardcode gpt model for tests/test_zero/test_gemini/test_search.py since we need a fixed answer there

* [hotfix] fix testcase in test_fx/test_tracer (#5779)

* [fix] branch for fix testcase;

* [fix] fix test_analyzer & test_auto_parallel;

* [fix] remove local change about moe;

* [fix] rm local change moe;

* [fix] fix test_deepfm_model & test_dlrf_model;

* [fix] fix test_hf_albert & test_hf_gpt;

* [gemini] optimize reduce scatter d2h copy (#5760)

* [gemini] optimize reduce scatter d2h copy

* [fix] fix missing reduce variable

* [refactor] remove legacy async reduce scatter code

* [gemini] missing sync

* Revert "[refactor] remove legacy async reduce scatter code"

This reverts commit 58ad76d466.

* [gemini] further optimize with async all reduce

* [fix] pass flag from manager to chunk

* Allow building cuda extension without a device. (#5535)

Added FORCE_CUDA environment variable support, to enable building extensions where a GPU device is not present but cuda libraries are.

* [misc] fix dist logger (#5782)

* [install]fix setup (#5786)

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [misc] update requirements (#5787)

* [shardformer] fix import (#5788)

* upgrade colossal-chat to support tp_group>1, add sp for sft

* upgrade ppo dpo rm script

* run pre-commit

* update ci tests, sft ci test cases passed, tp failed in generation for ppo, sp is buggy

* fix training script

* fix ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix transformers version

* remove duplicated test

* fix datasets version

* remove models that require huggingface auth from ci

* remove local data path

* update ci

* remove baichuan from template test due to transformer version conflict

* merge

* Refactor modeling by adding attention backend

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Fix tests and naming

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Pass inference model shard configs for module init

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Clean up

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* replace the customized dataloader setup with the built-in one

* replace the customized dataloader setup with the built-in one

* Remove flash attention backend

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* fix readme

* Fix test import

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* update sft training script

* [Inference]refactor baichuan (#5791)

* refactor baichuan

* remove unused code and add TODO for lazyinit

* [test] fix chatglm test kit (#5793)

* [shardformer] fix modeling of bloom and falcon (#5796)

* [test] fix qwen2 pytest distLarge (#5797)

* [Inference] Fix flash-attn import and add model test (#5794)

* Fix torch int32 dtype

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Fix flash-attn import

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Add generalized model test

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Remove exposed path to model

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Add default value for use_flash_attn

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Rename model test

Signed-off-by: char-1ee <xingjianli59@gmail.com>

---------

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* [Gemini] Use async stream to prefetch and h2d data moving (#5781)

* use async stream to prefetch and h2d data moving

* Remove redundant code

* [gemini] quick fix on possible async operation (#5803)

* [gemini] quick fix on possible async operation

* [gemini] quick fix on possible async operation

* [shardformer] upgrade transformers to 4.39.3 (#5815)

* [shardformer]upgrade transformers for gpt2/gptj/whisper (#5807)

* [shardformer] fix modeling of gpt2 and gptj

* [shardformer] fix whisper modeling

* [misc] update requirements

---------

Co-authored-by: ver217 <lhx0217@gmail.com>

* [shardformer]upgrade transformers for mistral (#5808)

* upgrade transformers for mistral

* fix

* fix

* [shardformer]upgrade transformers for llama (#5809)

* update transformers

fix

* fix

* fix

* [inference] upgrade transformers (#5810)

* update transformers

fix

* fix

* fix

* fix

* fix

* [gemini] update transformers for gemini (#5814)

---------

Co-authored-by: ver217 <lhx0217@gmail.com>

* Support 4d parallel + flash attention (#5789)

* support tp + sp + pp

* remove comments

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

---------

Signed-off-by: char-1ee <xingjianli59@gmail.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: duanjunwen <935724073@qq.com>
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: Edenzzzz <wenxuan.tan@wisc.edu>
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: botbw <wang1570@e.ntu.edu.sg>
Co-authored-by: Charles Coulombe <ccoulombe@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: YeAnbang <anbangy2@outlook.com>
Co-authored-by: char-1ee <xingjianli59@gmail.com>
Co-authored-by: Runyu Lu <77330637+LRY89757@users.noreply.github.com>
Co-authored-by: YeAnbang <44796419+YeAnbang@users.noreply.github.com>
Co-authored-by: Guangyao Zhang <xjtu521@qq.com>

* [zero] fix hook bug

* [zero] add low level optimizer back (#5839)

* [zero] fix param & refactor

* [zero] add back original low level opt

* [zero] remove moe related

* [zero] pass zero tests

* [zero] refactor

* [chore] add del func back

* [zero] comments and naming (#5840)

* [zero] modify api (#5843)

* [zero] modify api

* [test] remove _grad_store access in tests

* [test] fix (#5857)

* [CI] skip openmoe CI check

* [CI] fix pre-commit

* [zero] remove redundant member init (#5862)

* [misc] remove useless code, modify the pg mesh implementation

* [misc] remove useless code, modify the pg mesh implementation

* [misc] use tempfile

* resolve conflict with main branch

* [misc] use tempfile in test_moe_checkpoint.py

* [misc] remove useless code, add assertion about sequence parallel, move logger into function

* [misc] remove useless code

---------

Signed-off-by: char-1ee <xingjianli59@gmail.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Edenzzzz <wenxuan.tan@wisc.edu>
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: botbw <wang1570@e.ntu.edu.sg>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: duanjunwen <935724073@qq.com>
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: Charles Coulombe <ccoulombe@users.noreply.github.com>
Co-authored-by: YeAnbang <anbangy2@outlook.com>
Co-authored-by: char-1ee <xingjianli59@gmail.com>
Co-authored-by: Runyu Lu <77330637+LRY89757@users.noreply.github.com>
Co-authored-by: YeAnbang <44796419+YeAnbang@users.noreply.github.com>
Co-authored-by: Guangyao Zhang <xjtu521@qq.com>
2024-06-28 14:00:08 +08:00
flybird11111 773d9f964a
[shardformer]delete xformers (#5859)
* delete xformers

* fix

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-06-28 11:20:04 +08:00
Runyu Lu 3c7cda0c9a
[Inference]Lazy Init Support (#5785)
* lazy init support

* lazy init llama support

* lazy init support for baichuan

* align rpc

* add note for baichuan

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-06-27 18:02:15 +08:00
Guangyao Zhang d9d5e7ea1f
[shardformer] Support the T5ForTokenClassification model (#5816)
* t5 token classification, pytest still failing

* Resolve T5 Pytest Failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typos

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-06-27 16:40:38 +08:00
Hongxin Liu 5dfbcd7746
[zero] use bucket during allgather (#5860)
* [zero] use bucket during allgather

* [zero] rename api
2024-06-27 16:34:44 +08:00
botbw 8e718a1421
[gemini] fixes for benchmarking (#5847)
* [gemini] fix missing return

* [gemini] fix missing arg pass

* [gemini] use gather tensor instead of list

* [test] enable flash attention for benchmark by default

* [test] enable flash attention for benchmark by default

---------

Co-authored-by: genghaozhe <939857490@qq.com>
2024-06-26 15:52:09 +08:00
Edenzzzz 2a25a2aff7
[Feature] optimize PP overlap (#5735)
* update to fully overlap, still debugging

* improve interface

* fixed deadlock bug

* debug NaN loss

* (experimental) use one comm group for send_fw_recv_fw to fix NaN

* cleaned up interfaces; use one batch p2p for all

* clean up; removed the double p2p batch case

* p2p test passed

* improve overlap: send fwd before backward

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tentatively use 2 p2p batches

* remove two p2p batches

* fix typos

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove pp.sh

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: root <root@notebook-c55824c0-7742-45e8-9591-c855bb77ad29-0.notebook-c55824c0-7742-45e8-9591-c855bb77ad29.colossal-ai.svc.cluster.local>
2024-06-26 14:48:02 +08:00
botbw 8a5c86439a
[gemini] fix missing return (#5845) 2024-06-21 11:38:40 +08:00
Yuanheng Zhao 7b249c76e5
[Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837)
* fix glide llama model

* revise
2024-06-19 15:37:53 +08:00
Kai Lv 0adca5b688
[launch] Support IPv4 host initialization in launch (#5822) 2024-06-18 19:18:29 +08:00
GuangyaoZhang d84d68601a change 'xxx if xxx else None' to 'xxx or None' 2024-06-18 03:32:42 +00:00
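For illustration, the refactor named in this commit; the two forms are equivalent whenever the check is on truthiness (the function and variable names here are hypothetical):

```python
def normalize(mask):
    # before: return mask if mask else None
    return mask or None  # after: same truthiness semantics, shorter
```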
GuangyaoZhang a83a2336e8 rebase master llama change 2024-06-18 02:56:47 +00:00
GuangyaoZhang 363cde6957 merge model and attention forward 2024-06-18 02:32:41 +00:00
GuangyaoZhang 7a2b08646f Remove CohereLayerNorm and use existing layernorm 2024-06-18 02:32:41 +00:00
GuangyaoZhang fe2e74c03a fix precommit 2024-06-18 02:31:33 +00:00
GuangyaoZhang f656d61778 change command 2024-06-18 02:31:33 +00:00
GuangyaoZhang 0b81163bc0 Copy llama to command 2024-06-18 02:31:33 +00:00
Edenzzzz 8795bb2e80
Support 4d parallel + flash attention (#5789)
* support tp + sp + pp

* remove comments

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-06-17 17:40:47 +08:00
flybird11111 2ddf624a86
[shardformer] upgrade transformers to 4.39.3 (#5815)
* [shardformer]upgrade transformers for gpt2/gptj/whisper (#5807)

* [shardformer] fix modeling of gpt2 and gptj

* [shardformer] fix whisper modeling

* [misc] update requirements

---------

Co-authored-by: ver217 <lhx0217@gmail.com>

* [shardformer]upgrade transformers for mistral (#5808)

* upgrade transformers for mistral

* fix

* fix

* [shardformer]upgrade transformers for llama (#5809)

* update transformers

fix

* fix

* fix

* [inference] upgrade transformers (#5810)

* update transformers

fix

* fix

* fix

* fix

* fix

* [gemini] update transformers for gemini (#5814)

---------

Co-authored-by: ver217 <lhx0217@gmail.com>
2024-06-14 10:59:33 +08:00
botbw 3bcbba9262
[gemini] quick fix on possible async operation (#5803)
* [gemini] quick fix on possible async operation

* [gemini] quick fix on possible async operation
2024-06-13 10:35:17 +08:00
Haze188 d9dddf574f
[Gemini] Use async stream to prefetch and h2d data moving (#5781)
* use async stream to prefetch and h2d data moving

* Remove redundant code
2024-06-12 15:48:52 +08:00
Li Xingjian 8554585a5f
[Inference] Fix flash-attn import and add model test (#5794)
* Fix torch int32 dtype

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Fix flash-attn import

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Add generalized model test

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Remove exposed path to model

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Add default value for use_flash_attn

Signed-off-by: char-1ee <xingjianli59@gmail.com>

* Rename model test

Signed-off-by: char-1ee <xingjianli59@gmail.com>

---------

Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-12 14:13:50 +08:00
Hongxin Liu aa125bcc91
[shardformer] fix modeling of bloom and falcon (#5796) 2024-06-11 17:43:50 +08:00
Runyu Lu c0948aff97
[Inference]refactor baichuan (#5791)
* refactor baichuan

* remove unused code and add TODO for lazyinit
2024-06-11 10:52:01 +08:00
char-1ee f5981e808e Remove flash attention backend
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-07 10:02:19 +00:00
char-1ee ceba662d22 Clean up
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-07 09:09:29 +00:00
char-1ee 5f398fc000 Pass inference model shard configs for module init
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-07 08:33:52 +00:00
char-1ee eec77e5702 Fix tests and naming
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-07 08:33:47 +00:00
char-1ee 04386d9eff Refactor modeling by adding attention backend
Signed-off-by: char-1ee <xingjianli59@gmail.com>
2024-06-07 08:33:47 +00:00
Hongxin Liu 73e88a5553
[shardformer] fix import (#5788) 2024-06-06 19:09:50 +08:00
Hongxin Liu b9d646fe9e
[misc] fix dist logger (#5782) 2024-06-05 15:04:22 +08:00
botbw 3f7e3131d9
[gemini] optimize reduce scatter d2h copy (#5760)
* [gemini] optimize reduce scatter d2h copy

* [fix] fix missing reduce variable

* [refactor] remove legacy async reduce scatter code

* [gemini] missing sync

* Revert "[refactor] remove legacy async reduce scatter code"

This reverts commit 58ad76d466.

* [gemini] further optimize with async all reduce

* [fix] pass flag from manager to chunk
2024-06-05 14:23:13 +08:00
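A sketch of the overlap pattern such a device-to-host copy optimization relies on, assuming CUDA and pinned host memory; function and names are illustrative, not the Gemini code:

```python
import torch

def async_d2h(src: torch.Tensor, stream: torch.cuda.Stream) -> torch.Tensor:
    # a pinned host buffer is required for a truly asynchronous copy
    dst = torch.empty(src.shape, dtype=src.dtype, device="cpu", pin_memory=True)
    with torch.cuda.stream(stream):
        dst.copy_(src, non_blocking=True)
    return dst  # caller must stream.synchronize() before reading dst
```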
Edenzzzz 79f7a7b211
[misc] Accelerate CI for zero and dist optim (#5758)
* remove fp16 from lamb

* remove d2h copy in checking states

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-06-05 11:25:19 +08:00
flybird11111 50b4c8e8cf
[hotfix] fix llama flash attention forward (#5777) 2024-06-05 10:56:47 +08:00
yuehuayingxueluo b45000f839
[Inference]Add Streaming LLM (#5745)
* Add Streaming LLM

* add some parameters to llama_generation.py

* verify streamingllm config

* add test_streamingllm.py

* modified according to review feedback

* add Citation

* change _block_tables tolist
2024-06-05 10:51:19 +08:00
Yuanheng Zhao 406443200f
[Hotfix] Add missing init file in inference.executor (#5774) 2024-06-03 22:29:39 +08:00
duanjunwen 1b76564e16
[test] Fix/fix testcase (#5770)
* [fix] branch for fix testcase;

* [fix] fix test_analyzer & test_auto_parallel;

* [fix] remove local change about moe;

* [fix] rm local change moe;
2024-06-03 15:26:01 +08:00
flybird11111 3f2be80530
fix (#5765) 2024-06-03 11:25:18 +08:00
botbw 023ea13cb5
Merge pull request #5749 from hpcaitech/prefetch
[Gemini] Prefetch next chunk before each op
2024-05-29 15:35:54 +08:00
hxwang 8547562884 [chore] remove unnecessary assert since compute list might not be recorded 2024-05-28 05:16:02 +00:00
hxwang e5e3320948 [bug] continue fix 2024-05-28 02:41:23 +00:00
hxwang 936dd96dbb [bug] workaround for idx fix 2024-05-28 02:33:12 +00:00
Edenzzzz 5f8c0a0ac3
[Feature] auto-cast optimizers to distributed version (#5746)
* auto-cast optimizers to distributed

* fix galore casting

* logger

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-05-24 17:24:16 +08:00
hxwang ff507b755e Merge branch 'main' of github.com:hpcaitech/ColossalAI into prefetch 2024-05-24 04:05:07 +00:00
botbw 2fc85abf43
[gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713)
* [gemini] async grad chunk reduce (all-reduce&reduce-scatter)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [gemini] add test

* [gemini] rename func

* [gemini] update llama benchmark

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [gemini] use tensor counter

* [gemini] change default config in GeminiPlugin and GeminiDDP

* [chore] typo

* [gemini] fix sync issue & add test cases

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-24 10:31:16 +08:00
Jianghai 85946d4236
[Inference]Fix readme and example for API server (#5742)
* fix chatapi readme and example

* updating doc

* add an api and change the doc

* remove

* add credits and del 'API' heading

* readme

* readme
2024-05-24 10:03:05 +08:00
hxwang 15d21a077a Merge remote-tracking branch 'origin/main' into prefetch 2024-05-23 15:49:33 +00:00
binmakeswell 4647ec28c8
[inference] release (#5747)
* [inference] release

* [inference] release

* [inference] release

* [inference] release

* [inference] release

* [inference] release

* [inference] release
2024-05-23 17:44:06 +08:00
Yuanheng Zhao df6747603f
[Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer
[Inference] Merge feature/colossal-infer
2024-05-22 14:31:09 +08:00
Yuanheng Zhao bd38fe6b91
[NFC] Fix code factors on inference triton kernels (#5743) 2024-05-21 22:12:15 +08:00
botbw 13c06d36a3
[bug] fix early return (#5740)
* [bug] fix silly bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [chore] add test for prefetch

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-21 14:21:58 +08:00
Haze188 22ce873c3f
[Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702)
* [pre-commit.ci] auto fixes from pre-commit.com hooks

* add parallel cross entropy output for falcon model & fix some typos in bloom.py

* fix module name error, self.model -> self.transformers in bloom, falcon model

* Fix the overflow bug of distributed cross entropy loss function when training with fp16

* add dtype to parallel cross entropy loss function

* fix dtype-related typos and prettify loss.py

* fix grad dtype and update dtype mismatch error

* fix typo bugs
2024-05-21 11:07:13 +08:00
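The fp16-overflow fix this commit describes commonly amounts to computing the loss in float32. A minimal sketch of that pattern; not the repository's distributed cross-entropy implementation:

```python
import torch
import torch.nn.functional as F

def stable_cross_entropy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # upcasting avoids fp16 overflow in the log-sum-exp inside cross entropy
    return F.cross_entropy(logits.float(), labels)
```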
pre-commit-ci[bot] b3c0e6d871 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-05-21 02:09:15 +00:00
hxwang 137a7c341b [chore] fix init error 2024-05-21 02:07:21 +00:00
Yuanheng Zhao 8633c15da9 [sync] Sync feature/colossal-infer with main 2024-05-20 15:50:53 +00:00
Yuanheng Zhao d8b1ea4ac9
[doc] Update Inference Readme (#5736)
* [doc] update inference readme

* add contents

* trivial
2024-05-20 22:50:04 +08:00
Yuanheng Zhao bdf9a001d6
[Fix/Inference] Add unsupported auto-policy error message (#5730)
* [fix] auto policy error message

* trivial
2024-05-20 22:49:18 +08:00
genghaozhe 90d8d0183c remove personal comments 2024-05-20 07:28:20 +00:00
genghaozhe bfcb2d1ff8 refactor the code structure to solve the circular import 2024-05-20 07:25:24 +00:00
genghaozhe 1ec92d29af remove perf log, unrelated file and so on 2024-05-20 05:23:26 +00:00
genghaozhe 5c6c5d6be3 remove comments 2024-05-20 05:23:12 +00:00
genghaozhe 7416e4943b fix conflicts to beautify the code 2024-05-20 04:09:51 +00:00
genghaozhe d22bf30ca6 implement auto policy prefetch and slightly modify the original code 2024-05-20 04:01:53 +00:00
pre-commit-ci[bot] f1918e18a5 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-05-20 03:00:07 +00:00
hxwang a55a9e298b [gemini] init auto policy prefetch 2024-05-20 02:21:17 +00:00
Yuanheng Zhao 283c407a19
[Inference] Fix Inference Generation Config and Sampling (#5710)
* refactor and add

* config default values

* fix gen config passing

* fix rpc generation config
2024-05-19 15:08:42 +08:00
genghaozhe 06a3a100b3 remove unrelated code 2024-05-17 10:57:49 +00:00
genghaozhe 3d625ca836 add some TODO messages 2024-05-17 10:55:28 +00:00
flybird11111 9d83c6d715
[lazy] fix lazy cls init (#5720)
* fix

* fix

* fix

* fix

* fix

* remove kernel install

* rebase

revert

fix

* fix

* fix
2024-05-17 18:18:59 +08:00
botbw e57812c672
[chore] Update placement_policy.py 2024-05-17 13:42:18 +08:00
Yuanheng Zhao 8bcfe360fd
[example] Update Inference Example (#5725)
* [example] update inference example
2024-05-17 11:28:53 +08:00
genghaozhe 013690a86b remove set(all_chunks) 2024-05-16 09:57:51 +00:00
hxwang 6efbadba25 [chore] remove debugging info 2024-05-16 16:46:39 +08:00
hxwang 20701d4533 [chore] remove print 2024-05-16 16:45:50 +08:00
hxwang f45f8a2aa7 [gemini] maxprefetch means maximum work to keep 2024-05-16 16:12:53 +08:00
genghaozhe fc2248cf99 Merge branch 'prefetch' of github.com:botbw/ColossalAI into feature/prefetch 2024-05-16 08:05:32 +00:00
pre-commit-ci[bot] 6bbe956316 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-05-16 07:26:20 +00:00
hxwang 82b25524ff Merge branch 'prefetch' of github.com:botbw/ColossalAI into prefetch 2024-05-16 07:25:22 +00:00
genghaozhe 1f6b57099c Merge branch 'prefetch' of github.com:botbw/ColossalAI into botbw-prefetch 2024-05-16 07:23:40 +00:00
hxwang 2e68eebdfe [chore] refactor & sync 2024-05-16 07:22:10 +00:00
pre-commit-ci[bot] 5bedea6e10 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-05-16 05:20:01 +00:00
hxwang 4148ceed9f [gemini] use compute_chunk to find next chunk 2024-05-16 13:17:26 +08:00
hxwang b2e9745888 [chore] sync 2024-05-16 04:45:06 +00:00
hxwang 6e38eafebe [gemini] prefetch chunks 2024-05-15 16:51:44 +08:00
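Taken together, the prefetch commits above amount to looking ahead in the recorded compute order and moving the next few chunks early. A minimal sketch with hypothetical names:

```python
def prefetch_next_chunks(compute_list, idx, max_prefetch, fetch):
    # look ahead in the recorded compute order; `fetch` would trigger an
    # async host-to-device move for each upcoming chunk
    for chunk in compute_list[idx + 1 : idx + 1 + max_prefetch]:
        fetch(chunk)
```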
Jianghai f47f2fbb24
[Inference] Fix API server, test and example (#5712)
* fix api server

* fix generation config

* fix api server

* fix comments

* fix infer hanging bug

* resolve comments, change backend to free port
2024-05-15 15:47:31 +08:00
Runyu Lu 74c47921fa
[Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717)
* Fix Llama3 Load error
* Omit Checkpoint IO Temporarily
2024-05-14 20:17:43 +08:00
Edenzzzz 43995ee436
[Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694)
* [feat] Add distributed lamb; minor fixes in DeviceMesh (#5476)

* init: add dist lamb; add debiasing for lamb

* dist lamb tester mostly done

* all tests passed

* add comments

* all tests passed. Removed debugging statements

* moved setup_distributed inside plugin. Added dist layout caching

* organize better

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

* [hotfix] Improve tester precision by removing ZeRO on vanilla lamb (#5576)

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

* [optim] add distributed came (#5526)

* test CAME under LowLevelZeroOptimizer wrapper

* test CAME TP row and col pass

* test CAME zero pass

* came zero add master and worker param id convert

* came zero test pass

* came zero test pass

* test distributed came passed

* reformat code, modify some expressions and add comments

* minor fix of test came

* minor fix of dist_came and test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix of dist_came and test

* rebase dist-optim

* rebase dist-optim

* fix remaining comments

* add test dist came using booster api

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [optim] Distributed Adafactor (#5484)

* [feature] solve conflict; update optimizer readme;

* [feature] update optimize readme;

* [fix] fix testcase;

* [feature] Add transformer-bert to testcase; solve a bug related to indivisible shape (induced when use_zero is on and tp is row-parallel);

* [feature] Add transformers_bert model zoo in testcase;

* [feature] add user documentation to docs/source/feature.

* [feature] add API Reference & Sample to optimizer Readme; add state check for bert exam;

* [feature] modify user documentation;

* [fix] fix readme format issue;

* [fix] add zero=0 in testcase; cached argument in dict;

* [fix] fix precision issue;

* [feature] add distributed rms;

* [feature] remove useless comment in testcase;

* [fix] Remove useless test; open zero test; remove fp16 test in bert exam;

* [feature] Extract distributed rms function;

* [feature] add booster + lowlevelzeroPlugin in test;

* [feature] add Start_with_booster_API case in md; add Supporting Information in md;

* [fix] Also remove state movement in base adafactor;

* [feature] extract factor function;

* [feature] add LowLevelZeroPlugin test;

* [fix] add tp=False and zero=True in logic;

* [fix] fix use zero logic;

* [feature] add row residue logic in column parallel factor;

* [feature] add check optim state func;

* [feature] Remove duplicate logic;

* [feature] update optim state check func and fix precision test bug;

* [fix] update/fix optim state; precision issue still exists;

* [fix] Add use_zero check in _rms; Add plugin support info in Readme; Add Dist Adafactor init Info;

* [feature] removed print & comments in utils;

* [feature] update Readme;

* [feature] add LowLevelZeroPlugin test with Bert model zoo;

* [fix] fix logic in _rms;

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [fix] remove comments in testcase;

* [feature] add zh-Han Readme;

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Feature] refactor dist came; fix precision error; add low level zero test with bert model zoo; (#5676)

* [feature] daily update;

* [fix] fix dist came;

* [feature] refactor dist came; fix precision error; add low level zero test with bert model zoo;

* [fix] open rms; fix low level zero test; fix dist came test function name;

* [fix] remove redundant test;

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Feature] Add Galore (Adam, Adafactor) and distributed GaloreAdamW8bit (#5570)

* init: add dist lamb; add debiasing for lamb

* dist lamb tester mostly done

* all tests passed

* add comments

* all tests passed. Removed debugging statements

* moved setup_distributed inside plugin. Added dist layout caching

* organize better

* update comments

* add initial distributed galore

* add initial distributed galore

* add galore set param utils; change setup_distributed interface

* projected grad precision passed

* basic precision tests passed

* tests passed; located svd precision issue in fwd-bwd; banned these tests

* Plugin DP + TP tests passed

* move get_shard_dim to d_tensor

* add comments

* remove useless files

* remove useless files

* fix zero typo

* improve interface

* remove moe changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix import

* fix deepcopy

* update came & adafactor to main

* fix param map

* fix typo

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Hotfix] Remove one buggy test case from dist_adafactor for now (#5692)


Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: chongqichuizi875 <107315010+chongqichuizi875@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: duanjunwen <54985467+duanjunwen@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
2024-05-14 13:52:45 +08:00
Steve Luo 7806842f2d
add paged-attention v2: support seq length split across thread block (#5707) 2024-05-14 12:46:54 +08:00
Runyu Lu 18d67d0e8e
[Feat]Inference RPC Server Support (#5705)
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unit test
* Rpyc support

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-14 10:00:55 +08:00
hugo-syn 393c8f5b7f
[hotfix] fix inference typo (#5438) 2024-05-13 21:06:44 +08:00
yuehuayingxueluo de4bf3dedf
[Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708)
* Adapt repetition_penalty and no_repeat_ngram_size

* fix no_repeat_ngram_size_logit_process

* remove batch_updated

* fix annotation

* modified code based on the review feedback.

* rm get_batch_token_ids
2024-05-11 15:13:25 +08:00
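For reference, the standard (HF-style) repetition-penalty logit processor this commit adapts looks roughly like the following; not necessarily the repository's exact code:

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor, generated_ids: torch.Tensor,
                             penalty: float) -> torch.Tensor:
    # logits: [batch, vocab]; generated_ids: [batch, seq] of emitted token ids
    scores = logits.gather(-1, generated_ids)
    # penalize already-generated tokens: shrink positive, amplify negative
    scores = torch.where(scores > 0, scores / penalty, scores * penalty)
    return logits.scatter(-1, generated_ids, scores)
```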
Wang Binluo 537f6a3855
[Shardformer]fix the num_heads assert for llama model and qwen model (#5704)
* fix the num_heads assert

* fix the transformers import

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the import

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-10 15:33:39 +08:00
Wang Binluo a3cc68ca93
[Shardformer] Support the Qwen2 model (#5699)
* feat: support qwen2 model

* fix: modify model config and add Qwen2RMSNorm

* fix qwen2 model conflicts

* test: add qwen2 shard test

* to: add qwen2 auto policy

* support qwen model

* fix the conflicts

* add try catch

* add transformers version for qwen2

* add the ColoAttention for the qwen2 model

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add the unit test version check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the test input bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the version check

* fix the version check

---------

Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-09 20:04:25 +08:00
傅剑寒 bfad39357b
[Inference/Feat] Add quant kvcache interface (#5700)
* add quant kvcache interface

* delete unused output

* complete args comments
2024-05-09 18:03:24 +08:00
flybird11111 d4c5ef441e
[gemini]remove registered gradients hooks (#5696)
* fix gemini

fix gemini

* fix

fix
2024-05-09 10:29:49 +08:00
CjhHa1 bc9063adf1 resolve rebase conflicts on branch feat/online-serving 2024-05-08 15:20:53 +00:00
Jianghai 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598)
* fix test bugs

* add do sample test

* del useless lines

* fix comments

* fix tests

* delete version tag

* delete version tag

* add

* del test server

* fix test

* fix

* Revert "add"

This reverts commit b9305fb024.
2024-05-08 15:20:53 +00:00
CjhHa1 7bbb28e48b [Inference] resolve rebase conflicts
fix
2024-05-08 15:20:53 +00:00
Jianghai c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470)
* fix bugs

* fix bugs

* fix api server

* fix api server

* add chat api and test

* del request.n
2024-05-08 15:20:53 +00:00
Jianghai de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432)
* finish online test and add examples

* fix test_contionus_batching

* fix some bugs

* fix bash

* fix

* fix inference

* finish revision

* fix typos

* revision
2024-05-08 15:20:52 +00:00
Jianghai 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396)
* add api server

* fix

* add

* add completion service and fix bug

* add generation config

* revise shardformer

* fix bugs

* add docstrings and fix some bugs

* fix bugs and add choices for prompt template
2024-05-08 15:18:28 +00:00
yuehuayingxueluo d482922035
[Inference] Support the logic related to ignoring EOS token (#5693)
* Adapt temperature processing logic

* add ValueError for top_p and top_k

* add GQA Test

* fix except_msg

* support ignore EOS token

* change variable's name

* fix annotation
2024-05-08 19:59:10 +08:00
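The ignore-EOS logic above reduces to a small predicate in the stopping check. A hedged sketch with hypothetical names:

```python
def is_finished(token_id: int, eos_token_id: int, ignore_eos: bool) -> bool:
    # when ignore_eos is set, an emitted EOS token does not stop generation
    return (not ignore_eos) and token_id == eos_token_id
```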
yuehuayingxueluo 9c2fe7935f
[Inference]Adapt temperature processing logic (#5689)
* Adapt temperature processing logic

* add ValueError for top_p and top_k

* add GQA Test

* fix except_msg
2024-05-08 17:58:29 +08:00