Hongxin Liu
641b1ee71a
[devops] remove post commit ci ( #5566 )
...
* [devops] remove post commit ci
* [misc] run pre-commit on all files
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
8 months ago
digger yu
bce9499ed3
fix some typo ( #5307 )
10 months ago
flybird11111
576a2f7b10
[gemini] gemini support tensor parallelism. ( #4942 )
...
* [colossalai]fix typo
* [inference] Add smmoothquant for llama (#4904 )
* [inference] add int8 rotary embedding kernel for smoothquant (#4843 )
* [inference] add smoothquant llama attention (#4850 )
* add smoothquant llama attention
* remove uselss code
* remove useless code
* fix import error
* rename file name
* [inference] add silu linear fusion for smoothquant llama mlp (#4853 )
* add silu linear
* update skip condition
* catch smoothquant cuda lib exception
* prcocess exception for tests
* [inference] add llama mlp for smoothquant (#4854 )
* add llama mlp for smoothquant
* fix down out scale
* remove duplicate lines
* add llama mlp check
* delete useless code
* [inference] add smoothquant llama (#4861 )
* add smoothquant llama
* fix attention accuracy
* fix accuracy
* add kv cache and save pretrained
* refactor example
* delete smooth
* refactor code
* [inference] add smooth function and delete useless code for smoothquant (#4895 )
* add smooth function and delete useless code
* update datasets
* remove duplicate import
* delete useless file
* refactor codes (#4902 )
* rafactor code
* add license
* add torch-int and smoothquant license
* Update flash_attention_patch.py
To be compatible with the new change in the Transformers library, where a new argument 'padding_mask' was added to forward function of attention layer.
https://github.com/huggingface/transformers/pull/25598
* [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921 )
* [kernel] support pure fp16 for cpu adam (#4896 )
* [kernel] fix cpu adam kernel for pure fp16 and update tests (#4919 )
* [kernel] fix cpu adam
* [test] update gemini optim test
* [format] applied code formatting on changed files in pull request 4908 (#4918 )
Co-authored-by: github-actions <github-actions@github.com>
* [gemini] support gradient accumulation (#4869 )
* add test
* fix no_sync bug in low level zero plugin
* fix test
* add argument for grad accum
* add grad accum in backward hook for gemini
* finish implementation, rewrite tests
* fix test
* skip stuck model in low level zero test
* update doc
* optimize communication & fix gradient checkpoint
* modify doc
* cleaning codes
* update cpu adam fp16 case
* [hotfix] fix torch 2.0 compatibility (#4936 )
* [hotfix] fix launch
* [test] fix test gemini optim
* [shardformer] fix vit
* [test] add no master test for low level zero plugin (#4934 )
* [format] applied code formatting on changed files in pull request 4820 (#4886 )
Co-authored-by: github-actions <github-actions@github.com>
* [nfc] fix some typo with colossalai/ docs/ etc. (#4920 )
* [Refactor] Integrated some lightllm kernels into token-attention (#4946 )
* add some req for inference
* clean codes
* add codes
* add some lightllm deps
* clean codes
* hello
* delete rms files
* add some comments
* add comments
* add doc
* add lightllm deps
* add lightllm cahtglm2 kernels
* add lightllm cahtglm2 kernels
* replace rotary embedding with lightllm kernel
* add some commnets
* add some comments
* add some comments
* add
* replace fwd kernel att1
* fix a arg
* add
* add
* fix token attention
* add some comments
* clean codes
* modify comments
* fix readme
* fix bug
* fix bug
---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
* [test] merge old components to test to model zoo (#4945 )
* [test] add custom models in model zoo
* [test] update legacy test
* [test] update model zoo
* [test] update gemini test
* [test] remove components to test
* [inference] add reference and fix some bugs (#4937 )
* add reference and fix some bugs
* update gptq init
---------
Co-authored-by: Xu Kai <xukai16@foxamil.com>
* [Inference]ADD Bench Chatglm2 script (#4963 )
* add bench chatglm
* fix bug and make utils
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [Pipeline inference] Combine kvcache with pipeline inference (#4938 )
* merge kvcache with pipeline inference and refactor the code structure
* support ppsize > 2
* refactor pipeline code
* do pre-commit
* modify benchmark
* fix bench mark
* polish code
* add docstring and update readme
* refactor the code
* fix some logic bug of ppinfer
* polish readme
* fix typo
* skip infer test
* updated c++17 compiler flags (#4983 )
* [Inference] Dynamic Batching Inference, online and offline (#4953 )
* [inference] Dynamic Batching for Single and Multiple GPUs (#4831 )
* finish batch manager
* 1
* first
* fix
* fix dynamic batching
* llama infer
* finish test
* support different lengths generating
* del prints
* del prints
* fix
* fix bug
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [inference] Async dynamic batching (#4894 )
* finish input and output logic
* add generate
* test forward
* 1
* [inference]Re push async dynamic batching (#4901 )
* adapt to ray server
* finish async
* finish test
* del test
---------
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
* Revert "[inference]Re push async dynamic batching (#4901 )" (#4905 )
This reverts commit fbf3c09e67
.
* Revert "[inference] Async dynamic batching (#4894 )"
This reverts commit fced140250
.
* Revert "[inference] Async dynamic batching (#4894 )" (#4909 )
This reverts commit fced140250
.
* Add Ray Distributed Environment Init Scripts
* support DynamicBatchManager base function
* revert _set_tokenizer version
* add driver async generate
* add async test
* fix bugs in test_ray_dist.py
* add get_tokenizer.py
* fix code style
* fix bugs about No module named 'pydantic' in ci test
* fix bugs in ci test
* fix bugs in ci test
* fix bugs in ci test
* [infer]Add Ray Distributed Environment Init Scripts (#4911 )
* Revert "[inference] Async dynamic batching (#4894 )"
This reverts commit fced140250
.
* Add Ray Distributed Environment Init Scripts
* support DynamicBatchManager base function
* revert _set_tokenizer version
* add driver async generate
* add async test
* fix bugs in test_ray_dist.py
* add get_tokenizer.py
* fix code style
* fix bugs about No module named 'pydantic' in ci test
* fix bugs in ci test
* fix bugs in ci test
* fix bugs in ci test
* support dynamic batch for bloom model and is_running function
* [Inference]Test for new Async engine (#4935 )
* infer engine
* infer engine
* test engine
* test engine
* new manager
* change step
* add
* test
* fix
* fix
* finish test
* finish test
* finish test
* finish test
* add license
---------
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
* add assertion for config (#4947 )
* [Inference] Finish dynamic batching offline test (#4948 )
* test
* fix test
* fix quant
* add default
* fix
* fix some bugs
* fix some bugs
* fix
* fix bug
* fix bugs
* reset param
---------
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965 )
* adding flash-decoding
* clean
* adding kernel
* adding flash-decoding
* add integration
* add
* adding kernel
* adding kernel
* adding triton 2.1.0 features for inference
* update bloom triton kernel
* remove useless vllm kernels
* clean codes
* fix
* adding files
* fix readme
* update llama flash-decoding
---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>
* fix ColossalEval (#4992 )
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
* [doc]Update doc for colossal-inference (#4989 )
* update doc
* Update README.md
---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>
* [hotfix] Fix the bug where process groups were not being properly released. (#4940 )
* Fix the bug where process groups were not being properly released.
* test
* Revert "test"
This reverts commit 479900c139
.
* [hotfix] fix the bug of repeatedly storing param group (#4951 )
* [doc] add supported feature diagram for hybrid parallel plugin (#4996 )
* [Pipeline Inference] Merge pp with tp (#4993 )
* refactor pipeline into new CaiInferEngine
* updata llama modeling forward
* merge tp with pp
* update docstring
* optimize test workflow and example
* fix typo
* add assert and todo
* [release] update version (#4995 )
* [release] update version
* [hotfix] fix ci
* [gemini] gemini support tp
[gemini] gemini support tp
[gemini] gemini support tp
[gemini] gemini support tp
[gemini] gemini support tp
* fix
fix
fix
* update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
* support fused layernorm
support fused layernorm
support fused layernorm
* update fusedlayernorm
update fusedlayernorm
update fusedlayernorm
* add sequence parallel to gemini
add sequence parallel to gemini
* fix
* fix comments
fix comments
fix comments
* fix
* fix t5
* clear cache
* fix
* activate ci
* activate ci
* fix
* fix
* fix
* fix
* revert
* modify tp gather method
modify tp gather method
modify tp gather method
modify tp gather method
* fix test
---------
Co-authored-by: Xu Kai <xukai16@foxmail.com>
Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
Co-authored-by: cuiqing.li <lixx336@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
Co-authored-by: Xu Kai <xukai16@foxamil.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
1 year ago
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
1 year ago
Baizhou Zhang
0ceec8f9a9
[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file ( #4354 )
...
* add naive optimizer for 3DPlugin/refactor gpt2 shardformer test
* merge tests of PP/DP/TP combinations into one test file
* fix bug when sync grad for dp in HybridPlugin
* update supported precisions for 3DPlugin/fix bug when shifting tp_degree
* improve the passing of lazy_init
* modify lazy_init/use sync_shared_params
1 year ago
Hongxin Liu
d921ce8391
[shardformer] support inplace sharding ( #4251 )
...
* [shardformer] embedding support inplace sharding
* [shardformer] linear support inplace sharding
* [shardformer] layernorm support inplace sharding
* [shardformer] qkv support inplace sharding
* [test] update shardformer layer test
* [shardformer] fix shared param sharding
* [shardformer] fix bert policy
* [shardformer] fix bloom policy
* [shardformer] fix llama policy
* [shardformer] fix opt policy
* [shardformer] fix t5 policy
* [shardformer] fix fused qkv linear
* [shardformer] fix bugs
* force sync
* [test] fix bugs
* [test] fix transformer version
1 year ago
Frank Lee
70c58cfd4f
[shardformer] supported fused qkv checkpoint ( #4073 )
1 year ago
Frank Lee
8eb09a4c69
[shardformer] support module saving and loading ( #4062 )
...
* [shardformer] support module saving and loading
* polish code
1 year ago
Frank Lee
45d9384346
[shardformer] removed inplace tensor sharding ( #4018 )
1 year ago
Frank Lee
015af592f8
[shardformer] integrated linear 1D with dtensor ( #3996 )
...
* [shardformer] integrated linear 1D with dtensor
* polish code
1 year ago