Wenhao Chen
bb0a668fee
[hotfix] set return_outputs=False in examples and polish code ( #5404 )
...
* fix: simplify merge_batch
* fix: use return_outputs=False to eliminate extra memory consumption
* feat: add return_outputs warning
* style: remove `return_outputs=False` as it is the default value
2024-03-25 12:31:09 +08:00
flybird11111
5e16bf7980
[shardformer] fix gathering output when using tensor parallelism ( #5431 )
...
* fix
* padding vocab_size when using pipeline parallellism
padding vocab_size when using pipeline parallellism
fix
fix
* fix
* fix
fix
fix
* fix gather output
* fix
* fix
* fix
fix resize embedding
fix resize embedding
* fix resize embedding
fix
* revert
* revert
* revert
2024-03-18 15:55:11 +08:00
Hongxin Liu
f2e8b9ef9f
[devops] fix compatibility ( #5444 )
...
* [devops] fix compatibility
* [hotfix] update compatibility test on pr
* [devops] fix compatibility
* [devops] record duration during comp test
* [test] decrease test duration
* fix falcon
2024-03-13 15:24:13 +08:00
digger yu
5e1c93d732
[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO ( #5335 )
...
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2024-03-05 21:52:30 +08:00
flybird11111
29695cf70c
[example]add gpt2 benchmark example script. ( #5295 )
...
* benchmark gpt2
* fix
fix
fix
fix
* [doc] fix typo in Colossal-LLaMA-2/README.md (#5247 )
* [workflow] fixed build CI (#5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
* [ci] fixed booster test (#5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed ddp test (#5254 )
* [ci] fixed ddp test
* polish
* fix typo in applications/ColossalEval/README.md (#5250 )
* [ci] fix shardformer tests. (#5255 )
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* [doc] fix doc typo (#5256 )
* [doc] fix annotation display
* [doc] fix llama2 doc
* [hotfix]: add pp sanity check and fix mbs arg (#5268 )
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
* [workflow] fixed incomplete bash command (#5272 )
* [workflow] fixed oom tests (#5275 )
* [workflow] fixed oom tests
* polish
* polish
* polish
* [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276 )
* fix ci
fix
* fix test
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
* fix
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* [shardformer] hybridparallelplugin support gradients accumulation. (#5246 )
* support gradients acc
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
* fix
fix
* fix
fix
fix
* [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230 )
* fix auto loading gpt2 tokenizer (#5279 )
* [doc] add llama2-13B disyplay (#5285 )
* Update README.md
* fix 13b typo
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
* fix llama pretrain (#5287 )
* fix
* fix
* fix
fix
* fix
fix
fix
* fix
fix
* benchmark gpt2
* fix
fix
fix
fix
* [workflow] fixed build CI (#5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
* [ci] fixed booster test (#5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
* fix
fix
* fix
fix
fix
* fix
* fix
fix
fix
fix
fix
* fix
* Update shardformer.py
---------
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
2024-03-04 16:18:13 +08:00
QinLuo
bf34c6fef6
[fsdp] impl save/load shard model/optimizer ( #5357 )
2024-02-27 13:51:14 +08:00
Frank Lee
efef43b53c
Merge pull request #5372 from hpcaitech/exp/mixtral
2024-02-08 16:30:05 +08:00
Hongxin Liu
da39d21b71
[moe] support mixtral ( #5309 )
...
* [moe] add mixtral block for single expert
* [moe] mixtral block fwd support uneven ep
* [moe] mixtral block bwd support uneven ep
* [moe] add mixtral moe layer
* [moe] simplify replace
* [meo] support save sharded mixtral
* [meo] support load sharded mixtral
* [meo] support save sharded optim
* [meo] integrate moe manager into plug
* [meo] fix optimizer load
* [meo] fix mixtral layer
2024-02-07 19:21:02 +08:00
Xuanlei Zhao
7d8e0338a4
[moe] init mixtral impl
2024-02-07 19:21:02 +08:00
Hongxin Liu
6c0fa7b9a8
[llama] fix dataloader for hybrid parallel ( #5358 )
...
* [plugin] refactor prepare dataloader
* [plugin] update train script
2024-02-05 15:14:56 +08:00
Wenhao Chen
1c790c0877
[fix] remove unnecessary dp_size assert ( #5351 )
...
* fix: remove unnecessary assert
* test: add more 3d plugin tests
* fix: add warning
2024-02-02 14:40:20 +08:00
ver217
148469348a
Merge branch 'main' into sync/npu
2024-01-18 12:05:21 +08:00
flybird11111
46e091651b
[shardformer] hybridparallelplugin support gradients accumulation. ( #5246 )
...
* support gradients acc
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
* fix
fix
* fix
fix
fix
2024-01-17 15:22:33 +08:00
flybird11111
e830ef917d
[ci] fix shardformer tests. ( #5255 )
...
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
2024-01-11 19:07:45 +08:00
Frank Lee
9102d655ab
[hotfix] removed unused flag ( #5242 )
2024-01-09 14:57:07 +08:00
Hongxin Liu
d202cc28c0
[npu] change device to accelerator api ( #5239 )
...
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* udpate
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
2024-01-09 10:20:05 +08:00
flybird11111
365671be10
fix-test ( #5210 )
...
fix-test
fix-test
2024-01-03 14:26:13 +08:00
Wenhao Chen
d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock ( #5214 )
...
* fix: add fallback order option and update 1f1b
* fix: fix deadlock comm in interleaved pp
* test: modify p2p test
2024-01-03 11:34:49 +08:00
Wenhao Chen
4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp ( #5134 )
...
* test: add more p2p tests
* fix: remove send_forward_recv_forward as p2p op list need to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
2023-12-22 10:44:00 +08:00
flybird11111
21aa5de00b
[gemini] hotfix NaN loss while using Gemini + tensor_parallel ( #5150 )
...
* fix
aaa
fix
fix
fix
* fix
* fix
* test ci
* fix ci
fix
2023-12-08 11:10:51 +08:00
flybird11111
2a2ec49aa7
[plugin]fix 3d checkpoint load when booster boost without optimizer. ( #5135 )
...
* fix 3d checkpoint load when booster boost without optimizer
fix 3d checkpoint load when booster boost without optimizer
* test ci
* revert ci
* fix
fix
2023-11-30 18:37:47 +08:00
github-actions[bot]
d10ee42f68
[format] applied code formatting on changed files in pull request 5088 ( #5127 )
...
Co-authored-by: github-actions <github-actions@github.com>
2023-11-29 13:38:37 +08:00
Wenhao Chen
7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert ( #5088 )
...
* [shardformer] implement policy for all GPT-J models and test
* [shardformer] support interleaved pipeline parallel for bert finetune
* [shardformer] shardformer support falcon (#4883 )
* [shardformer]: fix interleaved pipeline for bert model (#5048 )
* [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093 )
* Add Mistral support for Shardformer (#5103 )
* [shardformer] add tests to mistral (#5105 )
---------
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>
2023-11-28 16:54:42 +08:00
Xuanlei Zhao
3acbf6d496
[npu] add npu support for hybrid plugin and llama ( #5090 )
...
* llama 3d
* update
* fix autocast
2023-11-22 19:23:21 +08:00
github-actions[bot]
8921a73c90
[format] applied code formatting on changed files in pull request 5067 ( #5072 )
...
Co-authored-by: github-actions <github-actions@github.com>
2023-11-20 19:46:43 +08:00
Hongxin Liu
e5ce4c8ea6
[npu] add npu support for gemini and zero ( #5067 )
...
* [npu] setup device utils (#5047 )
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (#5052 )
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (#5065 )
* [kernel] add arm cpu adam
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
2023-11-20 16:12:41 +08:00
flybird11111
3e02154710
[gemini] gemini support extra-dp ( #5043 )
...
* support ddp
* fix
* fix
* fix
fix
* support ddp
* fix
* fix
* fix
fix
* simplify tests
* fix
* fix
* fix
fix
fix
* fix
2023-11-16 21:03:04 +08:00
flybird11111
576a2f7b10
[gemini] gemini support tensor parallelism. ( #4942 )
...
* [colossalai]fix typo
* [inference] Add smmoothquant for llama (#4904 )
* [inference] add int8 rotary embedding kernel for smoothquant (#4843 )
* [inference] add smoothquant llama attention (#4850 )
* add smoothquant llama attention
* remove uselss code
* remove useless code
* fix import error
* rename file name
* [inference] add silu linear fusion for smoothquant llama mlp (#4853 )
* add silu linear
* update skip condition
* catch smoothquant cuda lib exception
* prcocess exception for tests
* [inference] add llama mlp for smoothquant (#4854 )
* add llama mlp for smoothquant
* fix down out scale
* remove duplicate lines
* add llama mlp check
* delete useless code
* [inference] add smoothquant llama (#4861 )
* add smoothquant llama
* fix attention accuracy
* fix accuracy
* add kv cache and save pretrained
* refactor example
* delete smooth
* refactor code
* [inference] add smooth function and delete useless code for smoothquant (#4895 )
* add smooth function and delete useless code
* update datasets
* remove duplicate import
* delete useless file
* refactor codes (#4902 )
* rafactor code
* add license
* add torch-int and smoothquant license
* Update flash_attention_patch.py
To be compatible with the new change in the Transformers library, where a new argument 'padding_mask' was added to forward function of attention layer.
https://github.com/huggingface/transformers/pull/25598
* [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921 )
* [kernel] support pure fp16 for cpu adam (#4896 )
* [kernel] fix cpu adam kernel for pure fp16 and update tests (#4919 )
* [kernel] fix cpu adam
* [test] update gemini optim test
* [format] applied code formatting on changed files in pull request 4908 (#4918 )
Co-authored-by: github-actions <github-actions@github.com>
* [gemini] support gradient accumulation (#4869 )
* add test
* fix no_sync bug in low level zero plugin
* fix test
* add argument for grad accum
* add grad accum in backward hook for gemini
* finish implementation, rewrite tests
* fix test
* skip stuck model in low level zero test
* update doc
* optimize communication & fix gradient checkpoint
* modify doc
* cleaning codes
* update cpu adam fp16 case
* [hotfix] fix torch 2.0 compatibility (#4936 )
* [hotfix] fix launch
* [test] fix test gemini optim
* [shardformer] fix vit
* [test] add no master test for low level zero plugin (#4934 )
* [format] applied code formatting on changed files in pull request 4820 (#4886 )
Co-authored-by: github-actions <github-actions@github.com>
* [nfc] fix some typo with colossalai/ docs/ etc. (#4920 )
* [Refactor] Integrated some lightllm kernels into token-attention (#4946 )
* add some req for inference
* clean codes
* add codes
* add some lightllm deps
* clean codes
* hello
* delete rms files
* add some comments
* add comments
* add doc
* add lightllm deps
* add lightllm cahtglm2 kernels
* add lightllm cahtglm2 kernels
* replace rotary embedding with lightllm kernel
* add some commnets
* add some comments
* add some comments
* add
* replace fwd kernel att1
* fix a arg
* add
* add
* fix token attention
* add some comments
* clean codes
* modify comments
* fix readme
* fix bug
* fix bug
---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
* [test] merge old components to test to model zoo (#4945 )
* [test] add custom models in model zoo
* [test] update legacy test
* [test] update model zoo
* [test] update gemini test
* [test] remove components to test
* [inference] add reference and fix some bugs (#4937 )
* add reference and fix some bugs
* update gptq init
---------
Co-authored-by: Xu Kai <xukai16@foxamil.com>
* [Inference]ADD Bench Chatglm2 script (#4963 )
* add bench chatglm
* fix bug and make utils
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [Pipeline inference] Combine kvcache with pipeline inference (#4938 )
* merge kvcache with pipeline inference and refactor the code structure
* support ppsize > 2
* refactor pipeline code
* do pre-commit
* modify benchmark
* fix bench mark
* polish code
* add docstring and update readme
* refactor the code
* fix some logic bug of ppinfer
* polish readme
* fix typo
* skip infer test
* updated c++17 compiler flags (#4983 )
* [Inference] Dynamic Batching Inference, online and offline (#4953 )
* [inference] Dynamic Batching for Single and Multiple GPUs (#4831 )
* finish batch manager
* 1
* first
* fix
* fix dynamic batching
* llama infer
* finish test
* support different lengths generating
* del prints
* del prints
* fix
* fix bug
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [inference] Async dynamic batching (#4894 )
* finish input and output logic
* add generate
* test forward
* 1
* [inference]Re push async dynamic batching (#4901 )
* adapt to ray server
* finish async
* finish test
* del test
---------
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
* Revert "[inference]Re push async dynamic batching (#4901 )" (#4905 )
This reverts commit fbf3c09e67
.
* Revert "[inference] Async dynamic batching (#4894 )"
This reverts commit fced140250
.
* Revert "[inference] Async dynamic batching (#4894 )" (#4909 )
This reverts commit fced140250
.
* Add Ray Distributed Environment Init Scripts
* support DynamicBatchManager base function
* revert _set_tokenizer version
* add driver async generate
* add async test
* fix bugs in test_ray_dist.py
* add get_tokenizer.py
* fix code style
* fix bugs about No module named 'pydantic' in ci test
* fix bugs in ci test
* fix bugs in ci test
* fix bugs in ci test
* [infer]Add Ray Distributed Environment Init Scripts (#4911 )
* Revert "[inference] Async dynamic batching (#4894 )"
This reverts commit fced140250
.
* Add Ray Distributed Environment Init Scripts
* support DynamicBatchManager base function
* revert _set_tokenizer version
* add driver async generate
* add async test
* fix bugs in test_ray_dist.py
* add get_tokenizer.py
* fix code style
* fix bugs about No module named 'pydantic' in ci test
* fix bugs in ci test
* fix bugs in ci test
* fix bugs in ci test
* support dynamic batch for bloom model and is_running function
* [Inference]Test for new Async engine (#4935 )
* infer engine
* infer engine
* test engine
* test engine
* new manager
* change step
* add
* test
* fix
* fix
* finish test
* finish test
* finish test
* finish test
* add license
---------
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
* add assertion for config (#4947 )
* [Inference] Finish dynamic batching offline test (#4948 )
* test
* fix test
* fix quant
* add default
* fix
* fix some bugs
* fix some bugs
* fix
* fix bug
* fix bugs
* reset param
---------
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
* [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965 )
* adding flash-decoding
* clean
* adding kernel
* adding flash-decoding
* add integration
* add
* adding kernel
* adding kernel
* adding triton 2.1.0 features for inference
* update bloom triton kernel
* remove useless vllm kernels
* clean codes
* fix
* adding files
* fix readme
* update llama flash-decoding
---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>
* fix ColossalEval (#4992 )
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
* [doc]Update doc for colossal-inference (#4989 )
* update doc
* Update README.md
---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>
* [hotfix] Fix the bug where process groups were not being properly released. (#4940 )
* Fix the bug where process groups were not being properly released.
* test
* Revert "test"
This reverts commit 479900c139
.
* [hotfix] fix the bug of repeatedly storing param group (#4951 )
* [doc] add supported feature diagram for hybrid parallel plugin (#4996 )
* [Pipeline Inference] Merge pp with tp (#4993 )
* refactor pipeline into new CaiInferEngine
* updata llama modeling forward
* merge tp with pp
* update docstring
* optimize test workflow and example
* fix typo
* add assert and todo
* [release] update version (#4995 )
* [release] update version
* [hotfix] fix ci
* [gemini] gemini support tp
[gemini] gemini support tp
[gemini] gemini support tp
[gemini] gemini support tp
[gemini] gemini support tp
* fix
fix
fix
* update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
update checkpointIO
* support fused layernorm
support fused layernorm
support fused layernorm
* update fusedlayernorm
update fusedlayernorm
update fusedlayernorm
* add sequence parallel to gemini
add sequence parallel to gemini
* fix
* fix comments
fix comments
fix comments
* fix
* fix t5
* clear cache
* fix
* activate ci
* activate ci
* fix
* fix
* fix
* fix
* revert
* modify tp gather method
modify tp gather method
modify tp gather method
modify tp gather method
* fix test
---------
Co-authored-by: Xu Kai <xukai16@foxmail.com>
Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
Co-authored-by: cuiqing.li <lixx336@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
Co-authored-by: Xu Kai <xukai16@foxamil.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
2023-11-10 10:15:16 +08:00
Xuanlei Zhao
f71e63b0f3
[moe] support optimizer checkpoint ( #5015 )
...
* Refactor MoE Manager setup method
* unshard optim ckpt
* optim io
* update transformer version
* update requirements
* update ckpt
* update ckpt
* update ckpt
* fix engine
* fix engine
2023-11-08 15:07:03 +00:00
littsk
1a3315e336
[hotfix] Add layer norm gradients all-reduce for sequence parallel ( #4926 )
...
* [hotfix] Add layer norm gradients all-reduce for sequence parallel. (#4915 )
* Add layer norm gradients all-reduce for sequence parallel.
* skip pipeline inference test
* [hotfix] fixing polices of sequence parallel (#4922 )
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
---------
Co-authored-by: littsk <1214689160@qq.com>
* Hotfix/add grad all reduce for sequence parallel (#4927 )
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
* fix bug using wrong variables
---------
Co-authored-by: littsk <1214689160@qq.com>
* fix policy initialization
* fix bloom and chatglm policices
* polish code of handling layernorm
* fix moe module
* polish code of class initializing
---------
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
2023-11-03 13:32:43 +08:00
Xuanlei Zhao
dc003c304c
[moe] merge moe into main ( #4978 )
...
* update moe module
* support openmoe
2023-11-02 02:21:24 +00:00
Baizhou Zhang
c040d70aa0
[hotfix] fix the bug of repeatedly storing param group ( #4951 )
2023-10-31 14:48:01 +08:00
Baizhou Zhang
21ba89cab6
[gemini] support gradient accumulation ( #4869 )
...
* add test
* fix no_sync bug in low level zero plugin
* fix test
* add argument for grad accum
* add grad accum in backward hook for gemini
* finish implementation, rewrite tests
* fix test
* skip stuck model in low level zero test
* update doc
* optimize communication & fix gradient checkpoint
* modify doc
* cleaning codes
* update cpu adam fp16 case
2023-10-17 14:07:21 +08:00
Zhongkai Zhao
a0684e7bd6
[feature] support no master weights option for low level zero plugin ( #4816 )
...
* [feature] support no master weights for low level zero plugin
* [feature] support no master weights for low level zero plugin, remove data copy when no master weights
* remove data copy and typecasting when no master weights
* not load weights to cpu when using no master weights
* fix grad: use fp16 grad when no master weights
* only do not update working param when no master weights
* fix: only do not update working param when no master weights
* fix: passing params in dict format in hybrid plugin
* fix: remove extra params (tp_process_group) in hybrid_parallel_plugin
2023-10-13 07:57:45 +00:00
littsk
83b52c56cd
[feature] Add clip_grad_norm for hybrid_parallel_plugin ( #4837 )
...
* Add clip_grad_norm for hibrid_parallel_plugin
* polish code
* add unittests
* Move tp to a higher-level optimizer interface.
* bug fix
* polish code
2023-10-12 11:32:37 +08:00
Hongxin Liu
df63564184
[gemini] support amp o3 for gemini ( #4872 )
...
* [gemini] support no reuse fp16 chunk
* [gemini] support no master weight for optim
* [gemini] support no master weight for gemini ddp
* [test] update gemini tests
* [test] update gemini tests
* [plugin] update gemini plugin
* [test] fix gemini checkpointio test
* [test] fix gemini checkpoint io
2023-10-12 10:39:08 +08:00
shaoyuw
c97a3523db
fix: typo in comment of low_level_zero plugin
2023-10-05 16:30:34 +00:00
Baizhou Zhang
a2db75546d
[doc] polish shardformer doc ( #4779 )
...
* fix example format in docstring
* polish shardformer doc
2023-09-26 10:57:47 +08:00
Baizhou Zhang
c0a033700c
[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic ( #4758 )
...
* fix master param sync for hybrid plugin
* rewrite unwrap for ddp/fsdp
* rewrite unwrap for zero/gemini
* rewrite unwrap for hybrid plugin
* fix geemini unwrap
* fix bugs
2023-09-20 18:29:37 +08:00
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
Xuanlei Zhao
ac2797996b
[shardformer] add custom policy in hybrid parallel plugin ( #4718 )
...
* add custom policy
* update assert
2023-09-15 17:53:13 +08:00
Baizhou Zhang
f911d5b09d
[doc] Add user document for Shardformer ( #4702 )
...
* create shardformer doc files
* add docstring for seq-parallel
* update ShardConfig docstring
* add links to llama example
* add outdated massage
* finish introduction & supporting information
* finish 'how shardformer works'
* finish shardformer.md English doc
* fix doctest fail
* add Chinese document
2023-09-15 10:56:39 +08:00
Baizhou Zhang
d8ceeac14e
[hotfix] fix typo in hybrid parallel io ( #4697 )
2023-09-12 17:32:19 +08:00
Baizhou Zhang
660eed9124
[pipeline] set optimizer to optional in execute_pipeline ( #4630 )
...
* set optimizer to optional in execute_pipeline
* arrange device and mixed precision in booster init
* fix execute_pipeline in booster.py
2023-09-07 10:42:59 +08:00
Hongxin Liu
fae6c92ead
Merge branch 'main' into feature/shardformer
2023-09-05 21:54:08 +08:00
Hongxin Liu
807e01a4ba
[zero] hotfix master param sync ( #4618 )
...
* [zero] add method to update master params
* [zero] update zero plugin
* [plugin] update low level zero plugin
2023-09-05 15:04:02 +08:00
Bin Jia
86d22581e4
[shardformer] Add overlap optional for HybridParallelPlugin ( #4615 )
...
* add optional overlap for plugin
* remove fixed todo
2023-09-05 11:52:23 +08:00
Hongxin Liu
a39a5c66fe
Merge branch 'main' into feature/shardformer
2023-09-04 23:43:13 +08:00
Baizhou Zhang
e79b1e80e2
[checkpointio] support huggingface from_pretrained for all plugins ( #4606 )
2023-09-04 23:25:01 +08:00
flybird11111
0a94fcd351
[shardformer] update bert finetune example with HybridParallelPlugin ( #4584 )
...
* [shardformer] fix opt test hanging
* fix
* test
* test
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] fix epoch change
* [shardformer] broadcast add pp group
* [shardformer] fix opt test hanging
* fix
* test
* test
* [shardformer] zero1+pp and the corresponding tests (#4517 )
* pause
* finish pp+zero1
* Update test_shard_vit.py
* [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516 )
* fix overlap bug and support bert, add overlap as an option in shardconfig
* support overlap for chatglm and bloom
* [shardformer] fix emerged bugs after updating transformers (#4526 )
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] Add overlap support for gpt2 (#4535 )
* add overlap support for gpt2
* remove unused code
* remove unused code
* [shardformer] support pp+tp+zero1 tests (#4531 )
* [shardformer] fix opt test hanging
* fix
* test
* test
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] fix submodule replacement bug when enabling pp (#4544 )
* [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540 )
* implement sharded optimizer saving
* add more param info
* finish implementation of sharded optimizer saving
* fix bugs in optimizer sharded saving
* add pp+zero test
* param group loading
* greedy loading of optimizer
* fix bug when loading
* implement optimizer sharded saving
* add optimizer test & arrange checkpointIO utils
* fix gemini sharding state_dict
* add verbose option
* add loading of master params
* fix typehint
* fix master/working mapping in fp16 amp
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] fix epoch change
* [shardformer] broadcast add pp group
* rebase feature/shardformer
* update pipeline
* [shardformer] fix
* [shardformer] fix
* [shardformer] bert finetune fix
* [shardformer] add all_reduce operation to loss
add all_reduce operation to loss
* [shardformer] make compatible with pytree.
make compatible with pytree.
* [shardformer] disable tp
disable tp
* [shardformer] add 3d plugin to ci test
* [shardformer] update num_microbatches to None
* [shardformer] update microbatchsize
* [shardformer] update assert
* update scheduler
* update scheduler
---------
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
2023-09-04 21:46:29 +08:00