Xuanlei Zhao
f71e63b0f3
[moe] support optimizer checkpoint ( #5015 )
* Refactor MoE Manager setup method
* unshard optim ckpt
* optim io
* update transformer version
* update requirements
* update ckpt
* update ckpt
* update ckpt
* fix engine
* fix engine
1 year ago
littsk
1a3315e336
[hotfix] Add layer norm gradients all-reduce for sequence parallel ( #4926 )
* [hotfix] Add layer norm gradients all-reduce for sequence parallel. (#4915 )
* Add layer norm gradients all-reduce for sequence parallel.
* skip pipeline inference test
* [hotfix] fixing policies of sequence parallel (#4922 )
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
---------
Co-authored-by: littsk <1214689160@qq.com>
* Hotfix/add grad all reduce for sequence parallel (#4927 )
* Add layer norm gradients all-reduce for sequence parallel.
* fix parameter passing when calling get_autopolicy
* fix bug using wrong variables
---------
Co-authored-by: littsk <1214689160@qq.com>
* fix policy initialization
* fix bloom and chatglm policies
* polish code of handling layernorm
* fix moe module
* polish code of class initializing
---------
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
1 year ago
Xuanlei Zhao
dc003c304c
[moe] merge moe into main ( #4978 )
* update moe module
* support openmoe
1 year ago
Baizhou Zhang
c040d70aa0
[hotfix] fix the bug of repeatedly storing param group ( #4951 )
1 year ago
Baizhou Zhang
21ba89cab6
[gemini] support gradient accumulation ( #4869 )
* add test
* fix no_sync bug in low level zero plugin
* fix test
* add argument for grad accum
* add grad accum in backward hook for gemini
* finish implementation, rewrite tests
* fix test
* skip stuck model in low level zero test
* update doc
* optimize communication & fix gradient checkpoint
* modify doc
* cleaning codes
* update cpu adam fp16 case
1 year ago
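The gradient-accumulation commit above amounts to summing micro-batch gradients and stepping the optimizer once. A minimal dependency-free sketch of the idea (hypothetical names, not the ColossalAI/Gemini implementation, which accumulates inside backward hooks):

```python
# Sketch of gradient accumulation: run backward per micro-batch,
# accumulate, then apply a single optimizer update.
# Hypothetical names; not the Gemini plugin's actual code.

def sgd_step_with_accumulation(w, grad_fn, micro_batches, lr=0.1):
    """Accumulate gradients over micro-batches, then apply one SGD update."""
    acc = 0.0
    for batch in micro_batches:
        acc += grad_fn(w, batch)      # one backward pass per micro-batch
    acc /= len(micro_batches)         # average, as if one large batch
    return w - lr * acc               # single optimizer step

# Toy loss: squared error of w against a target t -> grad = 2*(w - t)
grad = lambda w, t: 2.0 * (w - t)
w_new = sgd_step_with_accumulation(1.0, grad, [0.0, 2.0, 4.0])
# average grad = (2 - 2 - 6) / 3 = -2, so w_new = 1.0 - 0.1 * (-2) = 1.2
```

The point of doing this in a backward hook (as the commit does for Gemini) is that gradients can be accumulated into the sharded chunks as they are produced, instead of holding per-micro-batch gradients alive.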
Zhongkai Zhao
a0684e7bd6
[feature] support no master weights option for low level zero plugin ( #4816 )
* [feature] support no master weights for low level zero plugin
* [feature] support no master weights for low level zero plugin, remove data copy when no master weights
* remove data copy and typecasting when no master weights
* not load weights to cpu when using no master weights
* fix grad: use fp16 grad when no master weights
* only do not update working param when no master weights
* fix: only do not update working param when no master weights
* fix: passing params in dict format in hybrid plugin
* fix: remove extra params (tp_process_group) in hybrid_parallel_plugin
1 year ago
littsk
83b52c56cd
[feature] Add clip_grad_norm for hybrid_parallel_plugin ( #4837 )
* Add clip_grad_norm for hybrid_parallel_plugin
* polish code
* add unittests
* Move tp to a higher-level optimizer interface.
* bug fix
* polish code
1 year ago
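The clip_grad_norm feature above is global gradient-norm clipping: compute the L2 norm over all gradients and rescale them if it exceeds a threshold. A toy sketch with a hypothetical helper (not the hybrid_parallel_plugin implementation):

```python
import math

# Sketch of global gradient-norm clipping. Hypothetical helper;
# the real plugin works on distributed tensors, not Python floats.
def clip_grad_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

clipped, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)   # norm = 5.0
# clipped is [0.6, 0.8], whose L2 norm is exactly 1.0
```

Under tensor parallelism each rank only holds a slice of the parameters, so the per-rank partial sums of squares must be all-reduced before taking the square root; that is presumably why the commit moves the logic "to a higher-level optimizer interface".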
Hongxin Liu
df63564184
[gemini] support amp o3 for gemini ( #4872 )
* [gemini] support no reuse fp16 chunk
* [gemini] support no master weight for optim
* [gemini] support no master weight for gemini ddp
* [test] update gemini tests
* [test] update gemini tests
* [plugin] update gemini plugin
* [test] fix gemini checkpointio test
* [test] fix gemini checkpoint io
1 year ago
shaoyuw
c97a3523db
fix: typo in comment of low_level_zero plugin
1 year ago
Hongxin Liu
4965c0dabd
[lazy] support from_pretrained ( #4801 )
* [lazy] patch from pretrained
* [lazy] fix from pretrained and add tests
* [devops] update ci
1 year ago
Baizhou Zhang
a2db75546d
[doc] polish shardformer doc ( #4779 )
* fix example format in docstring
* polish shardformer doc
1 year ago
Baizhou Zhang
c0a033700c
[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic ( #4758 )
* fix master param sync for hybrid plugin
* rewrite unwrap for ddp/fsdp
* rewrite unwrap for zero/gemini
* rewrite unwrap for hybrid plugin
* fix gemini unwrap
* fix bugs
1 year ago
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
1 year ago
Xuanlei Zhao
ac2797996b
[shardformer] add custom policy in hybrid parallel plugin ( #4718 )
* add custom policy
* update assert
1 year ago
Baizhou Zhang
f911d5b09d
[doc] Add user document for Shardformer ( #4702 )
* create shardformer doc files
* add docstring for seq-parallel
* update ShardConfig docstring
* add links to llama example
* add outdated message
* finish introduction & supporting information
* finish 'how shardformer works'
* finish shardformer.md English doc
* fix doctest fail
* add Chinese document
1 year ago
Baizhou Zhang
d8ceeac14e
[hotfix] fix typo in hybrid parallel io ( #4697 )
1 year ago
Baizhou Zhang
1d454733c4
[doc] Update booster user documents. ( #4669 )
* update booster_api.md
* update booster_checkpoint.md
* update booster_plugins.md
* move transformers importing inside function
* fix Dict typing
* fix autodoc bug
* small fix
1 year ago
Baizhou Zhang
660eed9124
[pipeline] set optimizer to optional in execute_pipeline ( #4630 )
* set optimizer to optional in execute_pipeline
* arrange device and mixed precision in booster init
* fix execute_pipeline in booster.py
1 year ago
Hongxin Liu
807e01a4ba
[zero] hotfix master param sync ( #4618 )
* [zero] add method to update master params
* [zero] update zero plugin
* [plugin] update low level zero plugin
1 year ago
Bin Jia
86d22581e4
[shardformer] Add overlap optional for HybridParallelPlugin ( #4615 )
* add optional overlap for plugin
* remove fixed todo
1 year ago
Baizhou Zhang
e79b1e80e2
[checkpointio] support huggingface from_pretrained for all plugins ( #4606 )
1 year ago
flybird11111
0a94fcd351
[shardformer] update bert finetune example with HybridParallelPlugin ( #4584 )
* [shardformer] fix opt test hanging
* fix
* test
* test
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] fix epoch change
* [shardformer] broadcast add pp group
* [shardformer] fix opt test hanging
* fix
* test
* test
* [shardformer] zero1+pp and the corresponding tests (#4517 )
* pause
* finish pp+zero1
* Update test_shard_vit.py
* [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516 )
* fix overlap bug and support bert, add overlap as an option in shardconfig
* support overlap for chatglm and bloom
* [shardformer] fix emerged bugs after updating transformers (#4526 )
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] Add overlap support for gpt2 (#4535 )
* add overlap support for gpt2
* remove unused code
* remove unused code
* [shardformer] support pp+tp+zero1 tests (#4531 )
* [shardformer] fix opt test hanging
* fix
* test
* test
* test
* fix test
* fix test
* remove print
* add fix
* [shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
[shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] pp+tp+zero1
* [shardformer] fix submodule replacement bug when enabling pp (#4544 )
* [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540 )
* implement sharded optimizer saving
* add more param info
* finish implementation of sharded optimizer saving
* fix bugs in optimizer sharded saving
* add pp+zero test
* param group loading
* greedy loading of optimizer
* fix bug when loading
* implement optimizer sharded saving
* add optimizer test & arrange checkpointIO utils
* fix gemini sharding state_dict
* add verbose option
* add loading of master params
* fix typehint
* fix master/working mapping in fp16 amp
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] add bert finetune example
* [shardformer] fix epoch change
* [shardformer] broadcast add pp group
* rebase feature/shardformer
* update pipeline
* [shardformer] fix
* [shardformer] fix
* [shardformer] bert finetune fix
* [shardformer] add all_reduce operation to loss
add all_reduce operation to loss
* [shardformer] make compatible with pytree.
make compatible with pytree.
* [shardformer] disable tp
disable tp
* [shardformer] add 3d plugin to ci test
* [shardformer] update num_microbatches to None
* [shardformer] update microbatchsize
* [shardformer] update assert
* update scheduler
* update scheduler
---------
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
1 year ago
Hongxin Liu
63ecafb1fb
[checkpointio] optimize zero optim checkpoint io ( #4591 )
* [zero] update checkpoint io to save memory
* [checkpointio] add device map to save memory
1 year ago
Hongxin Liu
508ca36fe3
[pipeline] 1f1b schedule receive microbatch size ( #4589 )
1 year ago
Baizhou Zhang
38ccb8b1a3
[shardformer] support from_pretrained when loading model with HybridParallelPlugin ( #4575 )
* hybrid plugin support huggingface from_pretrained
* add huggingface compatibility tests
* add folder cleaning
* fix bugs
1 year ago
Baizhou Zhang
c9625dbb63
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin ( #4540 )
* implement sharded optimizer saving
* add more param info
* finish implementation of sharded optimizer saving
* fix bugs in optimizer sharded saving
* add pp+zero test
* param group loading
* greedy loading of optimizer
* fix bug when loading
* implement optimizer sharded saving
* add optimizer test & arrange checkpointIO utils
* fix gemini sharding state_dict
* add verbose option
* add loading of master params
* fix typehint
* fix master/working mapping in fp16 amp
1 year ago
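Sharded checkpoint saving, as in the commit above, splits one large state dict into size-capped shard files plus an index that maps each key to the shard holding it, so loading can be done greedily shard by shard. A toy illustration (hypothetical format, not the HybridParallelPlugin CheckpointIO layout):

```python
# Toy sharded-checkpoint writer: split a flat state dict into
# size-capped shards and build an index of key -> shard file.
# Hypothetical format; real checkpoints cap by bytes, not entry count.
def shard_state_dict(state, max_shard_entries=2):
    shards, index = [], {}
    current = {}
    for key, value in state.items():
        if len(current) >= max_shard_entries:
            shards.append(current)        # close the full shard
            current = {}
        current[key] = value
        index[key] = f"shard_{len(shards)}.bin"   # file this key lands in
    if current:
        shards.append(current)
    return shards, index

state = {"w1": [1, 2], "w2": [3], "b1": [4], "m1": [5]}
shards, index = shard_state_dict(state)
# 4 keys with at most 2 per shard -> 2 shards; the index routes each key
```

A loader can then open only the shard files named in the index for the keys it needs, which is what makes greedy, memory-bounded optimizer loading possible.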
Baizhou Zhang
44eab2b27f
[shardformer] support sharded checkpoint IO for models of HybridParallelPlugin ( #4506 )
* add APIs
* implement save_sharded_model
* add test for hybrid checkpointio
* implement naive loading for sharded model
* implement efficient sharded model loading
* open a new file for hybrid checkpoint_io
* small fix
* fix circular importing
* fix docstring
* arrange arguments and apis
* small fix
1 year ago
Hongxin Liu
27061426f7
[gemini] improve compatibility and add static placement policy ( #4479 )
* [gemini] remove distributed-related part from colotensor (#4379 )
* [gemini] remove process group dependency
* [gemini] remove tp part from colo tensor
* [gemini] patch inplace op
* [gemini] fix param op hook and update tests
* [test] remove useless tests
* [test] remove useless tests
* [misc] fix requirements
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [test] fix model zoo
* [misc] update requirements
* [gemini] refactor gemini optimizer and gemini ddp (#4398 )
* [gemini] update optimizer interface
* [gemini] renaming gemini optimizer
* [gemini] refactor gemini ddp class
* [example] update gemini related example
* [example] update gemini related example
* [plugin] fix gemini plugin args
* [test] update gemini ckpt tests
* [gemini] fix checkpoint io
* [example] fix opt example requirements
* [example] fix opt example
* [example] fix opt example
* [example] fix opt example
* [gemini] add static placement policy (#4443 )
* [gemini] add static placement policy
* [gemini] fix param offload
* [test] update gemini tests
* [plugin] update gemini plugin
* [plugin] update gemini plugin docstr
* [misc] fix flash attn requirement
* [test] fix gemini checkpoint io test
* [example] update resnet example result (#4457 )
* [example] update bert example result (#4458 )
* [doc] update gemini doc (#4468 )
* [example] update gemini related examples (#4473 )
* [example] update gpt example
* [example] update dreambooth example
* [example] update vit
* [example] update opt
* [example] update palm
* [example] update vit and opt benchmark
* [hotfix] fix bert in model zoo (#4480 )
* [hotfix] fix bert in model zoo
* [test] remove chatglm gemini test
* [test] remove sam gemini test
* [test] remove vit gemini test
* [hotfix] fix opt tutorial example (#4497 )
* [hotfix] fix opt tutorial example
* [hotfix] fix opt tutorial example
1 year ago
Baizhou Zhang
1c7df566e2
[shardformer] support tp+zero for shardformer ( #4472 )
* support tp+zero/input type cast for hybridplugin
* add tp+zero tests
* fix bucket arguments
1 year ago
Bin Jia
7c8be77081
[shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp ( #4460 )
* support gpt2 seq parallel with pp/dp/tp
* fix a bug when waiting for stream done
* delete unused gpt2_seq file
1 year ago
Baizhou Zhang
6ef33f75aa
[shardformer] support DDP in HybridPlugin/add tp+dp tests ( #4446 )
* support DDP for HybridPlugin/add tp+dp tests
* add docstring for HybridParallelPlugin
1 year ago
Bin Jia
424629fea0
[shardformer/sequence parallel] Cherry pick commit to new branch ( #4450 )
* [shardformer/sequence parallel] Support sequence parallel for gpt2 (#4384 )
* [sequence parallel] add sequence parallel linear col/row support (#4336 )
* add sequence parallel linear col/row support
* add annotation
* add annotation
* add support for gpt2 fused qkv linear layer
* support sequence parallel in GPT2
* add docstring and note
* add requirements
* remove unused flash-attn
* modify flash attn test
* modify flash attn setting
* modify flash attn code
* add assert before divide, rename forward function
* [shardformer/test] fix gpt2 test with seq-parallel
* [shardformer/sequence parallel] Overlap input gather and grad computation during col backward (#4401 )
* overlap gather input / grad computing during col backward
* modify test for overlap
* simplify code
* fix code and modify cuda stream synchronize
* [shardformer/sequence parallel] polish code
1 year ago
Hongxin Liu
172f7fa3cf
[misc] resolve code factor issues ( #4433 )
1 year ago
flybird1111
d2cd48e0be
[shardformer] test all optimizations ( #4399 )
[shardformer] test all optimizations
[shardformer] test all optimizations
[shardformer] test all optimizations
1 year ago
Baizhou Zhang
ed4c448488
[pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline ( #4388 )
* fix remaining t5 bugs/rewrite t5 tests
* fix multi-tensor communication in pipeline
* rearrange test_config
* fix keyerror in sync_shared_params
* fix get_held_layers & Randomizer, complete t5 tests
* erase printing
* fix get_held_layers through modifying _release_unheld_layers
* fix _get_recursive_held_layers bug
1 year ago
Baizhou Zhang
b1feeced8e
[shardformer] add util functions for shardformer tests/fix sync_shared_param ( #4366 )
* add util functions for shardformer tests & rewrite gpt2 test
* fix shared_params & embedding/merging
* fix precision
1 year ago
Baizhou Zhang
0ceec8f9a9
[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file ( #4354 )
* add naive optimizer for 3DPlugin/refactor gpt2 shardformer test
* merge tests of PP/DP/TP combinations into one test file
* fix bug when sync grad for dp in HybridPlugin
* update supported precisions for 3DPlugin/fix bug when shifting tp_degree
* improve the passing of lazy_init
* modify lazy_init/use sync_shared_params
1 year ago
Hongxin Liu
261eab02fb
[plugin] add 3d parallel plugin ( #4295 )
* [amp] add mixed precision optimizer
* [plugin] add 3d parallel plugin
* [booster] support pipeline
* [plugin] 3d parallel plugin support clip grad norm
* [shardformer] fix sharder and add plugin test
* [plugin] rename 3d parallel plugin
* [ci] support testmon core pkg change detection (#4305 )
* [hotfix] debug testmon
* [hotfix] fix llama
* [hotfix] fix p2p bugs
* [hotfix] fix requirements
1 year ago
LuGY
1a49a5ea00
[zero] support shard optimizer state dict of zero ( #4194 )
* support shard optimizer of zero
* polish code
* support sync grad manually
1 year ago
LuGY
79cf1b5f33
[zero]support no_sync method for zero1 plugin ( #4138 )
* support no sync for zero1 plugin
* polish
* polish
1 year ago
梁爽
abe4f971e0
[NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style ( #4256 )
Co-authored-by: supercooledith <893754954@qq.com>
1 year ago
Jianghai
b366f1d99f
[NFC] Fix format for mixed precision ( #4253 )
* [NFC] polish colossalai/booster/mixed_precision/mixed_precision_base.py code style
1 year ago
Baizhou Zhang
c6f6005990
[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin ( #4302 )
* sharded optimizer checkpoint for gemini plugin
* modify test to reduce testing time
* update doc
* fix bug when keep_gathered is true under GeminiPlugin
1 year ago
Baizhou Zhang
58913441a1
[checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin ( #4141 )
* [checkpointio] unsharded optimizer checkpoint for Gemini plugin
* [checkpointio] unsharded optimizer checkpoint for Gemini using all_gather
1 year ago
Baizhou Zhang
0bb0b481b4
[gemini] fix argument naming during chunk configuration searching
1 year ago
Baizhou Zhang
822c3d4d66
[checkpointio] sharded optimizer checkpoint for DDP plugin ( #4002 )
1 year ago
Wenhao Chen
725af3eeeb
[booster] make optimizer argument optional for boost ( #3993 )
* feat: make optimizer optional in Booster.boost
* test: skip unet test if diffusers version > 0.10.2
1 year ago
Baizhou Zhang
c9cff7e7fa
[checkpointio] General Checkpointing of Sharded Optimizers ( #3984 )
1 year ago
Frank Lee
71fe52769c
[gemini] fixed the gemini checkpoint io ( #3934 )
1 year ago
Frank Lee
bd1ab98158
[gemini] fixed the gemini checkpoint io ( #3934 )
1 year ago