FoolPlayer
a73130482d
[shardformer] Unit test ( #3928 )
...
* fix bug in slicer, add slicer unit test
* add dropout test
* use pid as dropout seed
* updata dropout test with local pattern
* ad todo
2023-07-04 16:05:01 +08:00
FoolPlayer
f1cb5ac6bf
[shardformer] Align bert value ( #3907 )
...
* add bert align test, fix dist loss bug
* forward and backward align
* add ignore index
* add shardformer CI
* add gather_output optional for user in shardconfig
* update readme with optional gather_ouput
* add dist crossentropy loss test, remove unused files
* remove unused file
* remove unused file
* rename the file
* polish code
2023-07-04 16:05:01 +08:00
FoolPlayer
79f8d5d54b
[shardformer] add gpt2 policy and modify shard and slicer to support ( #3883 )
...
* add gpt2 policy and modify shard and slicer to support
* remove unused code
* polish code
2023-07-04 16:05:01 +08:00
FoolPlayer
70173e3123
update README ( #3909 )
2023-07-04 16:05:01 +08:00
FoolPlayer
ab8a47f830
[shardformer] add Dropout layer support different dropout pattern ( #3856 )
...
* add dropout layer, add dropout test
* modify seed manager as context manager
* add a copy of col_nn.layer
* add dist_crossentropy loss; separate module test
* polish the code
* fix dist crossentropy loss
2023-07-04 16:05:01 +08:00
FoolPlayer
c594dc2f1c
[shardformer] update readme with modules implement doc ( #3834 )
...
* update readme with modules content
* remove img
2023-07-04 16:05:01 +08:00
Frank Lee
4972e1f40e
[shardformer] refactored the user api ( #3828 )
...
* [shardformer] refactored the user api
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
235792f170
[shardformer] updated readme ( #3827 )
2023-07-04 16:05:01 +08:00
FoolPlayer
8cc11235c0
[shardformer]: Feature/shardformer, add some docstring and readme ( #3816 )
...
* init shardformer code structure
* add implement of sharder (inject and replace)
* add implement of replace layer to colossal layer
* separate different layer policy, add some notion
* implement 1d and 2d slicer, can tell col or row
* fix bug when slicing and inject model
* fix some bug; add inference test example
* add share weight and train example
* add train
* add docstring and readme
* add docstring for other files
* pre-commit
2023-07-04 16:05:01 +08:00
FoolPlayer
8d68de767d
[shardformer] init shardformer code structure ( #3731 )
...
* init shardformer code structure
* add implement of sharder (inject and replace)
* add implement of replace layer to colossal layer
* separate different layer policy, add some notion
* implement 1d and 2d slicer, can tell col or row
* fix bug when slicing and inject model
* fix some bug; add inference test example
2023-07-04 16:05:01 +08:00
Baizhou Zhang
1350ece492
[hotfix] fix import bug in checkpoint_io ( #4142 )
2023-07-03 22:14:37 +08:00
digger yu
8abc87798f
fix Tensor is not defined ( #4129 )
2023-07-03 17:10:18 +08:00
digger yu
7e46bc87b6
fix CheckpointIndexFile is not defined ( #4109 )
2023-07-03 17:09:06 +08:00
digger yu
09fe9dc704
[nfc]fix ColossalaiOptimizer is not defined ( #4122 )
2023-06-30 17:23:22 +08:00
Frank Lee
95e95b6d58
[testing] move pytest to be inside the function ( #4087 )
2023-06-27 11:02:25 +08:00
Baizhou Zhang
0bb0b481b4
[gemini] fix argument naming during chunk configuration searching
2023-06-25 13:34:15 +08:00
github-actions[bot]
a52f62082d
[format] applied code formatting on changed files in pull request 4021 ( #4022 )
...
Co-authored-by: github-actions <github-actions@github.com>
2023-06-19 11:23:24 +08:00
Baizhou Zhang
822c3d4d66
[checkpointio] sharded optimizer checkpoint for DDP plugin ( #4002 )
2023-06-16 14:14:05 +08:00
Wenhao Chen
725af3eeeb
[booster] make optimizer argument optional for boost ( #3993 )
...
* feat: make optimizer optional in Booster.boost
* test: skip unet test if diffusers version > 0.10.2
2023-06-15 17:38:42 +08:00
Baizhou Zhang
c9cff7e7fa
[checkpointio] General Checkpointing of Sharded Optimizers ( #3984 )
2023-06-15 15:21:26 +08:00
Frank Lee
71fe52769c
[gemini] fixed the gemini checkpoint io ( #3934 )
2023-06-12 15:11:27 +08:00
Frank Lee
ddcf58cacf
Revert "[sync] sync feature/shardformer with develop"
2023-06-09 09:41:27 +08:00
FoolPlayer
24651fdd4f
Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer
...
[sync] sync feature/shardformer with develop
2023-06-09 09:34:00 +08:00
FoolPlayer
ef1537759c
[shardformer] add gpt2 policy and modify shard and slicer to support ( #3883 )
...
* add gpt2 policy and modify shard and slicer to support
* remove unused code
* polish code
2023-06-08 15:01:34 +08:00
FoolPlayer
6370a935f6
update README ( #3909 )
2023-06-08 15:01:34 +08:00
FoolPlayer
21a3915c98
[shardformer] add Dropout layer support different dropout pattern ( #3856 )
...
* add dropout layer, add dropout test
* modify seed manager as context manager
* add a copy of col_nn.layer
* add dist_crossentropy loss; separate module test
* polish the code
* fix dist crossentropy loss
2023-06-08 15:01:34 +08:00
FoolPlayer
997544c1f9
[shardformer] update readme with modules implement doc ( #3834 )
...
* update readme with modules content
* remove img
2023-06-08 15:01:34 +08:00
Frank Lee
537a52b7a2
[shardformer] refactored the user api ( #3828 )
...
* [shardformer] refactored the user api
* polish code
2023-06-08 15:01:34 +08:00
Frank Lee
bc19024bf9
[shardformer] updated readme ( #3827 )
2023-06-08 15:01:34 +08:00
FoolPlayer
58f6432416
[shardformer]: Feature/shardformer, add some docstring and readme ( #3816 )
...
* init shardformer code structure
* add implement of sharder (inject and replace)
* add implement of replace layer to colossal layer
* separate different layer policy, add some notion
* implement 1d and 2d slicer, can tell col or row
* fix bug when slicing and inject model
* fix some bug; add inference test example
* add share weight and train example
* add train
* add docstring and readme
* add docstring for other files
* pre-commit
2023-06-08 15:01:34 +08:00
FoolPlayer
6a69b44dfc
[shardformer] init shardformer code structure ( #3731 )
...
* init shardformer code structure
* add implement of sharder (inject and replace)
* add implement of replace layer to colossal layer
* separate different layer policy, add some notion
* implement 1d and 2d slicer, can tell col or row
* fix bug when slicing and inject model
* fix some bug; add inference test example
2023-06-08 15:01:34 +08:00
Frank Lee
eb39154d40
[dtensor] updated api and doc ( #3845 )
2023-06-08 10:18:17 +08:00
digger yu
de0d7df33f
[nfc] fix typo colossalai/zero ( #3923 )
2023-06-08 00:01:29 +08:00
digger yu
a9d1cadc49
fix typo with colossalai/trainer utils zero ( #3908 )
2023-06-07 16:08:37 +08:00
Frank Lee
d51e83d642
Merge pull request #3916 from FrankLeeeee/sync/dtensor-with-develop
...
[sync] sync feature/dtensor with develop
2023-06-07 11:50:43 +08:00
Hongxin Liu
9c88b6cbd1
[lazy] fix compatibility problem on torch 1.13 ( #3911 )
2023-06-07 11:10:12 +08:00
digger yu
0e484e6201
[nfc]fix typo colossalai/pipeline tensor nn ( #3899 )
...
* fix typo colossalai/autochunk auto_parallel amp
* fix typo colossalai/auto_parallel nn utils etc.
* fix typo colossalai/auto_parallel autochunk fx/passes etc.
* fix typo docs/
* change placememt_policy to placement_policy in docs/ and examples/
* fix typo colossalai/ applications/
* fix typo colossalai/cli fx kernel
* fix typo colossalai/nn
* revert change warmuped
* fix typo colossalai/pipeline tensor nn
2023-06-06 14:07:36 +08:00
Baizhou Zhang
c1535ccbba
[doc] fix docs about booster api usage ( #3898 )
2023-06-06 13:36:11 +08:00
digger yu
1878749753
[nfc] fix typo colossalai/nn ( #3887 )
...
* fix typo colossalai/autochunk auto_parallel amp
* fix typo colossalai/auto_parallel nn utils etc.
* fix typo colossalai/auto_parallel autochunk fx/passes etc.
* fix typo docs/
* change placememt_policy to placement_policy in docs/ and examples/
* fix typo colossalai/ applications/
* fix typo colossalai/cli fx kernel
* fix typo colossalai/nn
* revert change warmuped
2023-06-05 16:04:27 +08:00
Hongxin Liu
ae02d4e4f7
[bf16] add bf16 support ( #3882 )
...
* [bf16] add bf16 support for fused adam (#3844 )
* [bf16] fused adam kernel support bf16
* [test] update fused adam kernel test
* [test] update fused adam test
* [bf16] cpu adam and hybrid adam optimizers support bf16 (#3860 )
* [bf16] implement mixed precision mixin and add bf16 support for low level zero (#3869 )
* [bf16] add mixed precision mixin
* [bf16] low level zero optim support bf16
* [text] update low level zero test
* [text] fix low level zero grad acc test
* [bf16] add bf16 support for gemini (#3872 )
* [bf16] gemini support bf16
* [test] update gemini bf16 test
* [doc] update gemini docstring
* [bf16] add bf16 support for plugins (#3877 )
* [bf16] add bf16 support for legacy zero (#3879 )
* [zero] init context support bf16
* [zero] legacy zero support bf16
* [test] add zero bf16 test
* [doc] add bf16 related docstring for legacy zero
2023-06-05 15:58:31 +08:00
Liu Ziming
8065cc5fba
Modify torch version requirement to adapt torch 2.0 ( #3896 )
2023-06-05 15:57:35 +08:00
Hongxin Liu
dbb32692d2
[lazy] refactor lazy init ( #3891 )
...
* [lazy] remove old lazy init
* [lazy] refactor lazy init folder structure
* [lazy] fix lazy tensor deepcopy
* [test] update lazy init test
2023-06-05 14:20:47 +08:00
digger yu
70c8cdecf4
[nfc] fix typo colossalai/cli fx kernel ( #3847 )
...
* fix typo colossalai/autochunk auto_parallel amp
* fix typo colossalai/auto_parallel nn utils etc.
* fix typo colossalai/auto_parallel autochunk fx/passes etc.
* fix typo docs/
* change placememt_policy to placement_policy in docs/ and examples/
* fix typo colossalai/ applications/
* fix typo colossalai/cli fx kernel
2023-06-02 15:02:45 +08:00
digger yu
e2d81eba0d
[nfc] fix typo colossalai/ applications/ ( #3831 )
...
* fix typo colossalai/autochunk auto_parallel amp
* fix typo colossalai/auto_parallel nn utils etc.
* fix typo colossalai/auto_parallel autochunk fx/passes etc.
* fix typo docs/
* change placememt_policy to placement_policy in docs/ and examples/
* fix typo colossalai/ applications/
2023-05-25 16:19:41 +08:00
wukong1992
3229f93e30
[booster] add warning for torch fsdp plugin doc ( #3833 )
2023-05-25 14:00:02 +08:00
Hongxin Liu
7c9f2ed6dd
[dtensor] polish sharding spec docstring ( #3838 )
...
* [dtensor] polish sharding spec docstring
* [dtensor] polish sharding spec example docstring
2023-05-25 13:09:42 +08:00
digger yu
7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. ( #3808 )
2023-05-24 09:01:50 +08:00
wukong1992
6b305a99d6
[booster] torch fsdp fix ckpt ( #3788 )
2023-05-23 16:58:45 +08:00
digger yu
9265f2d4d7
[NFC]fix typo colossalai/auto_parallel nn utils etc. ( #3779 )
...
* fix typo colossalai/autochunk auto_parallel amp
* fix typo colossalai/auto_parallel nn utils etc.
2023-05-23 15:28:20 +08:00
jiangmingyan
e871e342b3
[API] add docstrings and initialization to apex amp, naive amp ( #3783 )
...
* [mixed_precison] add naive amp demo
* [mixed_precison] add naive amp demo
* [api] add docstrings and initialization to apex amp, naive amp
* [api] add docstring to apex amp/ naive amp
* [api] add docstring to apex amp/ naive amp
* [api] add docstring to apex amp/ naive amp
* [api] add docstring to apex amp/ naive amp
* [api] add docstring to apex amp/ naive amp
* [api] add docstring to apex amp/ naive amp
* [api] fix
* [api] fix
2023-05-23 15:17:24 +08:00