Jianghai
1094e0f0d3
[pipeline] Bert pipeline for shardformer and its tests ( #4197 )
...
* add pipeline forward
* complete pipeline forward check
* fix bert forward without pipeline
* fix comments
* discard useless line
* add todo
* clean prints
* fix distribute layers
2023-08-15 23:25:14 +08:00
Hongxin Liu
890774b2fb
[shardformer] support lazy init ( #4202 )
...
* [shardformer] support lazy init
* [shardformer] linear support lazy init
* [shardformer] embedding support lazy init
* [shardformer] norm support lazy init
* [shardformer] fused linear support lazy init
* [test] update shardformer test layer
* [test] shardformer with lazy init fit ddp
* [lazy] hotfix deepcopy of param
* [shardformer] fix bert policy and update test
* [shardformer] fix bloom policy and update test
* [shardformer] fix opt policy and update test
* [shardformer] fix t5 policy and update test
* [shardformer] fix gpt2 policy and update test
* [shardformer] fix llama policy and update test
2023-08-15 23:25:14 +08:00
Jianghai
f3bcc292c8
[pipeline] move bert related pipeline components to shardformer ( #4187 )
...
* move bert related pipeline components to shardformer
* fix bugs
* revision
* fix bert model tests
* fix bert_lm_head model tests
* fix tests
* fix tests
* done checks
* skip bloom
2023-08-15 23:25:14 +08:00
ver217
5fc60a3a04
[test] add shard util tests
2023-08-15 23:25:14 +08:00
ver217
2d6cc07feb
[test] update shardformer tests
2023-08-15 23:25:14 +08:00
github-actions[bot]
c77b3b19be
[format] applied code formatting on changed files in pull request 4152 ( #4157 )
...
Co-authored-by: github-actions <github-actions@github.com>
2023-07-04 16:07:47 +08:00
Frank Lee
1fb0d95df0
[shardformer] made tensor parallelism configurable ( #4144 )
...
* [shardformer] made tensor parallelism configurable
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
74257cb446
[shardformer] refactored some doc and api ( #4137 )
...
* [shardformer] refactored some doc and api
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
ae035d305d
[shardformer] added embedding gradient check ( #4124 )
2023-07-04 16:05:01 +08:00
Frank Lee
6a88bae4ec
[shardformer] integrate with data parallelism ( #4103 )
2023-07-04 16:05:01 +08:00
Frank Lee
f3b6aaa6b7
[shardformer] supported fused normalization ( #4112 )
2023-07-04 16:05:01 +08:00
Frank Lee
b1c2901530
[shardformer] supported bloom model ( #4098 )
2023-07-04 16:05:01 +08:00
Kun Lin
8af29ee47a
[shardformer] support vision transformer ( #4096 )
...
* first v of vit shardformer
* keep vit
* update
* vit shard add vitattention vitlayer
* update num head shard para
* finish test for vit
* add new_model_class & postprocess
* add vit readme
* delete old files & fix the conflict
* fix sth
2023-07-04 16:05:01 +08:00
jiangmingyan
ac80937138
[shardformer] shardformer support opt models ( #4091 )
...
* [shardformer] shardformer support opt models
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
2023-07-04 16:05:01 +08:00
Frank Lee
d33a44e8c3
[shardformer] refactored layernorm ( #4086 )
2023-07-04 16:05:01 +08:00
FoolPlayer
92f6791095
[shardformer] Add layernorm ( #4072 )
...
* add layernorm to bert
* add layernorm test
* add layernorm test with load state dict
* add use_mixedfusedLN in shard config
* refactor policy to support fused_layernorm
2023-07-04 16:05:01 +08:00
Frank Lee
70c58cfd4f
[shardformer] supported fused qkv checkpoint ( #4073 )
2023-07-04 16:05:01 +08:00
FoolPlayer
0803a61412
[shardformer] add linearconv1d test ( #4067 )
...
* add linearconv1d test
* add linearconv1d test
2023-07-04 16:05:01 +08:00
Frank Lee
8eb09a4c69
[shardformer] support module saving and loading ( #4062 )
...
* [shardformer] support module saving and loading
* polish code
2023-07-04 16:05:01 +08:00
FoolPlayer
7740c55c55
support kit use for bert/gpt test ( #4055 )
...
* support kit use for bert test
* support kit test for gpt2
2023-07-04 16:05:01 +08:00
Frank Lee
f22ddacef0
[shardformer] refactored the shardformer layer structure ( #4053 )
2023-07-04 16:05:01 +08:00
Frank Lee
58df720570
[shardformer] adapted T5 and LLaMa test to use kit ( #4049 )
...
* [shardformer] adapted T5 and LLaMa test to use kit
* polish code
2023-07-04 16:05:01 +08:00
FoolPlayer
4021b9a8a2
[shardformer] add gpt2 test and layer class refactor ( #4041 )
...
* add gpt2 test and layer class refactor
* add dropout in gpt2 policy
2023-07-04 16:05:01 +08:00
Frank Lee
d857f3dbba
[shardformer] supported T5 and its variants ( #4045 )
2023-07-04 16:05:01 +08:00
Frank Lee
c1d5453e9f
[shardformer] adapted llama to the new API ( #4036 )
2023-07-04 16:05:01 +08:00
FoolPlayer
74d176c8d8
[shardformer] fix bert and gpt downstream with new api ( #4024 )
...
* fix bert downstream with new api
* remove comment line
2023-07-04 16:05:01 +08:00
FoolPlayer
507c0ad368
add vocabembedding layer
2023-07-04 16:05:01 +08:00
Frank Lee
3893fa1a8d
[shardformer] refactored embedding and dropout to parallel module ( #4013 )
...
* [shardformer] refactored embedding and dropout to parallel module
* polish code
2023-07-04 16:05:01 +08:00
FoolPlayer
dfca9678fa
integrate with dist layer ( #4011 )
2023-07-04 16:05:01 +08:00
Frank Lee
015af592f8
[shardformer] integrated linear 1D with dtensor ( #3996 )
...
* [shardformer] integrated linear 1D with dtensor
* polish code
2023-07-04 16:05:01 +08:00
FoolPlayer
f7774ec0f3
[Shardformer] Downstream bert ( #3979 )
...
* add dist dropout in model
* update docstring and bert policy with dropout
* refactor basepolicy and sharded, update bert
* update format
* update gpt2 policy
* update bert policy
* remove unused code
* update readme for new policy usage
* add downstream model of bert
* remove unused code
2023-07-04 16:05:01 +08:00
wukong1992
c1c672d0f0
[shardformer] shardformer support t5 model ( #3994 )
...
test t5
2023-07-04 16:05:01 +08:00
wukong1992
6b30dfb7ce
[shardformer] support llama model using shardformer ( #3969 )
...
adjust layer attr
2023-07-04 16:05:01 +08:00
FoolPlayer
a73130482d
[shardformer] Unit test ( #3928 )
...
* fix bug in slicer, add slicer unit test
* add dropout test
* use pid as dropout seed
* update dropout test with local pattern
* add todo
2023-07-04 16:05:01 +08:00
FoolPlayer
f1cb5ac6bf
[shardformer] Align bert value ( #3907 )
...
* add bert align test, fix dist loss bug
* forward and backward align
* add ignore index
* add shardformer CI
* add gather_output optional for user in shardconfig
* update readme with optional gather_output
* add dist crossentropy loss test, remove unused files
* remove unused file
* remove unused file
* rename the file
* polish code
2023-07-04 16:05:01 +08:00