Frank Lee
|
d857f3dbba
|
[shardformer] supported T5 and its variants (#4045)
|
2023-07-04 16:05:01 +08:00 |
Frank Lee
|
c1d5453e9f
|
[shardformer] adapted llama to the new API (#4036)
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
74d176c8d8
|
[shardformer] fix bert and gpt downstream with new api (#4024)
* fix bert downstream with new api
* remove comment line
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
dfca9678fa
|
integrate with dist layer (#4011)
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
f7774ec0f3
|
[Shardformer] Downstream bert (#3979)
* add dist dropout in model
* update docstring and bert policy with dropout
* refactor basepolicy and sharded, update bert
* update format
* update gpt2 policy
* update bert policy
* remove unused code
* update readme for new policy usage
* add downstream model of bert
* remove unused code
|
2023-07-04 16:05:01 +08:00 |
wukong1992
|
c1c672d0f0
|
[shardformer] shardformer support t5 model (#3994)
test t5
|
2023-07-04 16:05:01 +08:00 |
wukong1992
|
6b30dfb7ce
|
[shardformer] support llama model using shardformer (#3969)
adjust layer attr
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
f1cb5ac6bf
|
[shardformer] Align bert value (#3907)
* add bert align test, fix dist loss bug
* forward and backward align
* add ignore index
* add shardformer CI
* add optional gather_output for user in shardconfig
* update readme with optional gather_output
* add dist crossentropy loss test, remove unused files
* remove unused file
* remove unused file
* rename the file
* polish code
|
2023-07-04 16:05:01 +08:00 |