FoolPlayer | 0803a61412 | 2023-07-04 16:05:01 +08:00
[shardformer] add linearconv1d test (#4067)
* add linearconv1d test
* add linearconv1d test

Frank Lee | 8eb09a4c69 | 2023-07-04 16:05:01 +08:00
[shardformer] support module saving and loading (#4062)
* [shardformer] support module saving and loading
* polish code

FoolPlayer | 7740c55c55 | 2023-07-04 16:05:01 +08:00
support kit use for bert/gpt test (#4055)
* support kit use for bert test
* support kit test for gpt2

Frank Lee | f22ddacef0 | 2023-07-04 16:05:01 +08:00
[shardformer] refactored the shardformer layer structure (#4053)

Frank Lee | 58df720570 | 2023-07-04 16:05:01 +08:00
[shardformer] adapted T5 and LLaMa test to use kit (#4049)
* [shardformer] adapted T5 and LLaMa test to use kit
* polish code

FoolPlayer | 4021b9a8a2 | 2023-07-04 16:05:01 +08:00
[shardformer] add gpt2 test and layer class refactor (#4041)
* add gpt2 test and layer class refactor
* add dropout in gpt2 policy

Frank Lee | d857f3dbba | 2023-07-04 16:05:01 +08:00
[shardformer] supported T5 and its variants (#4045)

Frank Lee | c1d5453e9f | 2023-07-04 16:05:01 +08:00
[shardformer] adapted llama to the new API (#4036)

FoolPlayer | 74d176c8d8 | 2023-07-04 16:05:01 +08:00
[shardformer] fix bert and gpt downstream with new api (#4024)
* fix bert downstream with new api
* remove comment line

FoolPlayer | 507c0ad368 | 2023-07-04 16:05:01 +08:00
add vocabembedding layer

Frank Lee | 3893fa1a8d | 2023-07-04 16:05:01 +08:00
[shardformer] refactored embedding and dropout to parallel module (#4013)
* [shardformer] refactored embedding and dropout to parallel module
* polish code

FoolPlayer | dfca9678fa | 2023-07-04 16:05:01 +08:00
integrate with dist layer (#4011)

Frank Lee | 015af592f8 | 2023-07-04 16:05:01 +08:00
[shardformer] integrated linear 1D with dtensor (#3996)
* [shardformer] integrated linear 1D with dtensor
* polish code

FoolPlayer | f7774ec0f3 | 2023-07-04 16:05:01 +08:00
[Shardformer] Downstream bert (#3979)
* add dist dropout in model
* update docstring and bert policy with dropout
* refactor basepolicy and sharded, update bert
* update format
* update gpt2 policy
* update bert policy
* remove unused code
* update readme for new policy usage
* add downstream model of bert
* remove unused code

wukong1992 | c1c672d0f0 | 2023-07-04 16:05:01 +08:00
[shardformer] shardformer support t5 model (#3994)
* test t5

wukong1992 | 6b30dfb7ce | 2023-07-04 16:05:01 +08:00
[shardformer] support llama model using shardformer (#3969)
* adjust layer attr

FoolPlayer | a73130482d | 2023-07-04 16:05:01 +08:00
[shardformer] Unit test (#3928)
* fix bug in slicer, add slicer unit test
* add dropout test
* use pid as dropout seed
* update dropout test with local pattern
* add todo

FoolPlayer | f1cb5ac6bf | 2023-07-04 16:05:01 +08:00
[shardformer] Align bert value (#3907)
* add bert align test, fix dist loss bug
* forward and backward align
* add ignore index
* add shardformer CI
* add gather_output optional for user in shardconfig
* update readme with optional gather_output
* add dist crossentropy loss test, remove unused files
* remove unused file
* remove unused file
* rename the file
* polish code