FoolPlayer
|
74d176c8d8
|
[shardformer] fix bert and gpt downstream with new api (#4024)
* fix bert downstream with new api
* remove comment line
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
507c0ad368
|
add vocabembedding layer
|
2023-07-04 16:05:01 +08:00 |
Frank Lee
|
3893fa1a8d
|
[shardformer] refactored embedding and dropout to parallel module (#4013)
* [shardformer] refactored embedding and dropout to parallel module
* polish code
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
dfca9678fa
|
integrate with dist layer (#4011)
|
2023-07-04 16:05:01 +08:00 |
Frank Lee
|
015af592f8
|
[shardformer] integrated linear 1D with dtensor (#3996)
* [shardformer] integrated linear 1D with dtensor
* polish code
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
f7774ec0f3
|
[Shardformer] Downstream bert (#3979)
* add dist dropout in model
* update docstring and bert policy with dropout
* refactor basepolicy and sharded, update bert
* update format
* update gpt2 policy
* update bert policy
* remove unused code
* update readme for new policy usage
* add downstream model of bert
* remove unused code
|
2023-07-04 16:05:01 +08:00 |
wukong1992
|
c1c672d0f0
|
[shardformer] shardformer support t5 model (#3994)
test t5
|
2023-07-04 16:05:01 +08:00 |
wukong1992
|
6b30dfb7ce
|
[shardformer] support llama model using shardformer (#3969)
adjust layer attr
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
a73130482d
|
[shardformer] Unit test (#3928)
* fix bug in slicer, add slicer unit test
* add dropout test
* use pid as dropout seed
* update dropout test with local pattern
* add todo
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
f1cb5ac6bf
|
[shardformer] Align bert value (#3907)
* add bert align test, fix dist loss bug
* forward and backward align
* add ignore index
* add shardformer CI
* add gather_output optional for user in shardconfig
* update readme with optional gather_output
* add dist crossentropy loss test, remove unused files
* remove unused file
* remove unused file
* rename the file
* polish code
|
2023-07-04 16:05:01 +08:00 |