Frank Lee | 1fb0d95df0 | 2023-07-04 16:05:01 +08:00
[shardformer] made tensor parallelism configurable (#4144)
* [shardformer] made tensor parallelism configurable
* polish code

Frank Lee | ae035d305d | 2023-07-04 16:05:01 +08:00
[shardformer] added embedding gradient check (#4124)

Frank Lee | 6a88bae4ec | 2023-07-04 16:05:01 +08:00
[shardformer] integrate with data parallelism (#4103)

FoolPlayer | 7740c55c55 | 2023-07-04 16:05:01 +08:00
support kit use for bert/gpt test (#4055)
* support kit use for bert test
* support kit test for gpt2

FoolPlayer | 4021b9a8a2 | 2023-07-04 16:05:01 +08:00
[shardformer] add gpt2 test and layer class refactor (#4041)
* add gpt2 test and layer class refactor
* add dropout in gpt2 policy

Frank Lee | c1d5453e9f | 2023-07-04 16:05:01 +08:00
[shardformer] adapted llama to the new API (#4036)

FoolPlayer | 74d176c8d8 | 2023-07-04 16:05:01 +08:00
[shardformer] fix bert and gpt downstream with new api (#4024)
* fix bert downstream with new api
* remove comment line

FoolPlayer | dfca9678fa | 2023-07-04 16:05:01 +08:00
integrate with dist layer (#4011)

FoolPlayer | f7774ec0f3 | 2023-07-04 16:05:01 +08:00
[Shardformer] Downstream bert (#3979)
* add dist dropout in model
* update docstring and bert policy with dropout
* refactor basepolicy and sharded, update bert
* update format
* update gpt2 policy
* update bert policy
* remove unused code
* update readme for new policy usage
* add downstream model of bert
* remove unused code

FoolPlayer | f1cb5ac6bf | 2023-07-04 16:05:01 +08:00
[shardformer] Align bert value (#3907)
* add bert align test, fix dist loss bug
* forward and backward align
* add ignore index
* add shardformer CI
* add gather_output optional for user in shardconfig
* update readme with optional gather_output
* add dist crossentropy loss test, remove unused files
* remove unused file
* remove unused file
* rename the file
* polish code