Frank Lee
|
1fb0d95df0
|
[shardformer] made tensor parallelism configurable (#4144)
* [shardformer] made tensor parallelism configurable
* polish code
|
2023-07-04 16:05:01 +08:00 |
Frank Lee
|
ae035d305d
|
[shardformer] added embedding gradient check (#4124)
|
2023-07-04 16:05:01 +08:00 |
Frank Lee
|
6a88bae4ec
|
[shardformer] integrate with data parallelism (#4103)
|
2023-07-04 16:05:01 +08:00 |
Frank Lee
|
70c58cfd4f
|
[shardformer] supported fused qkv checkpoint (#4073)
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
0803a61412
|
[shardformer] add linearconv1d test (#4067)
* add linearconv1d test
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
7740c55c55
|
support kit use for bert/gpt test (#4055)
* support kit use for bert test
* support kit test for gpt2
|
2023-07-04 16:05:01 +08:00 |
FoolPlayer
|
4021b9a8a2
|
[shardformer] add gpt2 test and layer class refactor (#4041)
* add gpt2 test and layer class refactor
* add dropout in gpt2 policy
|
2023-07-04 16:05:01 +08:00 |