Frank Lee | 1fb0d95df0 | [shardformer] made tensor parallelism configurable (#4144); [shardformer] made tensor parallelism configurable; polish code | 2023-07-04 16:05:01 +08:00 |
Frank Lee | ae035d305d | [shardformer] added embedding gradient check (#4124) | 2023-07-04 16:05:01 +08:00 |
Frank Lee | 6a88bae4ec | [shardformer] integrate with data parallelism (#4103) | 2023-07-04 16:05:01 +08:00 |
Frank Lee | 58df720570 | [shardformer] adapted T5 and LLaMa test to use kit (#4049); [shardformer] adapted T5 and LLaMa test to use kit; polish code | 2023-07-04 16:05:01 +08:00 |
Frank Lee | d857f3dbba | [shardformer] supported T5 and its variants (#4045) | 2023-07-04 16:05:01 +08:00 |
Frank Lee | c1d5453e9f | [shardformer] adapted llama to the new API (#4036) | 2023-07-04 16:05:01 +08:00 |
wukong1992 | 6b30dfb7ce | [shardformer] support llama model using shardformer (#3969); adjust layer attr | 2023-07-04 16:05:01 +08:00 |