Frank Lee
1fb0d95df0
[shardformer] made tensor parallelism configurable ( #4144 )
* [shardformer] made tensor parallelism configurable
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
74257cb446
[shardformer] refactored some doc and api ( #4137 )
* [shardformer] refactored some doc and api
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
ae035d305d
[shardformer] added embedding gradient check ( #4124 )
2023-07-04 16:05:01 +08:00
Frank Lee
44a190e6ac
[shardformer] import huggingface implicitly ( #4101 )
2023-07-04 16:05:01 +08:00
Frank Lee
f3b6aaa6b7
[shardformer] supported fused normalization ( #4112 )
2023-07-04 16:05:01 +08:00
Frank Lee
b1c2901530
[shardformer] supported bloom model ( #4098 )
2023-07-04 16:05:01 +08:00
Kun Lin
8af29ee47a
[shardformer] support vision transformer ( #4096 )
* first v of vit shardformer
* keep vit
* update
* vit shard add vitattention vitlayer
* update num head shard para
* finish test for vit
* add new_model_class & postprocess
* add vit readme
* delete old files & fix the conflict
* fix sth
2023-07-04 16:05:01 +08:00
Frank Lee
f22ddacef0
[shardformer] refactored the shardformer layer structure ( #4053 )
2023-07-04 16:05:01 +08:00
Frank Lee
58df720570
[shardformer] adapted T5 and LLaMa test to use kit ( #4049 )
* [shardformer] adapted T5 and LLaMa test to use kit
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
d857f3dbba
[shardformer] supported T5 and its variants ( #4045 )
2023-07-04 16:05:01 +08:00
wukong1992
c1c672d0f0
[shardformer] shardformer support t5 model ( #3994 )
test t5
2023-07-04 16:05:01 +08:00