Frank Lee
b1c2901530
[shardformer] supported bloom model ( #4098 )
2023-07-04 16:05:01 +08:00
Kun Lin
8af29ee47a
[shardformer] support vision transformer ( #4096 )
* first v of vit shardformer
* keep vit
* update
* vit shard add vitattention vitlayer
* update num head shard para
* finish test for vit
* add new_model_class & postprocess
* add vit readme
* delete old files & fix the conflict
* fix sth
2023-07-04 16:05:01 +08:00
Frank Lee
d33a44e8c3
[shardformer] refactored layernorm ( #4086 )
2023-07-04 16:05:01 +08:00
FoolPlayer
92f6791095
[shardformer] Add layernorm ( #4072 )
* add layernorm to bert
* add layernorm test
* add layernorm test with load state dict
* add use_mixedfusedLN in shard config
* refactor policy to support fused_layernorm
2023-07-04 16:05:01 +08:00
Frank Lee
70c58cfd4f
[shardformer] supported fused qkv checkpoint ( #4073 )
2023-07-04 16:05:01 +08:00
FoolPlayer
0803a61412
[shardformer] add linearconv1d test ( #4067 )
* add linearconv1d test
* add linearconv1d test
2023-07-04 16:05:01 +08:00
Frank Lee
8eb09a4c69
[shardformer] support module saving and loading ( #4062 )
* [shardformer] support module saving and loading
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
f22ddacef0
[shardformer] refactored the shardformer layer structure ( #4053 )
2023-07-04 16:05:01 +08:00
FoolPlayer
507c0ad368
add vocabembedding layer
2023-07-04 16:05:01 +08:00
Frank Lee
3893fa1a8d
[shardformer] refactored embedding and dropout to parallel module ( #4013 )
* [shardformer] refactored embedding and dropout to parallel module
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
015af592f8
[shardformer] integrated linear 1D with dtensor ( #3996 )
* [shardformer] integrated linear 1D with dtensor
* polish code
2023-07-04 16:05:01 +08:00