* Feature/vit support (#4182)
* [shardformer] added tests
* [shardformer] vit test finish and support
* fix attention dropout
* support base vit pipeline
* support vit downstream model
* fix vit shard test
* modify hidden states return type
---------
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>