ver217
d35bd7d0e6
[shardformer] fix type hint
2023-08-15 23:25:14 +08:00
ver217
59f6f573f1
[pipeline] update shardformer policy
2023-08-15 23:25:14 +08:00
Frank Lee
1fb0d95df0
[shardformer] made tensor parallelism configurable ( #4144 )
...
* [shardformer] made tensor parallelism configurable
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
74257cb446
[shardformer] refactored some doc and api ( #4137 )
...
* [shardformer] refactored some doc and api
* polish code
2023-07-04 16:05:01 +08:00
jiangmingyan
7f9b30335b
[shardformer] write an shardformer example with bert finetuning ( #4126 )
...
* [shardformer] add benchmark of shardformer
* [shardformer] add benchmark of shardformer
2023-07-04 16:05:01 +08:00
Frank Lee
44a190e6ac
[shardformer] import huggingface implicitly ( #4101 )
2023-07-04 16:05:01 +08:00
Frank Lee
6a88bae4ec
[shardformer] integrate with data parallelism ( #4103 )
2023-07-04 16:05:01 +08:00
Frank Lee
f3b6aaa6b7
[shardformer] supported fused normalization ( #4112 )
2023-07-04 16:05:01 +08:00
FoolPlayer
92f6791095
[shardformer] Add layernorm ( #4072 )
...
* add layernorm to bert
* add layernorm test
* add layernorm test with load state dict
* add use_mixedfusedLN in shard config
* refactor policy to support fused_layernorm
2023-07-04 16:05:01 +08:00
Frank Lee
c1d5453e9f
[shardformer] adapted llama to the new API ( #4036 )
2023-07-04 16:05:01 +08:00
FoolPlayer
74d176c8d8
[shardformer] fix bert and gpt downstream with new api ( #4024 )
...
* fix bert downstream with new api
* remove comment line
2023-07-04 16:05:01 +08:00
FoolPlayer
d3bc530849
[shardformer] Refactor shardformer api ( #4001 )
...
* fix an error in readme
* simplify code
* refactor shardformer
* add todo
* remove slicer
* resolve code review
2023-07-04 16:05:01 +08:00
FoolPlayer
f7774ec0f3
[Shardformer] Downstream bert ( #3979 )
...
* add dist dropout in model
* update docstring and bert policy with dropout
* refactor basepolicy and sharded, update bert
* update format
* update gpt2 policy
* update bert policy
* remove unused code
* update readme for new policy usage
* add downstream model of bert
* remove unused code
2023-07-04 16:05:01 +08:00
FoolPlayer
f1cb5ac6bf
[shardformer] Align bert value ( #3907 )
...
* add bert align test, fix dist loss bug
* forward and backward align
* add ignore index
* add shardformer CI
* add gather_output optional for user in shardconfig
* update readme with optional gather_ouput
* add dist crossentropy loss test, remove unused files
* remove unused file
* remove unused file
* rename the file
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
4972e1f40e
[shardformer] refactored the user api ( #3828 )
...
* [shardformer] refactored the user api
* polish code
2023-07-04 16:05:01 +08:00
Frank Lee
ddcf58cacf
Revert "[sync] sync feature/shardformer with develop"
2023-06-09 09:41:27 +08:00
Frank Lee
537a52b7a2
[shardformer] refactored the user api ( #3828 )
...
* [shardformer] refactored the user api
* polish code
2023-06-08 15:01:34 +08:00