Commit Graph

17 Commits (b3f5d7a3ba01fdd015866162608348fe480f1d55)

Author SHA1 Message Date
ver217 d35bd7d0e6 [shardformer] fix type hint 2023-08-15 23:25:14 +08:00
ver217 59f6f573f1 [pipeline] update shardformer policy 2023-08-15 23:25:14 +08:00
Frank Lee 1fb0d95df0 [shardformer] made tensor parallelism configurable (#4144)
* [shardformer] made tensor parallelism configurable

* polish code
2023-07-04 16:05:01 +08:00
Frank Lee 74257cb446 [shardformer] refactored some doc and api (#4137)
* [shardformer] refactored some doc and api

* polish code
2023-07-04 16:05:01 +08:00
jiangmingyan 7f9b30335b [shardformer] write an shardformer example with bert finetuning (#4126)
* [shardformer] add benchmark of shardformer

* [shardformer] add benchmark of shardformer
2023-07-04 16:05:01 +08:00
Frank Lee 44a190e6ac [shardformer] import huggingface implicitly (#4101) 2023-07-04 16:05:01 +08:00
Frank Lee 6a88bae4ec [shardformer] integrate with data parallelism (#4103) 2023-07-04 16:05:01 +08:00
Frank Lee f3b6aaa6b7 [shardformer] supported fused normalization (#4112) 2023-07-04 16:05:01 +08:00
FoolPlayer 92f6791095 [shardformer] Add layernorm (#4072)
* add layernorm to bert

* add layernorm test

* add layernorm test with load state dict

* add use_mixedfusedLN in shard config

* refactor policy to support fused_layernorm
2023-07-04 16:05:01 +08:00
Frank Lee c1d5453e9f [shardformer] adapted llama to the new API (#4036) 2023-07-04 16:05:01 +08:00
FoolPlayer 74d176c8d8 [shardformer] fix bert and gpt downstream with new api (#4024)
* fix bert downstream with new api

* remove comment line
2023-07-04 16:05:01 +08:00
FoolPlayer d3bc530849 [shardformer] Refactor shardformer api (#4001)
* fix an error in readme

* simplify code

* refactor shardformer

* add todo

* remove slicer

* resolve code review
2023-07-04 16:05:01 +08:00
FoolPlayer f7774ec0f3 [Shardformer] Downstream bert (#3979)
* add dist dropout in model

* update docstring and bert policy with dropout

* refactor basepolicy and sharded, update bert

* update format

* update gpt2 policy

* update bert policy

* remove unused code

* update readme for new policy usage

* add downstream model of bert

* remove unused code
2023-07-04 16:05:01 +08:00
FoolPlayer f1cb5ac6bf [shardformer] Align bert value (#3907)
* add bert align test, fix dist loss bug

* forward and backward align

* add ignore index

* add shardformer CI

* add gather_output optional for user in shardconfig

* update readme with optional gather_ouput

* add dist crossentropy loss test, remove unused files

* remove unused file

* remove unused file

* rename the file

* polish code
2023-07-04 16:05:01 +08:00
Frank Lee 4972e1f40e [shardformer] refactored the user api (#3828)
* [shardformer] refactored the user api

* polish code
2023-07-04 16:05:01 +08:00
Frank Lee ddcf58cacf Revert "[sync] sync feature/shardformer with develop" 2023-06-09 09:41:27 +08:00
Frank Lee 537a52b7a2 [shardformer] refactored the user api (#3828)
* [shardformer] refactored the user api

* polish code
2023-06-08 15:01:34 +08:00