Commit Graph

27 Commits (8e718a1421203e0f5607f477e1a998567c70d123)

Author SHA1 Message Date
GuangyaoZhang d84d68601a change 'xxx if xxx else None' to 'xxx or None'
5 months ago
Haze188 22ce873c3f
[Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702)
6 months ago
Wang Binluo d3f34ee8cc
[Shardformer] add assert for num of attention heads divisible by tp_size (#5670)
7 months ago
Wang Binluo 0d0a582033
[shardformer] update transformers (#5583)
7 months ago
flybird11111 a0ad587c24
[shardformer] refactor embedding resize (#5603)
7 months ago
Zhongkai Zhao 8e412a548e
[shardformer] Sequence Parallelism Optimization (#5533)
8 months ago
Wenhao Chen e614aa34f3
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508)
8 months ago
Insu Jang 00525f7772
[shardformer] fix pipeline forward error if custom layer distribution is used (#5189)
8 months ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
1 year ago
littsk 1a3315e336
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
Bin Jia c554b7f559
[shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
1 year ago
flybird11111 59e252ecdb
[shardformer] chatglm support sequence parallel (#4482)
1 year ago
flybird11111 0ecd71e041
[shardformer] bloom support sequence parallel (#4465)
1 year ago
flybird1111 906426cb44 [Shardformer] Merge flash attention branch to pipeline branch (#4362)
1 year ago
Jianghai 18ebcf406a [pipeline] reformat for unified design (#4283)
1 year ago
Hongxin Liu d921ce8391 [shardformer] support inplace sharding (#4251)
1 year ago
Jianghai 34f0e34a4c [pipeline] finish bloom models pipeline and tests (#4223)
1 year ago
Jianghai 37d22f6878 [pipeline] add bloom model pipeline (#4210)
1 year ago
Hongxin Liu 890774b2fb [shardformer] support lazy init (#4202)
1 year ago
ver217 1ed3f8a24f [shardformer] rename policy file name
1 year ago
Frank Lee 89f45eda5a [shardformer] added development protocol for standardization (#4149)
1 year ago
Frank Lee 1fb0d95df0 [shardformer] made tensor parallelism configurable (#4144)
1 year ago
Frank Lee 74257cb446 [shardformer] refactored some doc and api (#4137)
1 year ago
Frank Lee ae035d305d [shardformer] added embedding gradient check (#4124)
1 year ago
Frank Lee f3b6aaa6b7 [shardformer] supported fused normalization (#4112)
1 year ago
Frank Lee b1c2901530 [shardformer] supported bloom model (#4098)
1 year ago