Commit Graph

17 Commits (10a19e22c63aa9963a889874b63c47ccd0e6db42)

Author SHA1 Message Date
Wang Binluo d3f34ee8cc
[Shardformer] add assert for num of attention heads divisible by tp_size (#5670)
7 months ago
Wang Binluo 0d0a582033
[shardformer] update transformers (#5583)
7 months ago
flybird11111 a0ad587c24
[shardformer] refactor embedding resize (#5603)
7 months ago
Wenhao Chen e614aa34f3
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508)
8 months ago
Insu Jang 00525f7772
[shardformer] fix pipeline forward error if custom layer distribution is used (#5189)
8 months ago
Hongxin Liu 19e1a5cf16
[shardformer] update colo attention to support custom mask (#5510)
8 months ago
digger yu 71321a07cf
fix typo change dosen't to doesn't (#5308)
10 months ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
1 year ago
littsk 1a3315e336
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
flybird11111 20190b49a5
[shardformer] to fix whisper test failed due to significant accuracy differences. (#4710)
1 year ago
flybird11111 d367b88785
[shardformer] fix opt test hanging (#4521)
1 year ago
flybird11111 3353e55c80
[shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498)
1 year ago
Jianghai 8739aa7fa0
[shardformer] Pipeline/whisper (#4456)
1 year ago
flybird1111 906426cb44
[Shardformer] Merge flash attention branch to pipeline branch (#4362)
1 year ago
FoolPlayer 726541afe2
update some module with new api version
1 year ago
FoolPlayer 9ee4ebea83
[shardformer] support whisper (#4212)
1 year ago