Commit Graph

12 Commits (8fd25d6e09069a8437c6ebee8dd83e1de4c9b83d)

Author SHA1 Message Date
Wenxuan Tan 8fd25d6e09
[Feature] Split cross-entropy computation in SP (#5959)
3 months ago
Hongxin Liu 9664b1bc19
[shardformer] hotfix attn mask (#5945)
4 months ago
Edenzzzz fbf33ecd01
[Feature] Enable PP + SP for llama (#5868)
5 months ago
flybird11111 2ddf624a86
[shardformer] upgrade transformers to 4.39.3 (#5815)
6 months ago
Haze188 22ce873c3f
[Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702)
6 months ago
wangbluo 4e50cce26b fix the mistral model
7 months ago
wangbluo 2632916329 remove useless code
7 months ago
wangbluo 9efc79ef24 add parallel output for mistral model
7 months ago
Hongxin Liu 1b387ca9fe
[shardformer] refactor pipeline grad ckpt config (#5646)
7 months ago
Wang Binluo 0d0a582033
[shardformer] update transformers (#5583)
7 months ago
Frank Lee 7cfed5f076
[feat] refactored extension module (#5298)
10 months ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
1 year ago