12 Commits (ckpt)

Author SHA1 Message Date
Wenxuan Tan 8fd25d6e09 [Feature] Split cross-entropy computation in SP (#5959) 2 months ago
flybird11111 0c10afd372 [FP8] rebase main (#5963) 4 months ago
Hongxin Liu 9664b1bc19 [shardformer] hotfix attn mask (#5945) 4 months ago
Edenzzzz fbf33ecd01 [Feature] Enable PP + SP for llama (#5868) 5 months ago
flybird11111 2ddf624a86 [shardformer] upgrade transformers to 4.39.3 (#5815) 5 months ago
Haze188 22ce873c3f [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) 6 months ago
wangbluo 4e50cce26b fix the mistral model 7 months ago
wangbluo 2632916329 remove useless code 7 months ago
wangbluo 9efc79ef24 add parallel output for mistral model 7 months ago
Hongxin Liu 1b387ca9fe [shardformer] refactor pipeline grad ckpt config (#5646) 7 months ago
Wang Binluo 0d0a582033 [shardformer] update transformers (#5583) 7 months ago
Frank Lee 7cfed5f076 [feat] refactored extension module (#5298) 10 months ago
Wenhao Chen 7172459e74 [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) 1 year ago