Commit History (ckpt)

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Wenxuan Tan | 8fd25d6e09 | [Feature] Split cross-entropy computation in SP (#5959) | 2 months ago |
| Wang Binluo | b2483c8e31 | [fp8] support hybrid parallel plugin (#5982) | 3 months ago |
| flybird11111 | 0c10afd372 | [FP8] rebase main (#5963) | 4 months ago |
| Hongxin Liu | 9664b1bc19 | [shardformer] hotfix attn mask (#5945) | 4 months ago |
| Guangyao Zhang | 1c961b20f3 | [ShardFormer] fix qwen2 sp (#5903) | 4 months ago |
| Guangyao Zhang | 669849d74b | [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) | 5 months ago |
| Edenzzzz | fbf33ecd01 | [Feature] Enable PP + SP for llama (#5868) | 5 months ago |
| Jianghai | 8ab46b4000 | [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) | 5 months ago |
| Hongxin Liu | 73e88a5553 | [shardformer] fix import (#5788) | 6 months ago |
| Wang Binluo | 537f6a3855 | [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) | 7 months ago |
| Wang Binluo | a3cc68ca93 | [Shardformer] Support the Qwen2 model (#5699) | 7 months ago |
| wangbluo | 4e50cce26b | fix the mistral model | 7 months ago |
| wangbluo | 2632916329 | remove useless code | 7 months ago |
| wangbluo | 9efc79ef24 | add parallel output for mistral model | 7 months ago |
| Hongxin Liu | 1b387ca9fe | [shardformer] refactor pipeline grad ckpt config (#5646) | 7 months ago |
| Wang Binluo | 0d0a582033 | [shardformer] update transformers (#5583) | 7 months ago |
| Frank Lee | 7cfed5f076 | [feat] refactored extension module (#5298) | 10 months ago |
| Wenhao Chen | 7172459e74 | [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) | 1 year ago |