Commit Graph

109 Commits (feat/moe)

Author SHA1 Message Date
Wenhao Chen 1810b9100f [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)
11 months ago
flybird11111 79718fae04 [shardformer] llama support DistCrossEntropy (#5176)
12 months ago
Wenhao Chen 7172459e74 [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
1 year ago
flybird11111 3e02154710 [gemini] gemini support extra-dp (#5043)
1 year ago
littsk 1a3315e336 [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926)
1 year ago
Hongxin Liu 1f5d2e8062 [hotfix] fix torch 2.0 compatibility (#4936)
1 year ago
littsk 83b52c56cd [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837)
1 year ago
littsk ffd9a3cbc9 [hotfix] fix bug in sequence parallel test (#4887)
1 year ago
Hongxin Liu 079bf3cb26 [misc] update pre-commit and run all files (#4752)
1 year ago
flybird11111 eedaa3e1ef [shardformer]fix gpt2 double head (#4663)
1 year ago
flybird11111 7486ed7d3a [shardformer] update llama2/opt finetune example and fix llama2 policy (#4645)
1 year ago
Hongxin Liu bd18678478 [test] fix gemini checkpoint and gpt test (#4620)
1 year ago
Hongxin Liu e71d245293 [test] ignore gpt2 shardformer test (#4619)
1 year ago
Hongxin Liu a39a5c66fe Merge branch 'main' into feature/shardformer
1 year ago
Jianghai 24c0768795 [shardformer] Pytree fix (#4533)
1 year ago
Baizhou Zhang 2c787d7f47 [shardformer] fix submodule replacement bug when enabling pp (#4544)
1 year ago
flybird11111 ec18fc7340 [shardformer] support pp+tp+zero1 tests (#4531)
1 year ago
flybird11111 d367b88785 [shardformer] fix opt test hanging (#4521)
1 year ago
Bin Jia e241b74f24 [shardformer] Add overlap support for gpt2 (#4535)
1 year ago
Baizhou Zhang 0387a47e63 [shardformer] fix emerged bugs after updating transformers (#4526)
1 year ago
Bin Jia c554b7f559 [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
1 year ago
Jianghai 376533a564 [shardformer] zero1+pp and the corresponding tests (#4517)
1 year ago
flybird11111 de8a65babc [shardformer] opt fix. (#4514)
1 year ago
flybird11111 3353e55c80 [shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498)
1 year ago
Hongxin Liu 27061426f7 [gemini] improve compatibility and add static placement policy (#4479)
1 year ago
Jianghai e04436a82a [shardformer] tests for 3d parallel (#4493)
1 year ago
Jianghai 5545114fd8 rename chatglm to chatglm2 (#4484)
1 year ago
Baizhou Zhang 1c7df566e2 [shardformer] support tp+zero for shardformer (#4472)
1 year ago
Jianghai 8739aa7fa0 [shardformer] Pipeline/whisper (#4456)
1 year ago
Bin Jia 7c8be77081 [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460)
1 year ago
Baizhou Zhang 6ef33f75aa [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446)
1 year ago
Bin Jia 424629fea0 [shardformer/sequence parallel] Cherry pick commit to new branch (#4450)
1 year ago
Hongxin Liu 172f7fa3cf [misc] resolve code factor issues (#4433)
1 year ago
flybird11111 328a791d10 [shardformer] update bloom/llama/vit/chatglm tests (#4420)
1 year ago
flybird11111 108e54a0b4 [shardformer]update t5 tests for using all optimizations. (#4407)
1 year ago
flybird11111 1edc9b5fb3 [shardformer] update tests for all optimization (#4413)
1 year ago
Baizhou Zhang 7711bd524a [shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395)
1 year ago
flybird11111 21e0a42fd1 [shardformer]fix, test gpt2 for AMP+TP (#4403)
1 year ago
Jianghai 7596e9ae08 [pipeline] rewrite bert tests and fix some bugs (#4409)
1 year ago
flybird1111 d2cd48e0be [shardformer] test all optimizations (#4399)
1 year ago
Baizhou Zhang ed4c448488 [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388)
1 year ago
flybird1111 906426cb44 [Shardformer] Merge flash attention branch to pipeline branch (#4362)
1 year ago
Jianghai a88e92251d [pipeline] add chatglm (#4363)
1 year ago
Baizhou Zhang b1feeced8e [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366)
1 year ago
Bin Jia 5c6f183192 [test] Hotfix/fix some model test and refactor check util api (#4369)
1 year ago
FoolPlayer 726541afe2 update some module with new api version
1 year ago
FoolPlayer 879301d0da [shardformer] support Blip2 (#4243)
1 year ago
klhhhhh 8120eca0c0 [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit
1 year ago
klhhhhh 1a29e8fc29 [shardformer] polish chatglm code
1 year ago
klhhhhh 8620009dd7 [sharformer] add first version of policy of chatglm
1 year ago