47 Commits (feat/moe)

Author SHA1 Message Date
Elsa Granger d565df3821
[pipeline] A more general _communicate in p2p (#5062)
11 months ago
Wenhao Chen d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214)
11 months ago
Wenhao Chen 4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134)
11 months ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752)
1 year ago
Hongxin Liu b5f9e37c70
[legacy] clean up legacy code (#4743)
1 year ago
Hongxin Liu 554aa9592e
[legacy] move communication and nn to legacy and refactor logger (#4671)
1 year ago
Baizhou Zhang 660eed9124
[pipeline] set optimizer to optional in execute_pipeline (#4630)
1 year ago
Hongxin Liu fae6c92ead
Merge branch 'main' into feature/shardformer
1 year ago
Hongxin Liu 89fe027787
[legacy] move trainer to legacy (#4545)
1 year ago
Hongxin Liu a39a5c66fe
Merge branch 'main' into feature/shardformer
1 year ago
Hongxin Liu 508ca36fe3
[pipeline] 1f1b schedule receive microbatch size (#4589)
1 year ago
Hongxin Liu 27061426f7
[gemini] improve compatibility and add static placement policy (#4479)
1 year ago
Jianghai 8739aa7fa0
[shardformer] Pipeline/whisper (#4456)
1 year ago
LuGY a78daf6180
[shardformer] support interleaved pipeline (#4448)
1 year ago
github-actions[bot] d20dceb9a3
[format] applied code formatting on changed files in pull request 4441 (#4445)
1 year ago
Jianghai a88e92251d
[pipeline] add chatglm (#4363)
1 year ago
Jianghai f13954cd58
[pipeline] refactor test pipeline and remove useless utils in pipeline (#4324)
1 year ago
LuGY d3c6cd66f3
[pipeline] add unit test for 1f1b (#4303)
1 year ago
Baizhou Zhang 36e546b2cc
[pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300)
1 year ago
Jianghai d8408d185c
[pipeline] OPT model pipeline (#4258)
1 year ago
Jianghai e7cc62d735
[pipeline] All bert models (#4233)
1 year ago
Jianghai f3bcc292c8
[pipeline] move bert related pipeline components to shardformer (#4187)
1 year ago
Jianghai c5ea728016
[pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172)
1 year ago
Jianghai 90a65ea682
[pipeline] build bloom model and policy , revise the base class of policy (#4161)
1 year ago
Jianghai c552cefa93
[pipeline]add pipeline policy and bert forward (#4130)
1 year ago
Hongxin Liu 5c897ddb94
[pipeline] add stage manager (#4093)
1 year ago
Jianghai e8e7e49243
[pipeline]add pipeline policy and bert forward (#4130)
1 year ago
Hongxin Liu f51ce1bc8e
[pipeline] refactor 1f1b schedule (#4115)
1 year ago
Hongxin Liu 45fdc9b42c
[pipeline] implement p2p communication (#4100)
1 year ago
Hongxin Liu 422544222f
[pipeline] add stage manager (#4093)
1 year ago
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452)
2 years ago
Ziyue Jiang 09d69e1c25
[PP Middleware] Add bwd and step for PP middleware (#2111)
2 years ago
Ziyue Jiang e4705ba4e2
[Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087)
2 years ago
Ziyue Jiang 597cdd3006
[Pipeline Middleware] Adapt scheduler for Topo (#2066)
2 years ago
Ziyue Jiang b0936e4a44
[rpc] split with dag (#2028)
2 years ago
Super Daniel 393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710)
2 years ago
Kirigaya Kazuto 9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642)
2 years ago
Kirigaya Kazuto 170fa81095
[pipeline/chimera] test chimera | fix bug of initializing (#1615)
2 years ago
Kirigaya Kazuto edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595)
2 years ago
Kirigaya Kazuto 6159d45417
[pipeline/tuning] improve dispatch performance both time and space cost (#1544)
2 years ago
Kirigaya Kazuto f1e1836218
[pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508)
2 years ago
Kirigaya Kazuto 5a6fd71f90
[pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497)
2 years ago
Kirigaya Kazuto 9145aef2b4
[pipeline/rpc] implement distributed optimizer | test with assert_close (#1486)
2 years ago
Kirigaya Kazuto a6c8749198
[pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B (#1483)
2 years ago
Kirigaya Kazuto bb5f5289e0
[pipeline/rpc] implement a demo for PP with cuda rpc framework (#1470)
2 years ago
Frank Lee 2b2dc1c86b
[pipeline] refactor the pipeline module (#1087)
2 years ago