56 Commits (80c3c8789bdfc095618a0b725a4980cd575c6c6b)

Author SHA1 Message Date
Hongxin Liu 7f8b16635b
[misc] refactor launch API and tensor constructor (#5666) 7 months ago
Hongxin Liu 1b387ca9fe
[shardformer] refactor pipeline grad ckpt config (#5646) 7 months ago
Hongxin Liu 641b1ee71a
[devops] remove post commit ci (#5566) 8 months ago
Wenhao Chen e614aa34f3
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508) 8 months ago
Insu Jang 00525f7772
[shardformer] fix pipeline forward error if custom layer distribution is used (#5189) 8 months ago
Wenhao Chen bb0a668fee
[hotfix] set return_outputs=False in examples and polish code (#5404) 8 months ago
Wenhao Chen ef4f0ee854
[hotfix]: add pp sanity check and fix mbs arg (#5268) 10 months ago
Hongxin Liu d202cc28c0
[npu] change device to accelerator api (#5239) 11 months ago
Elsa Granger d565df3821
[pipeline] A more general _communicate in p2p (#5062) 11 months ago
Wenhao Chen d799a3088f
[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) 11 months ago
Wenhao Chen 4fa689fca1
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) 11 months ago
Wenhao Chen 7172459e74
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) 1 year ago
Hongxin Liu 079bf3cb26
[misc] update pre-commit and run all files (#4752) 1 year ago
Hongxin Liu b5f9e37c70
[legacy] clean up legacy code (#4743) 1 year ago
Hongxin Liu 554aa9592e
[legacy] move communication and nn to legacy and refactor logger (#4671) 1 year ago
Baizhou Zhang 660eed9124
[pipeline] set optimizer to optional in execute_pipeline (#4630) 1 year ago
Hongxin Liu 89fe027787 [legacy] move trainer to legacy (#4545) 1 year ago
Hongxin Liu 508ca36fe3
[pipeline] 1f1b schedule receive microbatch size (#4589) 1 year ago
Hongxin Liu 27061426f7
[gemini] improve compatibility and add static placement policy (#4479) 1 year ago
Jianghai 8739aa7fa0
[shardformer] Pipeline/whisper (#4456) 1 year ago
LuGY a78daf6180
[shardformer] support interleaved pipeline (#4448) 1 year ago
github-actions[bot] d20dceb9a3
[format] applied code formatting on changed files in pull request 4441 (#4445) 1 year ago
Jianghai a88e92251d [pipeline] add chatglm (#4363) 1 year ago
Jianghai f13954cd58 [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) 1 year ago
LuGY d3c6cd66f3 [pipeline] add unit test for 1f1b (#4303) 1 year ago
Baizhou Zhang 36e546b2cc [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) 1 year ago
Jianghai d8408d185c [pipeline] OPT model pipeline (#4258) 1 year ago
Jianghai e7cc62d735 [pipeline] All bert models (#4233) 1 year ago
Jianghai f3bcc292c8 [pipeline] move bert related pipeline components to shardformer (#4187) 1 year ago
Jianghai c5ea728016 [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) 1 year ago
Jianghai 90a65ea682 [pipeline] build bloom model and policy , revise the base class of policy (#4161) 1 year ago
Jianghai c552cefa93 [pipeline]add pipeline policy and bert forward (#4130) 1 year ago
Hongxin Liu 5c897ddb94 [pipeline] add stage manager (#4093) 1 year ago
Jianghai e8e7e49243 [pipeline]add pipeline policy and bert forward (#4130) 1 year ago
Hongxin Liu f51ce1bc8e [pipeline] refactor 1f1b schedule (#4115) 1 year ago
Hongxin Liu 45fdc9b42c [pipeline] implement p2p communication (#4100) 1 year ago
Hongxin Liu 422544222f [pipeline] add stage manager (#4093) 1 year ago
Frank Lee 80eba05b0a
[test] refactor tests with spawn (#3452) 2 years ago
Ziyue Jiang 09d69e1c25
[PP Middleware] Add bwd and step for PP middleware (#2111) 2 years ago
Ziyue Jiang e4705ba4e2
[Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087) 2 years ago
Ziyue Jiang 597cdd3006
[Pipeline Middleware] Adapt scheduler for Topo (#2066) 2 years ago
Ziyue Jiang b0936e4a44
[rpc] split with dag (#2028) 2 years ago
Super Daniel 393f594051
[fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710) 2 years ago
Kirigaya Kazuto 9708638ded
[pipeline/pytree] add pytree to process args and kwargs | provide `data_process_func` to process args and kwargs after forward (#1642) 2 years ago
Kirigaya Kazuto 170fa81095
[pipeline/chimera] test chimera | fix bug of initializing (#1615) 2 years ago
Kirigaya Kazuto edc9e419ad
[pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595) 2 years ago
Kirigaya Kazuto 6159d45417
[pipeline/tuning] improve dispatch performance both time and space cost (#1544) 2 years ago
Kirigaya Kazuto f1e1836218
[pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508) 2 years ago
Kirigaya Kazuto 5a6fd71f90
[pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497) 2 years ago
Kirigaya Kazuto 9145aef2b4
[pipeline/rpc] implement distributed optimizer | test with assert_close (#1486) 2 years ago