Commit Graph

1 Commits (a7cda6f57dd8f7fc0d5c7ae3b4b5cf9cd8140c13)

Author SHA1 Message Date
Kirigaya Kazuto f1e1836218
[pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508)
* support p2p communication with any type of object | pass test

* reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test

* [engin/schedule] use p2p_v2 to recontruct pipeline_schedule

* [pipeline/rpc] implement a demo for PP with cuda rpc framework

* [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] skip process group test

* [pipeline/pipleline_process_group] remove test named function
2022-09-01 17:45:47 +08:00