Mirror of https://github.com/hpcaitech/ColossalAI at commit f1e1836218.
Squashed commit history:

* support p2p communication with any type of object | pass test
* reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test
* [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule
* [pipeline/rpc] implement a demo for PP with the CUDA RPC framework
* [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatching data in work_list to ensure steady 1F1B
* [pipeline/rpc] implement distributed optimizer | test with assert_close
* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
* [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local and global ranks in TP, DP and PP
* [pipeline/pipleline_process_group] remove comment
* [pipeline/pipleline_process_group] skip process group test
* [pipeline/pipleline_process_group] remove test named function
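The p2p_v2 commits describe sending arbitrary Python objects (including `List[Any]`) between pipeline stages. A common way to implement this, sketched below, is to pickle the object and prefix a fixed-size length header so the receiver knows how many payload bytes to expect before they arrive. The helper names here are illustrative assumptions, not ColossalAI's actual API:

```python
import pickle
import struct

def pack_message(obj):
    """Serialize an arbitrary picklable object into bytes, prefixed
    with an 8-byte big-endian length header for the payload."""
    payload = pickle.dumps(obj)
    return struct.pack(">Q", len(payload)) + payload

def unpack_message(buf):
    """Inverse of pack_message: read the header, then deserialize."""
    (size,) = struct.unpack(">Q", buf[:8])
    return pickle.loads(buf[8:8 + size])
```

In a real point-to-point channel the header and payload would be two separate sends, so the receiver can allocate a buffer of exactly `size` bytes for the second receive.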
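The "steady 1F1B" mentioned in the interleaving commit refers to the one-forward-one-backward schedule: each stage runs a few warmup forward passes, then strictly alternates forward and backward, which bounds the number of in-flight microbatches. A minimal sketch of the per-stage op order (a hypothetical helper, not the repo's scheduler):

```python
def one_f_one_b_schedule(stage_id, num_stages, num_microbatches):
    """Yield ('F', i) / ('B', i) ops for one stage of a 1F1B schedule.
    Earlier stages need more warmup forwards before their first backward."""
    warmup = min(num_stages - stage_id - 1, num_microbatches)
    fwd = bwd = 0
    for _ in range(warmup):          # warmup: forwards only
        yield ("F", fwd)
        fwd += 1
    while fwd < num_microbatches:    # steady state: alternate 1F / 1B
        yield ("F", fwd)
        fwd += 1
        yield ("B", bwd)
        bwd += 1
    while bwd < num_microbatches:    # cooldown: drain remaining backwards
        yield ("B", bwd)
        bwd += 1
```

For the last stage the warmup is zero, so it alternates F/B from the start; stage 0 of a 2-stage pipeline runs one extra forward before its first backward.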
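The PipelineProcessGroup commits concern translating a global rank into its data-parallel (DP), pipeline-parallel (PP) and tensor-parallel (TP) coordinates. Assuming a layout where the TP rank varies fastest, then PP, then DP (the actual ordering in the repo may differ), the mapping is a simple mixed-radix decomposition:

```python
def global_to_parallel_coords(global_rank, tp_size, pp_size, dp_size):
    """Decompose a global rank into (dp_rank, pp_rank, tp_rank),
    assuming TP is the fastest-varying dimension, then PP, then DP."""
    assert global_rank < tp_size * pp_size * dp_size
    tp_rank = global_rank % tp_size
    pp_rank = (global_rank // tp_size) % pp_size
    dp_rank = global_rank // (tp_size * pp_size)
    return dp_rank, pp_rank, tp_rank
```

With `tp_size=2, pp_size=2, dp_size=2`, global rank 5 maps to `(dp=1, pp=0, tp=1)`; the two ranks that differ only in `tp_rank` form one tensor-parallel group.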
Directory listing:

* components_to_test
* test_amp
* test_auto_parallel
* test_comm
* test_config
* test_context
* test_data
* test_data_pipeline_tensor_parallel
* test_ddp
* test_device
* test_engine
* test_fx
* test_gemini
* test_layers
* test_moe
* test_ops
* test_optimizer
* test_pipeline
* test_tensor
* test_trainer
* test_utils
* __init__.py