mirror of https://github.com/hpcaitech/ColossalAI
f1e1836218
* support p2p communication with any type of object | pass test
* reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test
* [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule
* [pipeline/rpc] implement a demo for PP with cuda rpc framework
* [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatching data in work_list to ensure steady 1F1B
* [pipeline/rpc] implement distributed optimizer | test with assert_close
* [pipeline/rpc] implement distributed optimizer | test with assert_close
* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy
* [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP
* [pipeline/pipleline_process_group] remove comment
* [pipeline/pipleline_process_group] remove comment
* [pipeline/pipleline_process_group] skip process group test
* [pipeline/pipleline_process_group] remove test named function

| Name |
|---|
| amp |
| auto_parallel |
| builder |
| cli |
| communication |
| context |
| device |
| engine |
| fx |
| gemini |
| kernel |
| logging |
| nn |
| pipeline |
| registry |
| tensor |
| testing |
| trainer |
| utils |
| zero |
| __init__.py |
| constants.py |
| core.py |
| global_variables.py |
| initialize.py |