ColossalAI/tests
Kirigaya Kazuto f1e1836218
[pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508)
* support p2p communication with any type of object | pass test

* reconstruct pipeline schedule with p2p_v2.py(support communication with List[Any]) | pass test

* [engin/schedule] use p2p_v2 to recontruct pipeline_schedule

* [pipeline/rpc] implement a demo for PP with cuda rpc framework

* [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] implement distributed optimizer | test with assert_close

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy

* [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] remove comment

* [pipeline/pipleline_process_group] skip process group test

* [pipeline/pipleline_process_group] remove test named function
2022-09-01 17:45:47 +08:00
..
components_to_test [test] ignore 8 gpu test (#1080) 2022-06-08 23:14:18 +08:00
test_amp [test] refactored with the new rerun decorator (#763) 2022-04-15 00:33:04 +08:00
test_auto_parallel [autoparellel]add strategies constructor (#1505) 2022-08-30 16:32:09 +08:00
test_comm [communication] add p2p_v2.py to support communication with List[Any] (#1407) 2022-08-09 11:40:04 +08:00
test_config [pipeline] refactor the pipeline module (#1087) 2022-06-10 11:27:38 +08:00
test_context [test] refactored with the new rerun decorator (#763) 2022-04-15 00:33:04 +08:00
test_data [unittest] refactored unit tests for change in dependency (#838) 2022-04-22 15:39:07 +08:00
test_data_pipeline_tensor_parallel [engin/schedule] use p2p_v2 to recontruct pipeline_schedule (#1408) 2022-08-12 11:33:26 +08:00
test_ddp [zero] alleviate memory usage in ZeRODDP state_dict (#1398) 2022-08-02 15:49:13 +08:00
test_device [tensor] support runtime ShardingSpec apply (#1453) 2022-08-19 13:39:51 +08:00
test_engine [hotfix] remove potiential circle import (#1307) 2022-07-14 13:44:26 +08:00
test_fx [fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521) 2022-08-31 18:10:48 +08:00
test_gemini [zero] add chunk_managerV2 for all-gather chunk (#1441) 2022-08-11 19:17:24 +08:00
test_layers [FAW] cpu caching operations (#1520) 2022-08-30 14:50:02 +08:00
test_moe [test] refactored with the new rerun decorator (#763) 2022-04-15 00:33:04 +08:00
test_ops [FAW] export FAW in _ops (#1438) 2022-08-11 13:43:24 +08:00
test_optimizer [hotfix] fix CPUAdam kernel nullptr (#1410) 2022-08-05 19:45:45 +08:00
test_pipeline [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local abd global rank in TP,DP and PP (#1508) 2022-09-01 17:45:47 +08:00
test_tensor [tensor]add 1D device mesh (#1492) 2022-08-25 16:48:12 +08:00
test_trainer [pipeline] refactor the pipeline module (#1087) 2022-06-10 11:27:38 +08:00
test_utils [utils] Add use_reetrant=False in utils.activation_checkpoint (#1460) 2022-08-16 15:39:20 +08:00
test_zero [zero] zero optim state_dict takes only_rank_0 (#1384) 2022-07-29 13:22:50 +08:00
__init__.py [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00