ColossalAI/tests
Kirigaya Kazuto 9145aef2b4
[pipeline/rpc] implement distributed optimizer | test with assert_close (#1486)
* support p2p communication with any type of object | pass test

* reconstruct pipeline schedule with p2p_v2.py (support communication with List[Any]) | pass test

* [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule

* [pipeline/rpc] implement a demo for PP with cuda rpc framework

* [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B

* [pipeline/rpc] implement distributed optimizer | test with assert_close
2022-08-25 10:49:01 +08:00
components_to_test [test] ignore 8 gpu test (#1080) 2022-06-08 23:14:18 +08:00
test_amp [test] refactored with the new rerun decorator (#763) 2022-04-15 00:33:04 +08:00
test_auto_parallel [autoparallel] integrate auto parallel with torch fx (#1479) 2022-08-23 14:23:08 +08:00
test_comm [communication] add p2p_v2.py to support communication with List[Any] (#1407) 2022-08-09 11:40:04 +08:00
test_config [pipeline] refactor the pipeline module (#1087) 2022-06-10 11:27:38 +08:00
test_context [test] refactored with the new rerun decorator (#763) 2022-04-15 00:33:04 +08:00
test_data [unittest] refactored unit tests for change in dependency (#838) 2022-04-22 15:39:07 +08:00
test_data_pipeline_tensor_parallel [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule (#1408) 2022-08-12 11:33:26 +08:00
test_ddp [zero] alleviate memory usage in ZeRODDP state_dict (#1398) 2022-08-02 15:49:13 +08:00
test_device [tensor] support runtime ShardingSpec apply (#1453) 2022-08-19 13:39:51 +08:00
test_engine [hotfix] remove potential circle import (#1307) 2022-07-14 13:44:26 +08:00
test_fx [fx] fixed adaptive pooling size concatenation error (#1489) 2022-08-25 09:05:07 +08:00
test_gemini [zero] add chunk_managerV2 for all-gather chunk (#1441) 2022-08-11 19:17:24 +08:00
test_layers [FAW] init an LFU implementation for FAW (#1488) 2022-08-24 17:37:22 +08:00
test_moe [test] refactored with the new rerun decorator (#763) 2022-04-15 00:33:04 +08:00
test_ops [FAW] export FAW in _ops (#1438) 2022-08-11 13:43:24 +08:00
test_optimizer [hotfix] fix CPUAdam kernel nullptr (#1410) 2022-08-05 19:45:45 +08:00
test_pipeline [pipeline/rpc] implement distributed optimizer | test with assert_close (#1486) 2022-08-25 10:49:01 +08:00
test_tensor [tensor] support runtime ShardingSpec apply (#1453) 2022-08-19 13:39:51 +08:00
test_trainer [pipeline] refactor the pipeline module (#1087) 2022-06-10 11:27:38 +08:00
test_utils [utils] Add use_reetrant=False in utils.activation_checkpoint (#1460) 2022-08-16 15:39:20 +08:00
test_zero [zero] zero optim state_dict takes only_rank_0 (#1384) 2022-07-29 13:22:50 +08:00
__init__.py [zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00