14 Commits (46931e3c32e8ccb6bddc46273653eca9d85152ac)

Author SHA1 Message Date
Jiarui Fang 556b9b7e1a
[hotfix] Dist Mgr gather torch version (#1284)
* make it faster

* [hotfix] torchvison fx tests

* [hotfix] rename duplicated named test_gpt.py

* [hotfix] dist mgr torch version
2022-07-13 00:18:56 +08:00
Jiarui Fang ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. (#1203) 2022-07-06 16:15:16 +08:00
Jiarui Fang b5f25eb32a
[Tensor] add cpu group to ddp (#1200) 2022-07-05 14:58:28 +08:00
Jiarui Fang 060b917daf
[refactor] remove gpc dependency in colotensor's _ops (#1189) 2022-07-04 18:54:37 +08:00
Jiarui Fang aa7bef73d4
[Tensor] distributed view supports inter-process hybrid parallel (#1169) 2022-06-27 09:45:26 +08:00
ver217 634eecb98e
mark sanity_check of dist_spec_mgr as staticmethod (#1161) 2022-06-23 11:35:25 +08:00
ver217 ffa025e120
[tensor] dist spec s2s uses all-to-all (#1136)
* dist spec s2s uses all-to-all

* update unit test

* add sanity check

* polish unitest test with titans

* add sanity check for DistMgr

* add sanity check

Co-authored-by: jiaruifang <fangjiarui123@gmail.com>
2022-06-22 11:32:38 +08:00
Jiarui Fang 8cdce0399c
[ColoTensor] improves init functions. (#1150) 2022-06-21 18:28:38 +08:00
Jiarui Fang a00644079e
reorgnize colotensor directory (#1062)
* reorgnize colotensor directory

* polish code
2022-06-03 18:04:22 +08:00
ver217 7faef93326
fix dist spec mgr (#1045) 2022-05-31 12:14:39 +08:00
ver217 ad536e308e
[tensor] refactor colo-tensor (#992)
* refactor colo-tensor and update linear op

* polish code

* polish code

* update ops and unit tests

* update unit tests

* polish code

* rename dist_spec module

* polish code

* polish code

* remove unneeded import

* fix pipelinable
2022-05-19 12:44:59 +08:00
Jiarui Fang 802ac297cc
[Tensor] remove useless import in tensor dir (#997) 2022-05-18 14:54:51 +08:00
Ziyue Jiang 797a9dc5a9
add DistSpec for loss and test_model (#947) 2022-05-13 20:29:50 +08:00
ver217 67c33f57eb
[tensor] design DistSpec and DistSpecManager for ColoTensor (#934)
* add dist spec

* update linear op

* polish code

* polish code

* update embedding op

* polish unit tests

* polish unit tests

* polish comments

* polish code

* add test_dist_spec_mgr

* polish code

* refactor folder structure

* polish unit tests

* add get_process_group() for TensorSpec

* polish code
2022-05-13 15:13:52 +08:00