xcnick
85178a397a
[hotfix] fix error for torch 2.0 ( #2243 )
2022-12-30 23:11:55 +08:00
HELSON
2458659919
[zero] fix error for BEiT models ( #2169 )
...
* [zero] fix error for BEiT models
* [ColoParameter] add unpack operation for tuple arguments
* fix bugs
* fix chunkv2 unit testing
* add assertion for gradient state
2022-12-26 15:03:54 +08:00
Jiarui Fang
2827f41898
[Gemini] GeminiDPP convert to PyTorch Module. ( #2151 )
2022-12-20 10:19:36 +08:00
YuliangLiu0306
49216d7ab1
[autoparallel] fix bugs caused by negative dim key ( #1808 )
...
* [autoparallel] fix bugs caused by negative dim key
* fix import error
* fix matmul test issue
* fix unit test issue
2022-11-08 17:03:50 +08:00
HELSON
c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 ( #1786 )
...
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12
* [zero] add cpu shard init
* [zero] add tiny example test
* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
Jiarui Fang
a1476ea882
[NFC] polish doc style for ColoTensor ( #1457 )
2022-08-16 09:21:05 +08:00
HELSON
7a8702c06d
[colotensor] add Tensor.view op and its unit test ( #1343 )
...
[colotensor] add megatron initialization for gpt2
2022-07-21 10:53:15 +08:00
HELSON
f92c100ddd
[checkpoint] use gather_tensor in checkpoint and update its unit test ( #1339 )
2022-07-19 14:15:28 +08:00
HELSON
1b41686461
[hotfix] fix unit test test_module_spec ( #1321 )
2022-07-15 14:02:32 +08:00
HELSON
260a55804a
[hotfix] fix shape error in backward when using ColoTensor ( #1298 )
2022-07-13 23:06:12 +08:00
ver217
7aadcbd070
hotfix colotensor _scan_for_pg_from_args ( #1276 )
2022-07-12 20:46:31 +08:00
Jiarui Fang
c92f84fcdb
[tensor] distributed checkpointing for parameters ( #1240 )
2022-07-12 15:51:06 +08:00
Jiarui Fang
1aad903c15
[tensor] redistribute among different process groups ( #1247 )
...
* make it faster
* [tensor] rename convert_to_dist -> redistribute
* [tensor] ShardSpec and ReplicaSpec
* [tensor] redistribute among diff pgs
* polish code
2022-07-12 10:24:05 +08:00
Jiarui Fang
9bcd2fd4af
[tensor] a shorter shard and replicate spec ( #1245 )
2022-07-11 15:51:48 +08:00
Jiarui Fang
2699dfbbfd
[rename] convert_to_dist -> redistribute ( #1243 )
2022-07-11 13:05:44 +08:00
HELSON
f6add9b720
[tensor] redirect .data.__get__ to a tensor instance ( #1239 )
2022-07-11 11:41:29 +08:00
Jiarui Fang
4a76084dc9
[tensor] add zero_like colo op, important for Optimizer ( #1236 )
2022-07-08 14:55:27 +08:00
Jiarui Fang
3b500984b1
[tensor] fix some unittests ( #1234 )
2022-07-08 14:18:30 +08:00
HELSON
f071b500b6
[polish] polish __repr__ for ColoTensor, DistSpec, ProcessGroup ( #1235 )
2022-07-08 13:25:57 +08:00
Yi Zhao
04537bf83e
[checkpoint]support generalized scheduler ( #1222 )
2022-07-07 18:16:38 +08:00
Jiarui Fang
a98319f023
[tensor] torch function return colotensor ( #1229 )
2022-07-07 18:09:18 +08:00
Jiarui Fang
ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. ( #1203 )
2022-07-06 16:15:16 +08:00
Jiarui Fang
060b917daf
[refactor] remove gpc dependency in colotensor's _ops ( #1189 )
2022-07-04 18:54:37 +08:00
Jiarui Fang
c463f8adf9
[tensor] remove gpc in tensor tests ( #1186 )
2022-06-29 14:08:40 +08:00
Jiarui Fang
1b657f9ce1
[tensor] revert local view back ( #1178 )
2022-06-27 18:38:34 +08:00
Jiarui Fang
aa7bef73d4
[Tensor] distributed view supports inter-process hybrid parallel ( #1169 )
2022-06-27 09:45:26 +08:00
Jiarui Fang
4b9bba8116
[ColoTensor] rename APIs and add output_replicate to ComputeSpec ( #1168 )
2022-06-24 13:08:54 +08:00
Jiarui Fang
f4ef224358
[Tensor] remove ParallelAction, use ComputeSpec instread ( #1166 )
2022-06-23 17:34:59 +08:00
Jiarui Fang
177c374401
remove gather out in parallel action ( #1163 )
2022-06-23 16:35:05 +08:00
Jiarui Fang
8cdce0399c
[ColoTensor] improves init functions. ( #1150 )
2022-06-21 18:28:38 +08:00
ver217
a3b66f6def
[tensor] refactor parallel action ( #1007 )
...
* refactor parallel action
* polish unit tests
2022-05-20 20:19:58 +08:00
ver217
ad536e308e
[tensor] refactor colo-tensor ( #992 )
...
* refactor colo-tensor and update linear op
* polish code
* polish code
* update ops and unit tests
* update unit tests
* polish code
* rename dist_spec module
* polish code
* polish code
* remove unneeded import
* fix pipelinable
2022-05-19 12:44:59 +08:00
Jiarui Fang
802ac297cc
[Tensor] remove useless import in tensor dir ( #997 )
2022-05-18 14:54:51 +08:00
ver217
67c33f57eb
[tensor] design DistSpec and DistSpecManager for ColoTensor ( #934 )
...
* add dist spec
* update linear op
* polish code
* polish code
* update embedding op
* polish unit tests
* polish unit tests
* polish comments
* polish code
* add test_dist_spec_mgr
* polish code
* refactor folder structure
* polish unit tests
* add get_process_group() for TensorSpec
* polish code
2022-05-13 15:13:52 +08:00
ver217
4ca732349e
[tensor] colo tensor overrides mul ( #927 )
...
* colo tensor overrides mul
* polish code
2022-05-10 16:04:08 +08:00
ver217
45b9124df4
[tensor] hijack addmm for colo tensor ( #923 )
...
* hijack addmm for colo tensor
* fix bugs
* polish unit test
* polish comments
2022-05-09 18:55:49 +08:00
Ziyue Jiang
c195d2814c
[Tensor] add from_pretrained support and bert pretrained test ( #921 )
...
* add from_pretrained support and test
* polish
* polish
* polish
* polish
2022-05-09 16:11:47 +08:00
Jiarui Fang
845856ea29
[Graph] building computing graph with ColoTensor, Linear only ( #917 )
2022-05-07 17:10:37 +08:00
Jiarui Fang
ab95ec9aea
[Tensor] init ColoParameter ( #914 )
2022-05-06 12:57:14 +08:00
Ziyue Jiang
f593a5637e
[Tensor] add embedding tp1d row ( #904 )
2022-04-29 14:10:05 +08:00
Ziyue Jiang
2c0d19d755
[Tensor] add ColoTensor TP1Dcol Embedding ( #899 )
2022-04-28 17:45:06 +08:00
Jiarui Fang
676f191532
[Tensor] activation is an attr of ColoTensor ( #897 )
2022-04-28 14:43:22 +08:00
Ziyue Jiang
cb182da7c5
[tensor] refine linear and add gather for laynorm ( #893 )
...
* refine linear and add function to ColoTensor
* add gather for layernorm
* polish
* polish
2022-04-28 10:55:40 +08:00
Jiarui Fang
26c49639d8
[Tensor] overriding paramters() for Module using ColoTensor ( #889 )
2022-04-27 15:28:59 +08:00
Ziyue Jiang
1d0aba4153
[tensor] add ColoTensor 1Dcol ( #888 )
2022-04-27 14:13:55 +08:00
Jiarui Fang
72cdc06875
[Tensor] make ColoTensor more robust for getattr ( #886 )
...
* [Tensor] make ColoTensor more robust for getattr
* polish
* polish
2022-04-27 10:57:49 +08:00
Ziyue Jiang
9bc5a77c31
[tensor] wrap function in the torch_tensor to ColoTensor ( #881 )
2022-04-26 20:13:56 +08:00
Jiarui Fang
7f76517a85
[Tensor] make a simple net works with 1D row TP ( #879 )
2022-04-26 18:11:47 +08:00
Jiarui Fang
909211453b
[Tensor] Add some attributes to ColoTensor ( #877 )
...
* [Tensor] add some function to ColoTensor
* torch.allclose
* rm torch.add
2022-04-26 15:10:47 +08:00
Ziyue Jiang
26d4ab8b03
[Tensor] Add function to spec and update linear 1Drow and unit tests ( #869 )
2022-04-26 10:15:26 +08:00