Commit Graph

27 Commits (adf5054ff8d1884053b5631585b479ffcd2279f4)

Author SHA1 Message Date
Jiarui Fang c92f84fcdb
[tensor] distributed checkpointing for parameters (#1240) 2022-07-12 15:51:06 +08:00
Jiarui Fang 9bcd2fd4af
[tensor] a shorter shard and replicate spec (#1245) 2022-07-11 15:51:48 +08:00
Jiarui Fang 3b500984b1
[tensor] fix some unittests (#1234) 2022-07-08 14:18:30 +08:00
Jiarui Fang f38006ea83
[checkpoint] checkpoint for ColoTensor Model (#1196) 2022-07-06 17:22:03 +08:00
Jiarui Fang ae7d3f4927
[refactor] move process group from _DistSpec to ColoTensor. (#1203) 2022-07-06 16:15:16 +08:00
Jiarui Fang 4b9bba8116
[ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168) 2022-06-24 13:08:54 +08:00
Frank Lee f8eec98ff5
[tensor] fixed non-serializable colo parameter during model checkpointing (#1153) 2022-06-22 11:43:38 +08:00
Jiarui Fang 49832b2344
[refactory] add nn.parallel module (#1068) 2022-06-06 15:34:41 +08:00
Jiarui Fang a00644079e
reorgnize colotensor directory (#1062)
* reorgnize colotensor directory

* polish code
2022-06-03 18:04:22 +08:00
Ziyue Jiang df9dcbbff6
[Tensor] add hybrid device demo and fix bugs (#1059) 2022-06-03 12:09:49 +08:00
Ziyue Jiang 7c530b9de2
[Tensor] add Parameter inheritance for ColoParameter (#1041)
* add Parameter inheritance for ColoParameter

* remove tricks

* remove tricks

* polish

* polish
2022-05-30 17:23:44 +08:00
Ziyue Jiang 6c5996a56e
[Tensor] add module check and bert test (#1031)
* add Embedding

* Add bert test

* polish

* add check module test

* polish

* polish

* polish

* polish
2022-05-26 18:15:42 +08:00
Ziyue Jiang 32291dd73f
[Tensor] add module handler for linear (#1021)
* add module spec for linear

* polish

* polish

* polish
2022-05-26 11:50:44 +08:00
ver217 007ca0df92
fix colo init context (#1026) 2022-05-25 20:41:58 +08:00
ver217 ad536e308e
[tensor] refactor colo-tensor (#992)
* refactor colo-tensor and update linear op

* polish code

* polish code

* update ops and unit tests

* update unit tests

* polish code

* rename dist_spec module

* polish code

* polish code

* remove unneeded import

* fix pipelinable
2022-05-19 12:44:59 +08:00
Ziyue Jiang d73c2b1d79
[Tensor] fix init context (#931)
* change torch.Parameter to ColoParameter

* fix post assignment for init context

* polish

* polish
2022-05-11 15:48:12 +08:00
Ziyue Jiang dfc88b85ea
[Tensor] simplify named param (#928)
* simplify ColoModulize

* simplify ColoModulize

* polish

* polish
2022-05-11 10:54:19 +08:00
Ziyue Jiang c195d2814c
[Tensor] add from_pretrained support and bert pretrained test (#921)
* add from_pretrained support and test

* polish

* polish

* polish

* polish
2022-05-09 16:11:47 +08:00
Jiarui Fang ab95ec9aea
[Tensor] init ColoParameter (#914) 2022-05-06 12:57:14 +08:00
Jiarui Fang d16671da75
[Tensor] initialize the ColoOptimizer (#898)
* [Tensor] activation is an attr of ColoTensor

* [Tensor] add optimizer

* only detach parameters in context

* polish code
2022-04-28 15:23:40 +08:00
Jiarui Fang 676f191532
[Tensor] activation is an attr of ColoTensor (#897) 2022-04-28 14:43:22 +08:00
Jiarui Fang 26c49639d8
[Tensor] overriding paramters() for Module using ColoTensor (#889) 2022-04-27 15:28:59 +08:00
Jiarui Fang d01d3b8cb0
colo init context add device attr. (#866) 2022-04-25 14:24:26 +08:00
Jiarui Fang 62f059251b
[Tensor] init a tp network training unittest (#849) 2022-04-24 16:43:44 +08:00
ver217 0dea140760
[hotfix] add deconstructor for stateful tensor (#848)
* add deconstructor for stateful tensor

* fix colo init context
2022-04-24 15:03:04 +08:00
YuliangLiu0306 35ea6e1023
[pipelinable]use pipelinable context to initialize non-pipeline model (#816)
* [CLI] add CLI launcher

* Revert "[CLI] add CLI launcher"

This reverts commit df7e6506d4.

* [pipeline]add module lazy init feature to support large model initization.

* [pipeline]add to_layer_list and partition method to support arbitrary non-pp model

* refactor the module structure

* polish

* [pipelinable]add unit test for pipelinable

* polish

* polish

* Fix CodeFactor issues.
2022-04-24 13:03:12 +08:00
Jiarui Fang 8789850eea
Init Conext supports lazy allocate model memory (#842) 2022-04-22 18:03:35 +08:00