4575a3298b  2022-04-22 17:07:46 +08:00  Jiarui Fang
    [hotfix] ColoTensor pin_memory (#840)

9f6f656952  2022-04-22 15:44:56 +08:00  Frank Lee
    [setup] use env var instead of option for cuda ext (#839)

943982d29a  2022-04-22 15:39:07 +08:00  Frank Lee
    [unittest] refactored unit tests for change in dependency (#838)

f271f34716  2022-04-22 15:24:58 +08:00  github-actions[bot]
    Automated submodule synchronization (#827)
    Co-authored-by: github-actions <github-actions@github.com>

01e9f834f5  2022-04-22 15:24:35 +08:00  Frank Lee
    [dependency] removed torchvision (#833)
    * [dependency] removed torchvision
    * fixed transforms

cb5a4778e1  2022-04-22 14:45:57 +08:00  Jiarui Fang
    Revert "[WIP] Applying ColoTensor on TP-1D-row Linear. (#831)" (#835)
    This reverts commit ac88de6dfc.

5e00e6cf23  2022-04-22 14:17:51 +08:00  Frank Lee
    [setup] allow installation with python 3.6 (#834)

ac88de6dfc  2022-04-22 14:03:26 +08:00  Jiarui Fang
    [WIP] Applying ColoTensor on TP-1D-row Linear. (#831)
    * revert zero tensors back
    * [tensor] init row 1d linear

595bedf767  2022-04-22 12:12:35 +08:00  Jiarui Fang
    revert zero tensors back (#829)

294a6060d0  2022-04-22 12:00:48 +08:00  Jiarui Fang
    [tensor] ZeRO use ColoTensor as the base class. (#828)
    * [refactor] moving InsertPostInitMethodToModuleSubClasses to utils.
    * [tensor] ZeRO use ColoTensor as the base class.
    * polish

8e6fdb4f29  2022-04-21 17:18:56 +08:00  Ziyue Jiang
    [tensor]fix test_linear (#826)

1a9e2c2dff  2022-04-21 16:47:35 +08:00  Ziyue Jiang
    [tensor] fix kwargs in colo_tensor torch_funtion (#825)

eb1b89908c  2022-04-21 16:03:18 +08:00  Jiarui Fang
    [refactor] moving InsertPostInitMethodToModuleSubClasses to utils. (#824)

2ecc3d7a55  2022-04-21 15:40:23 +08:00  Jiarui Fang
    [tensor] lazy init (#823)

68dcd51d41  2022-04-21 14:25:27 +08:00  Jiarui Fang
    [Tensor] update ColoTensor torch_function (#822)
    * Revert "[zero] add ZeroTensorShardStrategy (#793)"
      This reverts commit 88759e289e.
    * [gemini] set cpu memory capacity
    * [log] local throughput collecting
    * polish
    * polish
    * polish
    * polish code
    * polish
    * polish code
    * add a new tensor structure and override linear for it
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * [tensor] renaming and reorganize directory structure.
    * rm useless dir
    * polish
    * polish
    * [tensor] hander the function not wrapped
    * polish

660d2d1f1b  2022-04-21 14:21:10 +08:00  Jiarui Fang
    [Tensor] apply ColoTensor on Torch functions (#821)
    * Revert "[zero] add ZeroTensorShardStrategy (#793)"
      This reverts commit 88759e289e.
    * [gemini] set cpu memory capacity
    * [log] local throughput collecting
    * polish
    * polish
    * polish
    * polish code
    * polish
    * polish code
    * add a new tensor structure and override linear for it
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * [tensor] renaming and reorganize directory structure.
    * rm useless dir
    * polish
    * polish
    * [tensor] hander the function not wrapped

0ce8924ceb  2022-04-21 14:15:48 +08:00  Jiarui Fang
    [tensor] reorganize files (#820)

ab962b9735  2022-04-21 11:42:37 +08:00  Jiarui Fang
    [gemini] a new tensor structure (#818)
    * Revert "[zero] add ZeroTensorShardStrategy (#793)"
      This reverts commit 88759e289e.
    * [gemini] set cpu memory capacity
    * [log] local throughput collecting
    * polish
    * polish
    * polish
    * polish code
    * polish
    * polish code
    * add a new tensor structure and override linear for it
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish

413ce30c45  2022-04-21 11:26:58 +08:00  github-actions[bot]
    Automated submodule synchronization (#819)
    Co-authored-by: github-actions <github-actions@github.com>

9aae4197bb  2022-04-20 13:57:12 +08:00  github-actions[bot]
    Automated submodule synchronization (#810)
    Co-authored-by: github-actions <github-actions@github.com>

e1b3899824  2022-04-20 12:19:50 +08:00  YuliangLiu0306
    Merge pull request #815 from FrankLeeeee/feature/check-cli
    [cli] added check installation cli

70ed11d07e  2022-04-20 12:13:27 +08:00  FrankLeeeee
    [cli] added check installation cli

c7eca40f51  2022-04-20 11:40:07 +08:00  YuliangLiu0306
    Merge pull request #812 from FrankLeeeee/feature/cli
    [cli] fixed single-node process launching

3ddbd1bce1  2022-04-20 11:29:48 +08:00  Jiarui Fang
    [gemini] collect cpu-gpu moving volume in each iteration (#813)

d522cb704e  2022-04-20 10:46:51 +08:00  FrankLeeeee
    [cli] fixed single-node process launching

61c20b44bc  2022-04-20 10:05:39 +08:00  Jiarui Fang
    [log] local throughput metrics (#811)
    * Revert "[zero] add ZeroTensorShardStrategy (#793)"
      This reverts commit 88759e289e.
    * [gemini] set cpu memory capacity
    * [log] local throughput collecting
    * polish
    * polish
    * polish
    * polish code
    * polish

dd92b90a68  2022-04-19 16:16:48 +08:00  ver217
    [DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext (#808)
    * init fp16 param directly
    * polish code

227d1cd4b3  2022-04-19 16:05:22 +08:00  Jiarui Fang
    [gemini] APIs to set cpu memory capacity (#809)

f6dcd23fb9  2022-04-19 15:52:26 +08:00  YuliangLiu0306
    Merge pull request #807 from FrankLeeeee/feature/cli
    [cli] fixed a bug in user args and refactored the module structure

f63e91d280  2022-04-19 15:15:16 +08:00  FrankLeeeee
    [cli] fixed a bug in user args and refactored the module structure

e761ad2cd7  2022-04-19 14:40:02 +08:00  Jiarui Fang
    Revert "[zero] add ZeroTensorShardStrategy (#793)" (#806)

88759e289e  2022-04-19 14:32:45 +08:00  HELSON
    [zero] add ZeroTensorShardStrategy (#793)

681addb512  2022-04-19 14:03:21 +08:00  Jiarui Fang
    [refactor] moving grad acc logic to engine (#804)

05d9ae5999  2022-04-19 13:56:59 +08:00  Frank Lee
    [cli] add missing requirement (#805)

de2f581d43  2022-04-19 12:08:28 +08:00  YuliangLiu0306
    [cli] added micro benchmarking for tp (#789)
    * [CLI] add CLI launcher
    * Revert "[CLI] add CLI launcher"
      This reverts commit df7e6506d4.
    * [CLI]add cli benchmark feature
    * fix CodeFactor issues.
    * refactor the module structure.

cfadc9df8e  2022-04-19 10:59:44 +08:00  YuliangLiu0306
    [cli] added distributed launcher command (#791)
    * [CLI] add CLI launcher
    * Revert "[CLI] add CLI launcher"
      This reverts commit df7e6506d4.
    * [CLI]add cli launcher feature
    * remove testing message used during developing
    * refactor the module structure.

97cd9b03b3  2022-04-19 10:13:28 +08:00  Jiarui Fang
    [log] display tflops if available (#802)

4d9332b4c5  2022-04-19 10:13:08 +08:00  Jiarui Fang
    [refactor] moving memtracer to gemini (#801)

8711c706f4  2022-04-18 14:58:21 +08:00  Jiarui Fang
    [hotfix] fix grad offload when enabling reuse_fp16_shard

f1fa1a675f  2022-04-18 14:07:39 +08:00  ver217
    fix grad offload when enabling reuse_fp16_shard

4c4388c46e  2022-04-18 13:57:03 +08:00  HELSON
    [hotfix] fix memory leak in zero (#781)

4b01da24cd  2022-04-16 21:29:57 +08:00  Ziyue Jiang
    [TP] change the check assert in split batch 2d (#772)

846406a07a  2022-04-16 21:29:31 +08:00  ver217
    [gemini] fix auto tensor placement policy (#775)

38102cf61a  2022-04-16 17:09:24 +08:00  ver217
    update version (#779)

a65cbb7e4e  2022-04-15 14:41:31 +08:00  HELSON
    [zero] refactor shard and gather operation (#773)

5a1a095b92  2022-04-15 00:33:04 +08:00  Frank Lee
    [test] refactored with the new rerun decorator (#763)
    * [test] refactored with the new rerun decorator
    * polish test case

deaf99f4c9  2022-04-14 21:04:51 +08:00  binmakeswell
    [readme] sync CN readme (#766)

6e553748a7  2022-04-14 21:03:59 +08:00  ver217
    polish sharded optim docstr and warning (#770)

80e37eec42  2022-04-14 21:03:24 +08:00  LuGY
    fix the ckpt bugs when using DDP (#769)

1f698f4406  2022-04-14 17:34:08 +08:00  Jiarui Fang
    [readme] polish readme (#764)
    * [readme] polish readme
    * centering image