498 Commits (f271f3471630d22ea21a608af7ce56870d0c8e63)

Author SHA1 Message Date
github-actions[bot] f271f34716
Automated submodule synchronization (#827)
3 years ago
Frank Lee 01e9f834f5
[dependency] removed torchvision (#833)
3 years ago
Jiarui Fang cb5a4778e1
Revert "[WIP] Applying ColoTensor on TP-1D-row Linear. (#831)" (#835)
3 years ago
Frank Lee 5e00e6cf23
[setup] allow installation with python 3.6 (#834)
3 years ago
Jiarui Fang ac88de6dfc
[WIP] Applying ColoTensor on TP-1D-row Linear. (#831)
3 years ago
Jiarui Fang 595bedf767
revert zero tensors back (#829)
3 years ago
Jiarui Fang 294a6060d0
[tensor] ZeRO use ColoTensor as the base class. (#828)
3 years ago
Ziyue Jiang 8e6fdb4f29
[tensor]fix test_linear (#826)
3 years ago
Ziyue Jiang 1a9e2c2dff
[tensor] fix kwargs in colo_tensor torch_funtion (#825)
3 years ago
Jiarui Fang eb1b89908c
[refactor] moving InsertPostInitMethodToModuleSubClasses to utils. (#824)
3 years ago
Jiarui Fang 2ecc3d7a55
[tensor] lazy init (#823)
3 years ago
Jiarui Fang 68dcd51d41
[Tensor] update ColoTensor torch_function (#822)
3 years ago
Jiarui Fang 660d2d1f1b
[Tensor] apply ColoTensor on Torch functions (#821)
3 years ago
Jiarui Fang 0ce8924ceb
[tensor] reorganize files (#820)
3 years ago
Jiarui Fang ab962b9735
[gemini] a new tensor structure (#818)
3 years ago
github-actions[bot] 413ce30c45
Automated submodule synchronization (#819)
3 years ago
github-actions[bot] 9aae4197bb
Automated submodule synchronization (#810)
3 years ago
YuliangLiu0306 e1b3899824
Merge pull request #815 from FrankLeeeee/feature/check-cli
3 years ago
FrankLeeeee 70ed11d07e
[cli] added check installation cli
3 years ago
YuliangLiu0306 c7eca40f51
Merge pull request #812 from FrankLeeeee/feature/cli
3 years ago
Jiarui Fang 3ddbd1bce1
[gemini] collect cpu-gpu moving volume in each iteration (#813)
3 years ago
FrankLeeeee d522cb704e
[cli] fixed single-node process launching
3 years ago
Jiarui Fang 61c20b44bc
[log] local throughput metrics (#811)
3 years ago
ver217 dd92b90a68
[DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext (#808)
3 years ago
Jiarui Fang 227d1cd4b3
[gemini] APIs to set cpu memory capacity (#809)
3 years ago
YuliangLiu0306 f6dcd23fb9
Merge pull request #807 from FrankLeeeee/feature/cli
3 years ago
FrankLeeeee f63e91d280
[cli] fixed a bug in user args and refactored the module structure
3 years ago
Jiarui Fang e761ad2cd7
Revert "[zero] add ZeroTensorShardStrategy (#793)" (#806)
3 years ago
HELSON 88759e289e
[zero] add ZeroTensorShardStrategy (#793)
3 years ago
Jiarui Fang 681addb512
[refactor] moving grad acc logic to engine (#804)
3 years ago
Frank Lee 05d9ae5999
[cli] add missing requirement (#805)
3 years ago
YuliangLiu0306 de2f581d43
[cli] added micro benchmarking for tp (#789)
3 years ago
YuliangLiu0306 cfadc9df8e
[cli] added distributed launcher command (#791)
3 years ago
Jiarui Fang 97cd9b03b3
[log] display tflops if available (#802)
3 years ago
Jiarui Fang 4d9332b4c5
[refactor] moving memtracer to gemini (#801)
3 years ago
Jiarui Fang 8711c706f4
[hotfix] fix grad offload when enabling reuse_fp16_shard
3 years ago
ver217 f1fa1a675f
fix grad offload when enabling reuse_fp16_shard
3 years ago
HELSON 4c4388c46e
[hotfix] fix memory leak in zero (#781)
3 years ago
Ziyue Jiang 4b01da24cd
[TP] change the check assert in split batch 2d (#772)
3 years ago
ver217 846406a07a
[gemini] fix auto tensor placement policy (#775)
3 years ago
ver217 38102cf61a
update version (#779)
3 years ago
HELSON a65cbb7e4e
[zero] refactor shard and gather operation (#773)
3 years ago
Frank Lee 5a1a095b92
[test] refactored with the new rerun decorator (#763)
3 years ago
binmakeswell deaf99f4c9
[readme] sync CN readme (#766)
3 years ago
ver217 6e553748a7
polish sharded optim docstr and warning (#770)
3 years ago
LuGY 80e37eec42
fix the ckpt bugs when using DDP (#769)
3 years ago
Jiarui Fang 1f698f4406
[readme] polish readme (#764)
3 years ago
Frank Lee 920fe31526
[compatibility] used backward-compatible API for global process group (#758)
3 years ago
Frank Lee 4ea49cb536
[test] added a decorator for address already in use error with backward compatibility (#760)
3 years ago
Jiarui Fang 10ef8afdd2
[gemini] init genimi individual directory (#754)
3 years ago