Commit Graph

723 Commits (ff644ee5e416f64b43cd8a70fd32377c92281270)

Author SHA1 Message Date
Jiarui Fang 8af5f7423d
[tensor] an initial dea of tensor spec (#865)
3 years ago
Jiarui Fang 126ba573a8
[Tensor] add layer norm Op (#852)
3 years ago
Frank Lee a82da26f7e
[cli] refactored micro-benchmarking cli and added more metrics (#858)
3 years ago
Frank Lee ee222dfbf3
[usability] added assertion message in registry (#864)
3 years ago
HELSON f0e654558f
[gemini] polish code (#855)
3 years ago
Jiarui Fang 29159d9b5b
hotfix tensor unittest bugs (#862)
3 years ago
Frank Lee 1258af71cc
[ci] cache cuda extension (#860)
3 years ago
YuliangLiu0306 c6930d8ddf
[pipelinable]use ColoTensor to replace dummy tensor. (#853)
3 years ago
Ziyue Jiang bcc8655021
[Tensor ] Add 1Drow weight reshard by spec (#854)
3 years ago
ver217 d7e0303d1e
[zero] use GeminiMemoryManager when sampling model data (#850)
3 years ago
ver217 232142f402
[utils] refactor profiler (#837)
3 years ago
Jiarui Fang 62f059251b
[Tensor] init a tp network training unittest (#849)
3 years ago
ver217 0dea140760
[hotfix] add deconstructor for stateful tensor (#848)
3 years ago
ver217 0f7ed8c192
fix _post_init_method of zero init ctx (#847)
3 years ago
Ziyue Jiang 2a0a427e04
[tensor]add assert for colo_tensor 1Drow (#846)
3 years ago
Ziyue Jiang 05023ecfee
[Tensor] TP Linear 1D row (#843)
3 years ago
Frank Lee cf6d1c9284
[CLI] refactored the launch CLI and fixed bugs in multi-node launching (#844)
3 years ago
HELSON e5ea3fdeef
[gemini] add GeminiMemoryManger (#832)
3 years ago
YuliangLiu0306 35ea6e1023
[pipelinable]use pipelinable context to initialize non-pipeline model (#816)
3 years ago
Jiarui Fang ea0a2ed25f
[hotfix] the bug of numel() in ColoTensor (#845)
3 years ago
LuGY c1e8d2001e
modefied the pp build for ckpt adaptation (#803)
3 years ago
Jiarui Fang 8789850eea
Init Conext supports lazy allocate model memory (#842)
3 years ago
Jiarui Fang 4575a3298b
[hotfix] ColoTensor pin_memory (#840)
3 years ago
Frank Lee 9f6f656952
[setup] use env var instead of option for cuda ext (#839)
3 years ago
Frank Lee 943982d29a
[unittest] refactored unit tests for change in dependency (#838)
3 years ago
github-actions[bot] f271f34716
Automated submodule synchronization (#827)
3 years ago
Frank Lee 01e9f834f5
[dependency] removed torchvision (#833)
3 years ago
Jiarui Fang cb5a4778e1
Revert "[WIP] Applying ColoTensor on TP-1D-row Linear. (#831)" (#835)
3 years ago
Frank Lee 5e00e6cf23
[setup] allow installation with python 3.6 (#834)
3 years ago
Jiarui Fang ac88de6dfc
[WIP] Applying ColoTensor on TP-1D-row Linear. (#831)
3 years ago
Jiarui Fang 595bedf767
revert zero tensors back (#829)
3 years ago
Jiarui Fang 294a6060d0
[tensor] ZeRO use ColoTensor as the base class. (#828)
3 years ago
Ziyue Jiang 8e6fdb4f29
[tensor]fix test_linear (#826)
3 years ago
Ziyue Jiang 1a9e2c2dff
[tensor] fix kwargs in colo_tensor torch_funtion (#825)
3 years ago
Jiarui Fang eb1b89908c
[refactor] moving InsertPostInitMethodToModuleSubClasses to utils. (#824)
3 years ago
Jiarui Fang 2ecc3d7a55
[tensor] lazy init (#823)
3 years ago
Jiarui Fang 68dcd51d41
[Tensor] update ColoTensor torch_function (#822)
3 years ago
Jiarui Fang 660d2d1f1b
[Tensor] apply ColoTensor on Torch functions (#821)
3 years ago
Jiarui Fang 0ce8924ceb
[tensor] reorganize files (#820)
3 years ago
Jiarui Fang ab962b9735
[gemini] a new tensor structure (#818)
3 years ago
github-actions[bot] 413ce30c45
Automated submodule synchronization (#819)
3 years ago
github-actions[bot] 9aae4197bb
Automated submodule synchronization (#810)
3 years ago
YuliangLiu0306 e1b3899824
Merge pull request #815 from FrankLeeeee/feature/check-cli
3 years ago
FrankLeeeee 70ed11d07e
[cli] added check installation cli
3 years ago
YuliangLiu0306 c7eca40f51
Merge pull request #812 from FrankLeeeee/feature/cli
3 years ago
Jiarui Fang 3ddbd1bce1
[gemini] collect cpu-gpu moving volume in each iteration (#813)
3 years ago
FrankLeeeee d522cb704e
[cli] fixed single-node process launching
3 years ago
Jiarui Fang 61c20b44bc
[log] local throughput metrics (#811)
3 years ago
ver217 dd92b90a68
[DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext (#808)
3 years ago
Jiarui Fang 227d1cd4b3
[gemini] APIs to set cpu memory capacity (#809)
3 years ago