Commit Graph

636 Commits (e17a43184b6e28afb6a4757076d27018cbf8c5d0)

Author SHA1 Message Date
LuGY 2883040286
[example] change qkv processing (#870)
3 years ago
Jiarui Fang 96211c2cc8
[tensor] customized op returns ColoTensor (#875)
3 years ago
Ziyue Jiang 26d4ab8b03
[Tensor] Add function to spec and update linear 1Drow and unit tests (#869)
3 years ago
Frank Lee 11f54c7b6b
[doc] improved docstring and assertion messages for the engine module (#871)
3 years ago
Frank Lee 1c34382678
[doc] improved assertion messages in trainer (#873)
3 years ago
Frank Lee 7a64fae33a
[doc] improved error messages in initialize (#872)
3 years ago
Jiarui Fang 1190b2c4a4
[tensor] add cross_entrophy_loss (#868)
3 years ago
HELSON 3107817172
[gemini] add stateful tensor container (#867)
3 years ago
Jiarui Fang d01d3b8cb0
colo init context add device attr. (#866)
3 years ago
Frank Lee 2238758c2e
[usability] improved error messages in the context module (#856)
3 years ago
Frank Lee 9fdebadd69
[doc] improved docstring in the amp module (#857)
3 years ago
Frank Lee b862d89d00
[doc] improved docstring in the logging module (#861)
3 years ago
Frank Lee 8004c8e938
[doc] improved docstring in the communication module (#863)
3 years ago
Jiarui Fang 8af5f7423d
[tensor] an initial dea of tensor spec (#865)
3 years ago
Jiarui Fang 126ba573a8
[Tensor] add layer norm Op (#852)
3 years ago
Frank Lee a82da26f7e
[cli] refactored micro-benchmarking cli and added more metrics (#858)
3 years ago
Frank Lee ee222dfbf3
[usability] added assertion message in registry (#864)
3 years ago
HELSON f0e654558f
[gemini] polish code (#855)
3 years ago
Jiarui Fang 29159d9b5b
hotfix tensor unittest bugs (#862)
3 years ago
Frank Lee 1258af71cc
[ci] cache cuda extension (#860)
3 years ago
YuliangLiu0306 c6930d8ddf
[pipelinable]use ColoTensor to replace dummy tensor. (#853)
3 years ago
Ziyue Jiang bcc8655021
[Tensor ] Add 1Drow weight reshard by spec (#854)
3 years ago
ver217 d7e0303d1e
[zero] use GeminiMemoryManager when sampling model data (#850)
3 years ago
ver217 232142f402
[utils] refactor profiler (#837)
3 years ago
Jiarui Fang 62f059251b
[Tensor] init a tp network training unittest (#849)
3 years ago
ver217 0dea140760
[hotfix] add deconstructor for stateful tensor (#848)
3 years ago
ver217 0f7ed8c192
fix _post_init_method of zero init ctx (#847)
3 years ago
Ziyue Jiang 2a0a427e04
[tensor]add assert for colo_tensor 1Drow (#846)
3 years ago
Ziyue Jiang 05023ecfee
[Tensor] TP Linear 1D row (#843)
3 years ago
Frank Lee cf6d1c9284
[CLI] refactored the launch CLI and fixed bugs in multi-node launching (#844)
3 years ago
HELSON e5ea3fdeef
[gemini] add GeminiMemoryManger (#832)
3 years ago
YuliangLiu0306 35ea6e1023
[pipelinable]use pipelinable context to initialize non-pipeline model (#816)
3 years ago
Jiarui Fang ea0a2ed25f
[hotfix] the bug of numel() in ColoTensor (#845)
3 years ago
LuGY c1e8d2001e
modefied the pp build for ckpt adaptation (#803)
3 years ago
Jiarui Fang 8789850eea
Init Conext supports lazy allocate model memory (#842)
3 years ago
Jiarui Fang 4575a3298b
[hotfix] ColoTensor pin_memory (#840)
3 years ago
Frank Lee 9f6f656952
[setup] use env var instead of option for cuda ext (#839)
3 years ago
Frank Lee 943982d29a
[unittest] refactored unit tests for change in dependency (#838)
3 years ago
github-actions[bot] f271f34716
Automated submodule synchronization (#827)
3 years ago
Frank Lee 01e9f834f5
[dependency] removed torchvision (#833)
3 years ago
Jiarui Fang cb5a4778e1
Revert "[WIP] Applying ColoTensor on TP-1D-row Linear. (#831)" (#835)
3 years ago
Frank Lee 5e00e6cf23
[setup] allow installation with python 3.6 (#834)
3 years ago
Jiarui Fang ac88de6dfc
[WIP] Applying ColoTensor on TP-1D-row Linear. (#831)
3 years ago
Jiarui Fang 595bedf767
revert zero tensors back (#829)
3 years ago
Jiarui Fang 294a6060d0
[tensor] ZeRO use ColoTensor as the base class. (#828)
3 years ago
Ziyue Jiang 8e6fdb4f29
[tensor]fix test_linear (#826)
3 years ago
Ziyue Jiang 1a9e2c2dff
[tensor] fix kwargs in colo_tensor torch_funtion (#825)
3 years ago
Jiarui Fang eb1b89908c
[refactor] moving InsertPostInitMethodToModuleSubClasses to utils. (#824)
3 years ago
Jiarui Fang 2ecc3d7a55
[tensor] lazy init (#823)
3 years ago
Jiarui Fang 68dcd51d41
[Tensor] update ColoTensor torch_function (#822)
3 years ago