Commit Graph

  • 3300580eb4 added buffer sync to naive amp model wrapper FrankLeeeee 2022-03-02 02:54:45 +0000
  • 6f22fb1906 add a common util for hooks registered on parameter. (#292) Jiarui Fang 2022-03-02 14:38:22 +0800
  • 34c2e7568e fix bug about single node with cpu offload ver217 2022-03-02 14:38:08 +0800
  • 3b64dcc439 bug fix: pass hook_list to engine (#273) Jie Zhu 2022-03-02 14:25:52 +0800
  • 670fbc15f8 polish code jiaruifang 2022-03-02 13:57:01 +0800
  • edbaaa5145 fix CPU offload grad accumulation ver217 2022-03-02 13:55:59 +0800
  • 74535fb293 add param as a param for function signature passed to BaseHookMag jiaruifang 2022-03-02 13:51:29 +0800
  • 9200bdd3da polish code jiaruifang 2022-03-02 13:42:58 +0800
  • 4cc315e321 polish code jiaruifang 2022-03-02 13:40:06 +0800
  • a9cd7af746 polish code jiaruifang 2022-03-02 13:35:35 +0800
  • 6e9836afb8 polish code. Add remove API jiaruifang 2022-03-02 13:34:37 +0800
  • ec4dffa6df add a common utils for hooks registered on parameter. jiaruifang 2022-03-02 13:20:53 +0800
  • 7cdc366d52 change parameter name Jie Zhu 2022-03-02 13:18:13 +0800
  • e4cc58424e fix shard grad shape and add comments ver217 2022-03-02 11:40:14 +0800
  • f7a14e0029 Merge branch 'feature/zero' into develop lclgy 2022-03-01 19:20:46 +0800
  • b61ea71734 ignore shard model v2 test ver217 2022-03-01 18:55:26 +0800
  • 1880f9a592 add sharded grad and refactor grad hooks ver217 2022-03-01 18:52:55 +0800
  • feb52f307b Merge branch 'develop' into feature/zero ver217 2022-03-01 18:31:44 +0800
  • 2e6c195990 Merge branch 'develop' into bug-fix Jie Zhu 2022-03-01 18:26:19 +0800
  • 3280869358 Feature/zero (#279) Jiarui Fang 2022-03-01 18:17:01 +0800
  • 7411d10648 add sharded grad and refactor grad hooks ver217 2022-03-01 18:12:51 +0800
  • 4709e16a69 Merge branch 'hpcaitech:feature/zero' into feature/zero LuGY 2022-03-01 15:54:34 +0800
  • a912ab309a Added CPU Adam lclgy 2022-03-01 15:43:09 +0800
  • 672b1c9ec8 polish zero dp unittests jiaruifang 2022-03-01 15:40:54 +0800
  • e3c57e61bf Added CPU Adam lclgy 2022-03-01 15:43:09 +0800
  • 1819e25249 polish zero dp unittests jiaruifang 2022-03-01 15:40:54 +0800
  • 5320cc5c44 remove deepspeed implementation and refactor for the reconstructed zero module FrankLeeeee 2022-03-01 07:24:55 +0000
  • 05892e2541 Merge branch 'develop' into feature/zero Jiarui Fang 2022-03-01 15:12:37 +0800
  • e5f17affed [WIP] Yet another sharded model implementation (#274) Jiarui Fang 2022-03-01 14:55:43 +0800
  • f84b15719f add community group and update issue template(#271) binmakeswell 2022-02-28 17:07:14 +0800
  • ec49c6833a update experimental visualization (#253) Sze-qq 2022-02-28 16:03:13 +0800
  • 8836140725 add Chinese README binmakeswell 2022-02-18 16:28:37 +0800
  • 701a06730f add community group and update issue template(#271) binmakeswell 2022-02-28 17:07:14 +0800
  • 71aae4903c added compatibility CI and options for release ci FrankLeeeee 2022-02-28 08:40:06 +0000
  • 5f4b78bbb3 fix test_zero_level_1.py::test_zero_level_1 unitest jiaruifang 2022-03-01 12:29:53 +0800
  • 230c044fd7 Merge branch 'feature/docs/zh-Hans' into feature/docs/zh-Hans binmakeswell 2022-03-01 12:03:16 +0800
  • 28bf993581 add proposal and user group in issue template binmakeswell 2022-03-01 11:48:12 +0800
  • d919a5ec4f add proposal and user group in issue template (#276) binmakeswell 2022-03-01 11:52:44 +0800
  • f1d2105e87 add proposal and user group in issue template binmakeswell 2022-03-01 11:48:12 +0800
  • 19d03a9d75 torch.concat -> torch.cat jiaruifang 2022-03-01 11:02:27 +0800
  • 77f0ef4804 Merge branch 'feature/zero' of https://github.com/hpcaitech/ColossalAI into jiaruifang/shardmodelv2 jiaruifang 2022-03-01 10:47:02 +0800
  • 52b679e405 [WIP] Yes another implementation of shardModel. Using a better hook method. jiaruifang 2022-03-01 10:43:46 +0800
  • 623d33ca7b bug fix: pass hook_list to engine Jie Zhu 2022-02-28 18:10:26 +0800
  • 27f8d7bce5 add community group (#271) binmakeswell 2022-02-28 17:07:14 +0800
  • b029149a38 add community group binmakeswell 2022-02-28 16:56:39 +0800
  • 7795c2dfbc [WIP] initialize the shard param class jiaruifang 2022-02-28 16:54:55 +0800
  • 93f8eb6732 [WIP] initialize the shard param class jiaruifang 2022-02-28 16:54:55 +0800
  • f72b75f58f added pypi publication CI and remove formatting CI FrankLeeeee 2022-02-28 07:17:37 +0000
  • 3ff567c569 update experimental visualization (#253) Sze-qq 2022-02-28 16:03:13 +0800
  • 82fd74cd37 add unit test for Zero3ParameterManager jiaruifang 2022-02-28 15:15:38 +0800
  • 4eebd47e6a clip_grad support zero3 and add unit test ver217 2022-02-28 15:14:45 +0800
  • 177841b180 update experimental visualization Sze-qq 2022-02-23 23:30:52 +0800
  • ede36ecb59 added pypi publication CI and remove formatting CI FrankLeeeee 2022-02-28 07:17:37 +0000
  • 17f2ed3d09 add unit test for Zero3ParameterManager jiaruifang 2022-02-28 15:15:38 +0800
  • 037272e7d8 clip_grad support zero3 and add unit test ver217 2022-02-28 15:14:45 +0800
  • a8815017a7 Merge branch 'develop' of https://github.com/hpcaitech/ColossalAI into jiaruifang/zero3_param_mgr_unitest jiaruifang 2022-02-28 14:58:41 +0800
  • f1960e2ec6 Add an unitest for class Zero3ParameterManager. Test the shape of sharded tensors. jiaruifang 2022-02-28 14:50:36 +0800
  • adf7e50325 Added TPExpert for special situation 1SAA 2022-02-27 22:28:39 +0800
  • 127ed7d349 Added TPExpert for special situation 1SAA 2022-02-27 22:28:39 +0800
  • 6cecbde811 refactor reconstructed zero code FrankLeeeee 2022-02-27 06:56:09 +0000
  • 35bb56802a refactor reconstructed zero code FrankLeeeee 2022-02-27 06:56:09 +0000
  • b5b612a26c Fixed parameter initialization in FFNExpert (#251) HELSON 2022-02-27 14:01:25 +0800
  • abe0a29317 fix bugs of hook and add unit tests (#252) ver217 2022-02-27 14:00:58 +0800
  • bdbbefb451 update unit test ver217 2022-02-25 19:49:43 +0800
  • 518eddfbac fix bug ver217 2022-02-25 16:05:16 +0800
  • ad495fcef9 Merge 'origin/feature/zero' into feature/zero ver217 2022-02-25 15:31:16 +0800
  • d1ea1f27fe polish code and add state dict hook ver217 2022-02-25 15:26:22 +0800
  • 804fd3e222 Update checkpointing.py BoxiangW 2022-02-25 15:11:34 +0800
  • c4f9a760a8 First version of checkpointing BoxiangW 2022-02-25 14:57:13 +0800
  • a49a826145 Fixed parameter initialization in FFNExpert 1SAA 2022-02-23 19:09:31 +0800
  • 726a4abb66 fixed CI dataset directory; fixed import error of 2.5d accuracy (#255) アマデウス 2022-02-24 14:33:45 +0800
  • 957ecbbce1 fixed CI dataset directory; fixed import error of 2.5d accuracy zbian 2022-02-24 11:33:52 +0800
  • 9b39481ff3 fix bugs of hook and add unit tests ver217 2022-02-23 19:25:19 +0800
  • 948f215e69 add offload ver217 2022-02-22 11:16:38 +0800
  • 53bc70a43b fix sub module streams ver217 2022-02-21 15:54:59 +0800
  • ca29f0d27b fix bugs of hook and add unit tests ver217 2022-02-23 19:25:19 +0800
  • d7ae280abe update zero stage 1 develop FrankLeeeee 2022-02-09 05:56:22 +0000
  • 568f7578ab add offload ver217 2022-02-22 11:16:38 +0800
  • e7a1136436 Optimized MoE layer and fixed some bugs; 1SAA 2022-02-18 20:42:31 +0800
  • 4d9f6271fe Optimized MoE layer and fixed some bugs; 1SAA 2022-02-18 20:42:31 +0800
  • 218c061e2d fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial zbian 2022-02-17 22:03:39 +0800
  • 34d7c0401a update setup info (#233) ver217 2022-02-15 15:15:03 +0800
  • a3c0b6508e fix sub module streams ver217 2022-02-21 15:54:59 +0800
  • fdcdd4de59 Optimized MoE layer and fixed some bugs Decreased moe tests 1SAA 2022-02-18 20:42:31 +0800
  • 21f48df342 add Chinese README binmakeswell 2022-02-18 16:28:37 +0800
  • 4f90df6e48 add Chinese README binmakeswell 2022-02-18 16:28:37 +0800
  • 9dc0f0f4ea add gather full param ctx ver217 2022-02-18 15:31:04 +0800
  • 283fcd82f7 add gather full param ctx ver217 2022-02-18 15:31:04 +0800
  • ce58b3c511 Implement naive zero3 (#240) ver217 2022-02-18 10:54:38 +0800
  • 837d8d3bbf fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial zbian 2022-02-17 22:03:39 +0800
  • 829d01f991 add TODOs in comments ver217 2022-02-17 17:48:12 +0800
  • bb6e07f4d2 add zero3 param manager ver217 2022-02-17 17:21:31 +0800
  • 2399c4e065 naive zero3 works well ver217 2022-02-17 15:16:19 +0800
  • 4c3376587c update setup info (#233) ver217 2022-02-15 15:15:03 +0800
  • 37f407e79a update setup info ver217 2022-02-15 14:21:51 +0800
  • 0180630ed7 update setup info ver217 2022-02-15 12:35:35 +0800
  • b9f8521f8c Automated submodule synchronization github-actions 2022-02-09 00:01:25 +0000
  • f5ca88ec97 fixed apex import (#227) v0.0.2 Frank Lee 2022-02-14 18:04:57 +0800
  • eb3fda4c28 updated readme and change log (#224) Frank Lee 2022-02-14 17:22:48 +0800
  • 578ea0583b update setup and workflow (#222) ver217 2022-02-14 17:09:30 +0800