Commit Graph

  • ce7b2c9ae3 update README and images path (#384) binmakeswell 2022-03-11 13:53:38 +0800
  • 2fcd4f38ee fix format (#379) ScalableEKNN 2022-03-10 18:35:41 +0800
  • 5a4a3b77d9 fix format (#376) Jiang Zhuo 2022-03-10 17:15:59 +0800
  • ce886a9062 fix format (#374) lucasliunju 2022-03-10 16:12:51 +0800
  • 526a318032 [unit test] Refactored test cases with component func (#339) Frank Lee 2022-03-11 14:09:09 +0800
  • de46450461 Added activation offload (#331) LuGY 2022-03-11 10:08:10 +0800
  • 272ebfb57d [bug] shard param during initializing the ShardedModelV2 (#381) Jiarui Fang 2022-03-10 19:28:03 +0800
  • 8c18eb0998 [profiler] Fixed bugs in CommProfiler and PcieProfiler (#377) HELSON 2022-03-10 17:54:55 +0800
  • b5f43acee3 [zero] find miss code (#378) Jiarui Fang 2022-03-10 17:51:50 +0800
  • 6b6002962a [zero] zero init context collect numel of model (#375) Jiarui Fang 2022-03-10 16:31:02 +0800
  • 1ed7c24c02 Added PCIE profiler to dectect data transmission (#373) HELSON 2022-03-10 16:24:57 +0800
  • d9217e1960 Revert "[zero] bucketized tensor cpu gpu copy (#368)" jiaruifang 2022-03-10 15:39:09 +0800
  • a8cd5e8e81 Update README-zh-Hans.md (#367) Xue Fuzhao 2022-03-10 13:49:50 +0800
  • 1c88dd43e2 Fix/format (#366) Shen Chenhui 2022-03-10 13:32:56 +0800
  • 0db43fa995 fix format (#364) Ziheng Qin 2022-03-10 12:12:42 +0800
  • 8539898ec6 flake8 style change (#363) RichardoLuo 2022-03-10 11:47:51 +0800
  • 53bb3bcc0a fix format (#362) Kai Wang (Victor Kai) 2022-03-10 11:33:21 +0800
  • a77d73f22b fix format parallel_context.py (#359) ziyu huang 2022-03-10 09:29:32 +0800
  • c695369af0 fix format constants.py (#358) Zangwei 2022-03-09 23:35:41 +0800
  • 4a0f8c2c50 fix format parallel_2p5d (#357) Yuer867 2022-03-09 21:42:30 +0800
  • 7eb87f516d flake8 style (#352) Liang Bowen 2022-03-09 17:34:43 +0800
  • 54ee8d1254 Fix/format colossalai/engine/paramhooks/(#350) Xu Kai 2022-03-09 17:28:17 +0800
  • e83970e3dc fix format ColossalAI\colossalai\context\process_group_initializer Maruyama_Aya 2022-03-09 16:23:33 +0800
  • 3b88eb2259 Flake8 code restyle yuxuan-lou 2022-03-09 15:17:01 +0800
  • af801cb4df fix format setup.py (#343) xyupeng 2022-03-09 15:11:35 +0800
  • 148207048e Qifan formated file ColossalAI\colossalai\nn\layer\parallel_1d\layers.py (#342) xuqifan897 2022-03-08 22:45:27 -0800
  • 3a51d909af fix format (#332) Cautiousss 2022-03-09 10:35:05 +0800
  • cbb6436ff0 fix format for dir-[parallel_3d] (#333) DouJS 2022-03-09 10:31:43 +0800
  • eaac03ae1d [formart] format fixed for kernel\cuda_native codes (#335) ExtremeViscent 2022-03-09 01:44:20 +0000
  • 00670c870e [zero] bucketized tensor cpu gpu copy (#368) Jiarui Fang 2022-03-10 14:41:08 +0800
  • 44e4891f57 [zero] able to place params on cpu after zero init context (#365) Jiarui Fang 2022-03-10 14:08:58 +0800
  • b66f3b994c increase the timeout limit in CI temporarily ver217 2022-03-10 11:48:20 +0800
  • 52d055119b increase the timeout limit in CI temporarily ver217 2022-03-10 11:00:31 +0800
  • 253e54d98a fix grad shape ver217 2022-03-09 18:03:39 +0800
  • ea2872073f [zero] global model data memory tracer (#360) Jiarui Fang 2022-03-10 11:20:04 +0800
  • cb34cd384d [test] polish zero related unitest (#351) Jiarui Fang 2022-03-10 09:57:26 +0800
  • 534e0bb118 Fixed import bug for no-tensorboard environment (#354) HELSON 2022-03-09 19:48:04 +0800
  • c57e089824 [profile] added example for ProfilerContext (#349) HELSON 2022-03-09 17:35:28 +0800
  • 532ae79cb0 add test sharded optim with cpu adam (#347) ver217 2022-03-09 17:30:02 +0800
  • 10e2826426 move async memory to an individual directory (#345) Jiarui Fang 2022-03-09 16:31:25 +0800
  • 425bb0df3f Added Profiler Context to manage all profilers (#340) HELSON 2022-03-09 16:12:41 +0800
  • d0ae0f2215 [zero] update sharded optim v2 (#334) ver217 2022-03-09 16:09:36 +0800
  • 2b8cddd40e skip bert in test engine ver217 2022-03-09 14:18:23 +0800
  • d41a9f12c6 install transformers in CI ver217 2022-03-09 13:46:55 +0800
  • f5f0ad266e fix bert unit test ver217 2022-03-09 13:38:20 +0800
  • 5663616921 polish code jiaruifang 2022-03-09 12:09:07 +0800
  • d271f2596b polish engine unitest jiaruifang 2022-03-09 12:03:49 +0800
  • 354c0f9047 polish code jiaruifang 2022-03-09 11:35:11 +0800
  • 4d94cd513e adapting bert unitest interface jiaruifang 2022-03-09 11:26:10 +0800
  • 7977422aeb add bert for unitest and sharded model is not able to pass the bert case jiaruifang 2022-03-09 10:39:02 +0800
  • 3d5d64bd10 refactored grad scaler (#338) Frank Lee 2022-03-09 11:52:43 +0800
  • 6a3188167c set criterion as optional in colossalai initialize (#336) Frank Lee 2022-03-09 11:51:22 +0800
  • 3213554cc2 [profiler] add adaptive sampling to memory profiler (#330) Jie Zhu 2022-03-09 11:07:10 +0800
  • 1388671699 [zero] Update sharded model v2 using sharded param v2 (#323) ver217 2022-03-08 18:18:06 +0800
  • 799d105bb4 using pytest parametrize jiaruifang 2022-03-08 12:03:35 +0800
  • dec24561cf show pytest parameterize jiaruifang 2022-03-08 11:51:32 +0800
  • 11bddb6e55 [zero] update zero context init with the updated test utils (#327) Jiarui Fang 2022-03-08 14:45:01 +0800
  • 6268446b81 [test] refactored testing components (#324) Frank Lee 2022-03-08 10:19:18 +0800
  • 4f26fabe4f fixed strings in profiler outputs (#325) HELSON 2022-03-07 17:08:56 +0800
  • de0468c7a8 [zero] zero init context (#321) Jiarui Fang 2022-03-07 16:14:40 +0800
  • 73bff11288 Added profiler communication operations 1SAA 2022-03-04 10:17:45 +0800
  • d275b98b7d add badge and contributor list binmakeswell 2022-03-04 18:04:51 +0800
  • a3269de5c9 [zero] cpu adam kernel (#288) LuGY 2022-03-04 16:05:15 +0800
  • 90d3aef62c [zero] yet an improved sharded param (#311) Jiarui Fang 2022-03-04 15:49:23 +0800
  • c9e7d9582d [zero] polish shard strategy (#310) Jiarui Fang 2022-03-04 15:35:07 +0800
  • 3092317b80 polish code ver217 2022-03-04 13:44:38 +0800
  • 36f9a74ab2 fix sharded param hook and unit test ver217 2022-03-04 13:40:48 +0800
  • 001ca624dd impl shard optim v2 and add unit test ver217 2022-03-04 11:49:02 +0800
  • 74f77e314b [zero] a shard strategy in granularity of tensor (#307) Jiarui Fang 2022-03-04 11:59:35 +0800
  • 80364c7686 [zero] sharded tensor (#305) Jiarui Fang 2022-03-04 10:46:13 +0800
  • d344689274 [profiler] primary memory tracer Jie Zhu 2022-03-04 09:35:23 +0800
  • dfc3fafe89 update unit testing CI rules FrankLeeeee 2022-03-03 07:42:46 +0000
  • bbbfe9b2c9 added compatibility CI and options for release ci FrankLeeeee 2022-02-28 08:40:06 +0000
  • 115bcc0b41 added pypi publication CI and remove formatting CI FrankLeeeee 2022-02-28 07:17:37 +0000
  • b105371ace rename shared adam to sharded optim v2 ver217 2022-03-03 15:55:27 +0800
  • 70814dc22f fix master params dtype ver217 2022-03-03 15:50:30 +0800
  • 795210dd99 add fp32 master params in sharded adam ver217 2022-03-03 15:42:53 +0800
  • a109225bc2 add sharded adam ver217 2022-03-03 15:06:18 +0800
  • 8f74fbd9c9 polish license (#300) Jiarui Fang 2022-03-03 14:11:45 +0800
  • e17e92c54d Polish sharded parameter (#297) Jiarui Fang 2022-03-03 12:42:57 +0800
  • 7aef75ca42 [zero] add sharded grad and refactor grad hooks for ShardedModel (#287) ver217 2022-03-02 18:28:29 +0800
  • 9afb5c8b2d fixed typo in ShardParam (#294) Frank Lee 2022-03-02 17:26:23 +0800
  • 27155b8513 added unit test for sharded optimizer (#293) Frank Lee 2022-03-02 17:15:54 +0800
  • e17e54e32a added buffer sync to naive amp model wrapper (#291) Frank Lee 2022-03-02 16:47:17 +0800
  • 8d653af408 add a common util for hooks registered on parameter. (#292) Jiarui Fang 2022-03-02 14:38:22 +0800
  • f867365aba bug fix: pass hook_list to engine (#273) Jie Zhu 2022-03-02 14:25:52 +0800
  • 5a560a060a Feature/zero (#279) Jiarui Fang 2022-03-01 18:17:01 +0800
  • 08eccfe681 add community group and update issue template(#271) binmakeswell 2022-02-28 17:07:14 +0800
  • 3312d716a0 update experimental visualization (#253) Sze-qq 2022-02-28 16:03:13 +0800
  • 753035edd3 add Chinese README binmakeswell 2022-02-18 16:28:37 +0800
  • 82023779bb Added TPExpert for special situation 1SAA 2022-02-27 22:28:39 +0800
  • 36b8477228 Fixed parameter initialization in FFNExpert (#251) HELSON 2022-02-27 14:01:25 +0800
  • e13293bb4c fixed CI dataset directory; fixed import error of 2.5d accuracy (#255) アマデウス 2022-02-24 14:33:45 +0800
  • 219df6e685 Optimized MoE layer and fixed some bugs; 1SAA 2022-02-18 20:42:31 +0800
  • 3dba070580 fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial zbian 2022-02-17 22:03:39 +0800
  • 24f8583cc4 update setup info (#233) ver217 2022-02-15 15:15:03 +0800
  • bc0009b523 fixed bug in activation checkpointing test (#387) Frank Lee 2022-03-11 14:48:11 +0800
  • cc408f9aa4 Merge branch 'hotfix/offloadbug' into jiaruifang/cuda_memcollector jiaruifang 2022-03-11 14:46:16 +0800
  • 59d303cb3e Update test_activation_checkpointing.py LuGY 2022-03-11 14:41:54 +0800
  • b33659a9d0 [zero] polish ShardedOptimV2 unittest (#385) Jiarui Fang 2022-03-11 14:40:01 +0800