Commit Graph

  • b62759e87d Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into log/local jiaruifang 2022-04-20 09:43:10 +0800
  • 9e754d2115 polish jiaruifang 2022-04-20 09:41:32 +0800
  • 643bc453ee [log] local throughput collecting jiaruifang 2022-04-20 09:36:51 +0800
  • b9284b98a9 Automated submodule synchronization github-actions 2022-04-20 00:01:12 +0000
  • dd92b90a68 [DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext (#808) ver217 2022-04-19 16:16:48 +0800
  • 227d1cd4b3 [gemini] APIs to set cpu memory capacity (#809) Jiarui Fang 2022-04-19 16:05:22 +0800
  • f6dcd23fb9 Merge pull request #807 from FrankLeeeee/feature/cli YuliangLiu0306 2022-04-19 15:52:26 +0800
  • 70b509409f Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into gemini/cpu_cap jiaruifang 2022-04-19 15:49:41 +0800
  • a9c0e45306 polish code ver217 2022-04-19 15:49:21 +0800
  • 7e5716fbf4 [gemini] set cpu memory capacity jiaruifang 2022-04-19 15:48:19 +0800
  • 26bf0ebdda init fp16 param directly ver217 2022-04-19 15:34:12 +0800
  • f63e91d280 [cli] fixed a bug in user args and refactored the module structure FrankLeeeee 2022-04-19 15:14:54 +0800
  • e761ad2cd7 Revert "[zero] add ZeroTensorShardStrategy (#793)" (#806) Jiarui Fang 2022-04-19 14:40:02 +0800
  • 27896f5b27 Revert "[zero] add ZeroTensorShardStrategy (#793)" jiaruifang 2022-04-19 14:39:42 +0800
  • 6331fcbbc7 Revert "[zero] add ZeroTensorShardStrategy (#793)" Jiarui Fang 2022-04-19 14:38:29 +0800
  • 88759e289e [zero] add ZeroTensorShardStrategy (#793) HELSON 2022-04-19 14:32:45 +0800
  • 681addb512 [refactor] moving grad acc logic to engine (#804) Jiarui Fang 2022-04-19 14:03:21 +0800
  • 05d9ae5999 [cli] add missing requirement (#805) Frank Lee 2022-04-19 13:56:59 +0800
  • 1ae47bd746 [cli] add missing requirement FrankLeeeee 2022-04-19 13:53:04 +0800
  • de2f581d43 [cli] added micro benchmarking for tp (#789) YuliangLiu0306 2022-04-19 12:08:28 +0800
  • e5beb69bfd fix a bug jiaruifang 2022-04-19 11:54:34 +0800
  • 77b5704011 Merge branch 'hpcaitech:main' into main YuliangLiu0306 2022-04-19 11:28:13 +0800
  • a25697ac3e Revert "[CLI] add CLI launcher" YuliangLiu0306 2022-04-19 11:27:00 +0800
  • 38229cd2d8 Merge branch 'main' into feature/cli_benchmark YuliangLiu0306 2022-04-19 11:09:44 +0800
  • 8dee7e53d3 [zero] add ZeroTensorShardStrategy 1SAA 2022-04-19 11:07:47 +0800
  • cfadc9df8e [cli] added distributed launcher command (#791) YuliangLiu0306 2022-04-19 10:59:44 +0800
  • 1c0eb1fbd8 polish jiaruifang 2022-04-19 10:29:23 +0800
  • 5d7764034b polish code jiaruifang 2022-04-19 10:28:17 +0800
  • 0b329915fb Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into refactor/rm_circle_import jiaruifang 2022-04-19 10:17:39 +0800
  • 61301df9b8 moving grad acc unittest to engine jiaruifang 2022-04-19 10:15:33 +0800
  • 97cd9b03b3 [log] display tflops if available (#802) Jiarui Fang 2022-04-19 10:13:28 +0800
  • 4d9332b4c5 [refactor] moving memtracer to gemini (#801) Jiarui Fang 2022-04-19 10:13:08 +0800
  • 3981aa2cf5 Revert "[log] add tflops logs" Jiarui Fang 2022-04-19 09:52:40 +0800
  • a60d17c816 Revert "[refactor] moving memtracer to gemini (#790)" Jiarui Fang 2022-04-19 09:50:20 +0800
  • fd5f81c457 [refactor] remove circle imports jiaruifang 2022-04-19 09:44:40 +0800
  • 751caebd94 Automated submodule synchronization (#771) github-actions[bot] 2022-04-19 09:43:55 +0800
  • 1477c387ad [refactor] moving memtracer to gemini (#790) Jiarui Fang 2022-04-19 09:07:11 +0800
  • f82f08e58c Automated submodule synchronization github-actions 2022-04-19 00:01:13 +0000
  • 659ed8f636 polish workflow file FrankLeeeee 2022-04-19 02:33:36 +0800
  • 7f710e213c polish workflow file FrankLeeeee 2022-04-19 02:01:20 +0800
  • c605bb0a49 [ci] updated workflow with proxy FrankLeeeee 2022-04-19 01:38:16 +0800
  • 69a2ead40c revert pipeline unittest jiaruifang 2022-04-18 22:46:54 +0800
  • 2de201508e remove some unitests jiaruifang 2022-04-18 22:16:16 +0800
  • b46359e5be remove imports in class jiaruifang 2022-04-18 21:51:50 +0800
  • a0109ef871 close tests/test_data_pipeline_tensor_parallel/test_cifar_with_data_pipeline_tensor.py jiaruifang 2022-04-18 21:11:26 +0800
  • 7685738434 Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into refactor/gemini jiaruifang 2022-04-18 19:39:59 +0800
  • 66d3e6425f modefied the pp build for ckpt adaptation (#795) LuGY 2022-04-18 19:23:16 +0800
  • 1f887f987c modefied the pp build for ckpt adaptation lclgy 2022-04-18 18:55:59 +0800
  • 516e6ef800 modefied the pp build for ckpt adaptation lclgy 2022-04-15 20:20:51 +0800
  • ac2f9e3656 [log] add tflops logs Jiarui Fang 2022-04-18 18:19:21 +0800
  • fb3cace00f refactor the module structure. liuyuliang 2022-04-18 18:15:25 +0800
  • f0b3733d16 refactor the module structure. liuyuliang 2022-04-18 18:06:46 +0800
  • b753c907e7 polish jiaruifang 2022-04-18 17:56:36 +0800
  • 2cb488bd70 fix CodeFactor issues. liuyuliang 2022-04-18 17:39:06 +0800
  • 731ab39394 polish jiaruifang 2022-04-18 17:34:39 +0800
  • 16b2677fed polish jiaruifang 2022-04-18 17:27:46 +0800
  • 92bcc66229 remove testing message used during developing liuyuliang 2022-04-18 17:19:24 +0800
  • 031ad4e3da polish jiaruifang 2022-04-18 17:19:23 +0800
  • d0009f38e5 polish jiaruifang 2022-04-18 17:15:19 +0800
  • 2b23a3ea5f [log] add tflops logs jiaruifang 2022-04-18 17:11:44 +0800
  • 6fa97700aa [CLI]add cli launcher feature liuyuliang 2022-04-18 16:43:31 +0800
  • 3b64803e4a Revert "[CLI] add CLI launcher" liuyuliang 2022-04-18 16:40:20 +0800
  • 6649dc3309 [CLI]add cli benchmark feature liuyuliang 2022-04-18 16:35:35 +0800
  • 13bd3e2060 [refactor] moving memtracer to gemini Jiarui Fang 2022-04-18 16:30:39 +0800
  • b088eb8ac3 Revert "[CLI] add CLI launcher" liuyuliang 2022-04-18 16:14:11 +0800
  • 551359cb7d Merge branch 'hpcaitech:main' into main YuliangLiu0306 2022-04-18 16:08:55 +0800
  • 3a29ae6f51 Merge branch 'feature/cli' of github.com:YuliangLiu0306/ColossalAI into feature/cli liuyuliang 2022-04-18 15:49:32 +0800
  • 152de162bc Merge branch 'hpcaitech:main' into feature/cli YuliangLiu0306 2022-04-18 15:49:08 +0800
  • 37bf06ace5 use click module to support cli feature liuyuliang 2022-04-18 15:48:42 +0800
  • acb71b8010 move gemini unitest to an individual dir jiaruifang 2022-04-18 15:18:14 +0800
  • 1411c5ed1b Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into refactor/gemini jiaruifang 2022-04-18 15:00:10 +0800
  • 5a6eea8f70 fix the stm bug jiaruifang 2022-04-18 14:59:57 +0800
  • 8711c706f4 [hotfix] fix grad offload when enabling reuse_fp16_shard Jiarui Fang 2022-04-18 14:58:21 +0800
  • cd10b815c7 reuse develop branch for fast CI Jiarui Fang 2022-04-18 14:45:26 +0800
  • c441b4d145 polish jiaruifang 2022-04-18 14:36:01 +0800
  • f1fa1a675f fix grad offload when enabling reuse_fp16_shard ver217 2022-04-18 14:07:39 +0800
  • 4c4388c46e [hotfix] fix memory leak in zero (#781) HELSON 2022-04-18 13:57:03 +0800
  • 9e221eb3b6 [hotfix] fix bugs in zero 1SAA 2022-04-16 21:16:02 +0800
  • 0ecbf4efaf [refactor] moving memtracer to gemini jiaruifang 2022-04-18 12:46:33 +0800
  • 4b01da24cd [TP] change the check assert in split batch 2d (#772) Ziyue Jiang 2022-04-16 21:29:57 +0800
  • 846406a07a [gemini] fix auto tensor placement policy (#775) ver217 2022-04-16 21:29:31 +0800
  • 38102cf61a update version (#779) v0.1.3 ver217 2022-04-16 17:09:24 +0800
  • 9a0dfe314f update version ver217 2022-04-16 17:07:59 +0800
  • 80da77aced Merge branch 'hpcaitech:main' into main YuliangLiu0306 2022-04-15 18:22:22 +0800
  • 2b7e7b60fa Merge branch 'feature/cli' of github.com:YuliangLiu0306/ColossalAI into feature/cli liuyuliang 2022-04-15 18:12:51 +0800
  • 5304e96078 Merge branch 'hpcaitech:main' into feature/cli YuliangLiu0306 2022-04-15 18:11:58 +0800
  • 8993378ed0 [CLI]add TP benchmark liuyuliang 2022-04-15 18:10:32 +0800
  • 384ead231a fix auto tensor placement policy ver217 2022-04-15 17:24:13 +0800
  • a65cbb7e4e [zero] refactor shard and gather operation (#773) HELSON 2022-04-15 14:41:31 +0800
  • 331b07a405 refactor shard and gather operation 1SAA 2022-04-15 14:08:58 +0800
  • a636ef920c use world_size to check Wesley 2022-04-15 14:00:59 +0800
  • ce87394dd5 change the check assert in split batch 2d Wesley 2022-04-15 11:37:01 +0800
  • 5a1a095b92 [test] refactored with the new rerun decorator (#763) Frank Lee 2022-04-15 00:33:04 +0800
  • d0f58b4e0a polish test case FrankLeeeee 2022-04-14 23:17:33 +0800
  • deaf99f4c9 [readme] sync CN readme (#766) binmakeswell 2022-04-14 21:04:51 +0800
  • 6e553748a7 polish sharded optim docstr and warning (#770) ver217 2022-04-14 21:03:59 +0800
  • 80e37eec42 fix the ckpt bugs when using DDP (#769) LuGY 2022-04-14 21:03:24 +0800
  • d45bfae00c polish sharded optim docstr and warning ver217 2022-04-14 18:56:38 +0800
  • 2b4fc4168a fix the ckpt bugs when using DDP lclgy 2022-04-14 17:38:23 +0800
  • 9c9de1554e [readme] sync CN readme binmakeswell 2022-04-14 17:50:17 +0800