Commit Graph

  • 92a4bbab76 Params Checkpointing on parallelisms BoxiangW 2022-03-16 12:14:40 +0800
  • f0d6e2208b [polish] add license meta to setup.py (#427) Frank Lee 2022-03-16 12:05:56 +0800
  • eee3fb1a0d polish code. jiaruifang 2022-03-16 11:55:13 +0800
  • 1641d2e4e6 polish code jiaruifang 2022-03-16 11:48:05 +0800
  • 7b61871f4e merge 2 files into 1 jiaruifang 2022-03-16 11:45:33 +0800
  • 75aeed945d add license meta to setup.py FrankLeeeee 2022-03-16 10:41:12 +0800
  • 5d7dc3525b [hotfix] run cpu adam unittest in pytest (#424) Jiarui Fang 2022-03-16 10:39:55 +0800
  • 54229cd33e [log] better logging display with rich (#426) Jiarui Fang 2022-03-16 09:51:15 +0800
  • c411b5973e remove deepspeed in zero requirements jiaruifang 2022-03-16 09:47:13 +0800
  • b530039e92 better logger using rich jiaruifang 2022-03-16 09:24:23 +0800
  • dbb5a08f7c [hotfix] really run cpu adam test in unittests jiaruifang 2022-03-16 09:11:56 +0800
  • cb338eed76 polish code jiaruifang 2022-03-16 09:02:07 +0800
  • 4c7641e868 polish code jiaruifang 2022-03-16 08:53:36 +0800
  • 7fde1442e9 Added ViLT-MLM model extremeviscent 2022-03-16 06:09:58 +0800
  • c04e6457de free param.grad ver217 2022-03-15 19:04:36 +0800
  • 69098c2d0a use double buffer to handle grad ver217 2022-03-15 17:07:35 +0800
  • 571ab6340e added model synchronization for MoE model 1SAA 2022-03-15 14:13:29 +0800
  • 31e2c5d3a4 hybrid adam and place part of OS on cuda jiaruifang 2022-03-15 13:47:40 +0800
  • 3f70a2b12f removed noisy function during evaluation of MoE router (#419) HELSON 2022-03-15 12:06:09 +0800
  • 41644a6184 removed noisy function during evaluation of MoE router 1SAA 2022-03-15 11:45:23 +0800
  • adebb3e041 [zero] cuda margin space for OS (#418) Jiarui Fang 2022-03-15 12:02:19 +0800
  • ca27e6eff6 Merge branch 'main' of github.com:hpcaitech/ColossalAI into jiaruifang/dao jiaruifang 2022-03-15 11:30:58 +0800
  • 56bb412e72 [polish] use GLOBAL_MODEL_DATA_TRACER (#417) Jiarui Fang 2022-03-15 11:29:46 +0800
  • c868e5496e [zero] cuda margin memory jiaruifang 2022-03-15 11:28:49 +0800
  • 1316dd923d hotfix unittets bug jiaruifang 2022-03-15 11:08:55 +0800
  • 903ba33d82 Merge branch 'main' of github.com:hpcaitech/ColossalAI into jiaruifang/dao jiaruifang 2022-03-15 11:06:43 +0800
  • 23ba3fc450 [zero] refactory ShardedOptimV2 init method (#416) Jiarui Fang 2022-03-15 10:45:55 +0800
  • ca1e7b3300 polish code jiaruifang 2022-03-15 10:36:30 +0800
  • 1999977a3f [polish] define GLOBAL_MODEL_DATA_TRACER jiaruifang 2022-03-15 10:26:01 +0800
  • e79ea44247 [fp16] refactored fp16 optimizer (#392) Frank Lee 2022-03-15 10:05:38 +0800
  • 9b8a3585fa polish code jiaruifang 2022-03-15 10:01:56 +0800
  • c1389b6637 [zero] refactory the ShardedOptimV2 init method. Remove duplicated shard strategy jiaruifang 2022-03-15 09:57:17 +0800
  • 5ab8fbd79f Merge branch 'develop' of https://github.com/hpcaitech/ColossalAI into develop jiaruifang 2022-03-15 09:29:05 +0800
  • 7621b71182 Automated submodule synchronization github-actions 2022-03-15 00:01:06 +0000
  • f8a0e7fb01 Merge pull request #412 from hpcaitech/develop Frank Lee 2022-03-14 22:48:56 +0800
  • 21dc54e019 [zero] memtracer to record cuda memory usage of model data and overall system (#395) Jiarui Fang 2022-03-14 22:05:30 +0800
  • edf38f462e Merge branch 'develop' of github.com:hpcaitech/ColossalAI into jiarufang/memtracer jiaruifang 2022-03-14 21:41:03 +0800
  • a37bf1bc42 [hotfix] rm test_tensor_detector.py (#413) Jiarui Fang 2022-03-14 21:39:48 +0800
  • 981c6d9526 rm test_tensor_detector.py jiaruifang 2022-03-14 21:05:10 +0800
  • 639270f926 skip test_tensor_detector.py jiaruifang 2022-03-14 21:02:33 +0800
  • 4bb6e2a7b1 polish code jiaruifang 2022-03-14 20:56:07 +0800
  • b1203e32a1 Merge branch 'develop' of github.com:hpcaitech/ColossalAI into jiarufang/memtracer jiaruifang 2022-03-14 20:53:58 +0800
  • 370f567e7d [zero] new interface for ShardedOptimv2 (#406) Jiarui Fang 2022-03-14 20:48:41 +0800
  • 35e8f966e0 polish code jiaruifang 2022-03-14 18:06:03 +0800
  • a9c27be42e Added tensor detector (#393) LuGY 2022-03-14 18:01:46 +0800
  • 3d1c85601c Allowed change include_cpu when detect() lclgy 2022-03-14 17:58:41 +0800
  • dbb581065f polish code jiaruifang 2022-03-14 17:55:27 +0800
  • 32296cf462 Merge pull request #409 from 1SAA/develop Frank Lee 2022-03-14 17:43:45 +0800
  • b5aba3ba33 Added the - states lclgy 2022-03-14 17:40:04 +0800
  • 74025806b9 offload grad when flushing bucket ver217 2022-03-14 17:22:16 +0800
  • 907ac4a2dc fixed error when no collective communication in CommProfiler 1SAA 2022-03-14 16:43:21 +0800
  • 5e24124b6f Merge branch 'develop' of github.com:hpcaitech/ColossalAI into jiaruifang/polish_optimv2 jiaruifang 2022-03-14 17:20:41 +0800
  • c511e80b01 Merge branch 'develop' of github.com:hpcaitech/ColossalAI into jiarufang/memtracer jiaruifang 2022-03-14 17:15:09 +0800
  • 62b08acc72 update hf badge link (#410) Frank Lee 2022-03-14 17:07:01 +0800
  • b331429b3a update hf badge link FrankLeeeee 2022-03-14 08:23:51 +0000
  • 2fe68b359a Merge pull request #403 from ver217/feature/shard-strategy Frank Lee 2022-03-14 16:29:28 +0800
  • 980f0fa5c4 polish code jiaruifang 2022-03-14 16:25:07 +0800
  • 4417ee1409 Merge branch 'develop' of github.com:hpcaitech/ColossalAI into jiaruifang/polish_optimv2 jiaruifang 2022-03-14 16:23:24 +0800
  • cf92a779dc added huggingface badge (#407) Frank Lee 2022-03-14 16:23:02 +0800
  • c276ea7f0d added huggingface badge FrankLeeeee 2022-03-14 08:21:32 +0000
  • 7b29007e22 new interface for sharded optim v2 jiaruifang 2022-03-14 16:20:17 +0800
  • dfd0363f68 polished output format for communication profiler and pcie profiler (#404) HELSON 2022-03-14 16:07:45 +0800
  • 63469c0f91 polish code ver217 2022-03-14 15:48:55 +0800
  • 6fb23ce81c polished output format for communication profiler and pcie profiler 1SAA 2022-03-14 15:29:24 +0800
  • 54fd37f0e0 polish unit test ver217 2022-03-14 15:06:02 +0800
  • 88804aee49 add bucket tensor shard strategy ver217 2022-03-14 14:48:32 +0800
  • 5cb699f70d polish code jiaruifang 2022-03-14 13:41:00 +0800
  • 9a08abbac1 polish code jiaruifang 2022-03-14 13:27:48 +0800
  • b277be8b65 refactored fp16 optimizer FrankLeeeee 2022-03-09 02:06:42 +0000
  • aaead33cfe Merge pull request #397 from hpcaitech/create-pull-request/patch-sync-submodule Frank Lee 2022-03-14 10:11:06 +0800
  • 6098bc4cce Automated submodule synchronization github-actions 2022-03-14 00:01:12 +0000
  • 6937f85004 Merge pull request #402 from oikosohn/oikosohn-patch-1 Frank Lee 2022-03-13 22:40:04 +0800
  • ff4f5d7231 fix typo in CHANGE_LOG.md sohn 2022-03-13 23:34:34 +0900
  • 0f21216b7b fix unittest bug jiaruifang 2022-03-13 20:53:25 +0800
  • c0c7e7f7ef Merge branch 'main' of https://github.com/hpcaitech/ColossalAI into jiarufang/memtracer jiaruifang 2022-03-13 20:49:49 +0800
  • 265b1c01fc Merge branch 'develop' of https://github.com/hpcaitech/ColossalAI into jiarufang/memtracer jiaruifang 2022-03-13 20:46:42 +0800
  • fc5101f24c Merge pull request #401 from hpcaitech/develop Frank Lee 2022-03-13 11:09:17 +0800
  • fc2fd0abe5 Merge pull request #400 from hpcaitech/hotfix/readme Frank Lee 2022-03-13 09:12:59 +0800
  • 6d3a4f51bf fixed broken badge link Frank Lee 2022-03-13 09:11:48 +0800
  • f7a42d3968 Automated submodule synchronization github-actions 2022-03-12 00:01:07 +0000
  • 834d9a83d0 Merge branch 'develop' of https://github.com/hpcaitech/ColossalAI into jiarufang/memtracer jiaruifang 2022-03-11 18:18:54 +0800
  • 7c079d9c33 [hotfix] fixed bugs in ShardStrategy and PcieProfiler (#394) HELSON 2022-03-11 18:12:46 +0800
  • 98c66468d5 Merge branch 'develop' of https://github.com/hpcaitech/ColossalAI into jiaruifang/cuda_memcollector jiaruifang 2022-03-11 18:08:30 +0800
  • 27b43fa51d fixed bugs in ShardStrategy and PcieProfiler 1SAA 2022-03-11 18:00:31 +0800
  • ffd7beedd5 Add memory tracer for model data and system usage jiaruifang 2022-03-11 17:55:13 +0800
  • ebbaea9167 Added tensor detector lclgy 2022-03-11 16:47:00 +0800
  • 275768b3fc Add a function that checks whether multiple GPUs on a single machine can communicate peer-to-peer, and a test file for multi-GPU P2P support. Reason: NVIDIA does not support peer-to-peer over PCIe, so P2P communication between GPUs requires NVLink; not every user has this hardware, hence the check and tips are needed. chenjunejie 2022-03-11 17:39:14 +0800
  • f417a88b7f Fixed bugs in PcieProfiler and ShardStrategy binmakeswell 2022-03-10 15:35:06 +0800
  • bd7247a1a1 [polish] fix format (#370) binmakeswell 2022-03-10 15:35:06 +0800
  • 895f40fe3b Merge pull request #382 from FrankLeeeee/hotfix/refactor-fp16-optim Frank Lee 2022-03-11 16:44:59 +0800
  • d180b4664c remove pull request info FrankLeeeee 2022-03-11 08:42:51 +0000
  • b19793e71c refactored fp16 optimizer FrankLeeeee 2022-03-09 02:06:42 +0000
  • 195982133d [polish] fix format (#370) binmakeswell 2022-03-10 15:35:06 +0800
  • a816bf7d18 [polish] fix format (#370) binmakeswell 2022-03-10 15:35:06 +0800
  • 20f4020f61 Merge a missing fix into main (#390) Frank Lee 2022-03-11 16:22:47 +0800
  • dac69b8739 fixed broken badge link (#389) Frank Lee 2022-03-11 16:00:59 +0800
  • 9732db9252 [polish] fix format (#370) binmakeswell 2022-03-10 15:35:06 +0800
  • e21e851b9b fixed broken badge link FrankLeeeee 2022-03-11 07:59:18 +0000
  • 1e4bf85cdb fixed bug in activation checkpointing test (#387) Frank Lee 2022-03-11 14:48:11 +0800
  • 3af13a2c3e [zero] polish ShardedOptimV2 unittest (#385) Jiarui Fang 2022-03-11 14:40:01 +0800