Commit Graph

  • 115df0c60e Merge pull request #1 from hpcaitech/fix/format FredHuang99 2022-04-02 14:51:37 +0800
  • 84b1bee8e9 [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style (#641) Xue Fuzhao 2022-04-02 14:38:40 +0800
  • d5b950d5f8 [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style XueFuzhao 2022-04-02 06:36:36 +0000
  • 954e4ce202 [NFC] polish colossalai/context/process_group_initializer/initializer_sequence.py colossalai/context/process_group_initializer initializer_tensor.py code style (#639) Cautiousss 2022-04-02 14:30:04 +0800
  • 87237860b5 [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style XueFuzhao 2022-04-02 06:24:22 +0000
  • b73b9ee53a [NFC] polish colossalai/builder/pipeline.py code style (#638) Ziheng Qin 2022-04-02 14:22:41 +0800
  • a3d291eed0 [NFC] polish colossalai/context/process_group_initializer/initializer_sequence.py colossalai/context/process_group_initializer initializer_tensor.py code style 何晓昕 2022-04-02 14:21:23 +0800
  • 5bed093dd0 [NFC] polish colossalai/builder/pipeline.py code style Ziheng Qin 2022-04-02 14:20:40 +0800
  • db72eb63e9 fixed bugs in CPU adam 1SAA 2022-04-02 01:24:49 +0800
  • f5d3a9c2b0 polish checkpoint docstring (#637) ver217 2022-04-02 13:34:33 +0800
  • b73b82a514 polish checkpoint docstring ver217 2022-04-02 13:32:07 +0800
  • c6afafa75a [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.cpp code style (#636) Sze-qq 2022-04-02 13:28:57 +0800
  • be9a0f6185 [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.cpp code style Sze-qq 2022-04-02 12:51:25 +0800
  • c4b1dddd0d [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu code style (#635) Wangbo Zhao 2022-04-02 10:45:04 +0800
  • 5f1f3bbb1c [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu code style wangbo-zhao 2022-04-02 10:13:09 +0800
  • 02174d4450 [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu and cross_entropy.cu code style (#634) ExtremeViscent 2022-04-02 02:29:45 +0100
  • d96f12b391 [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu and cross_entropy.cu code style ExtremeViscent 2022-04-02 01:19:24 +0100
  • b20ec914c6 Merge 7fde1442e9 into 055fbf5be6 ExtremeViscent 2022-04-02 01:10:09 +0100
  • 4a21a8d017 Merge branch 'hpcaitech:main' into feature/monitoring SMesForoush 2022-04-01 21:20:15 +0430
  • c92d35c4a8 '[NFC] polish <colossalai/engine/_base_engine.py> code style' (#631) RichardoLuo 2022-04-01 23:41:09 +0800
  • 0b6d430b9c '[NFC] polish <colossalai/engine/_base_engine.py> code style' RichardoLuo 2022-04-01 23:32:24 +0800
  • b6df1366cd [NFC] polish colossalai/communication/ring.py code style (#630) Zangwei 2022-04-01 21:38:05 +0800
  • 56524c443d [NFC] polish colossalai/communication/ring.py code style zhengzangw 2022-04-01 21:28:51 +0800
  • ac1e0c1991 [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/transform_kernels.cu code stype (#629) puck_WCR 2022-04-01 20:20:54 +0800
  • 055fbf5be6 [zero] adapt zero for unsharded paramters (Optimizer part) (#601) HELSON 2022-04-01 20:10:47 +0800
  • 478ea7130e adapt zero for unshard model (Optimizer part) 1SAA 2022-04-01 11:27:27 +0800
  • 2f282808e4 [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/transform_kernels.cu code stype WANG-CR 2022-04-01 19:13:45 +0800
  • b2e4c68cce [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code stype (#628) superhao1995 2022-04-01 19:03:01 +0800
  • 6daf31acf2 [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code stype superhao1995 2022-04-01 18:16:38 +0800
  • 18b5f25056 refactor pipeline---put runtime schedule into engine. liuyuliang 2022-04-01 18:12:04 +0800
  • 4cc2ae2432 [NFC] polish <colossalai/context/process_group_initializer/initializer_data.py> code stype (#626) Jiang Zhuo 2022-04-01 17:55:06 +0800
  • 87dbc198e3 [NFC] polish <colossalai/context/process_group_initializer/initializer_data.py> code stype 姜卓 2022-04-01 17:52:51 +0800
  • 229382c844 [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cuda_util.cu code stype (#625) KAIYUAN GAN 2022-04-01 17:45:53 +0800
  • 764384ed8a [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cuda_util.cu code stype GaryGky 2022-04-01 17:43:27 +0800
  • 1fbb26a77c [NFC] polish colossalai/context/process_group_initializer/initializer_pipeline.py code stype 姜卓 2022-04-01 17:32:59 +0800
  • d8c5df484e polish jiaruifang 2022-04-01 17:24:28 +0800
  • 4b355d7657 [NFC] polish <colossalai/kernel/cuda_native/csrc/kernels/general_kernels.cu> code stype 姜卓 2022-04-01 17:19:08 +0800
  • 354b7954d1 [model checkpoint] added unit tests for checkpoint save/load (#599) アマデウス 2022-04-01 16:53:32 +0800
  • 28b515d610 [model checkpoint] updated checkpoint hook (#598) アマデウス 2022-04-01 16:53:03 +0800
  • 77ad24bf94 [model checkpoint] updated saving/loading for 3d layers (#597) アマデウス 2022-04-01 16:52:47 +0800
  • 93089ed708 [model checkpoint] updated saving/loading for 2.5d layers (#596) アマデウス 2022-04-01 16:52:33 +0800
  • 6302069c0e [model checkpoint] updated communication ops for cpu tensors (#590) アマデウス 2022-04-01 16:52:20 +0800
  • c50bfb807b [model checkpoint] updated saving/loading for 1d layers (#594) アマデウス 2022-04-01 16:51:52 +0800
  • 7636d518e1 [model checkpoint] updated saving/loading for 2d layers (#595) アマデウス 2022-04-01 16:50:34 +0800
  • cd13b63832 [model checkpoint] reworked unified layers for ease of save/load states (#593) アマデウス 2022-04-01 16:49:56 +0800
  • acae68eb04 [model checkpoint] updated checkpoint save/load utils (#592) アマデウス 2022-04-01 16:49:21 +0800
  • d75d3473a8 Merge branch 'main' into feature/checkpoint-utils Jiarui Fang 2022-04-01 16:49:03 +0800
  • 1c40ee8749 [TP] add assert for tp1d (#621) Ziyue Jiang 2022-04-01 16:44:23 +0800
  • 3a80a67d5f Merge branch 'main' of github.com:hpcaitech/ColossalAI into feature/tensor_shape_check Wesley 2022-04-01 16:38:12 +0800
  • a57cf6efd2 add assert for tp1d Wesley 2022-04-01 16:37:52 +0800
  • 369a288bf3 polish utils docstring (#620) ver217 2022-04-01 16:36:47 +0800
  • a9c1a4a506 polish utils docstring ver217 2022-04-01 16:32:22 +0800
  • e619a651fb polish optimizer docstring (#619) ver217 2022-04-01 16:27:03 +0800
  • 39b1c6e070 polish optimizer docstring ver217 2022-04-01 16:21:21 +0800
  • 8432dc7080 polish moe docsrting (#618) ver217 2022-04-01 16:15:36 +0800
  • d31a63ec92 polish moe docsrting ver217 2022-04-01 16:13:41 +0800
  • 5ff6e2a8f0 [NFC] polish colossalai/context/process_group_initializer/process_group_initializer.py code stype (#617) ziyu huang 2022-04-01 16:13:29 +0800
  • e444ca2b83 [NFC] polish colossalai/context/process_group_initializer/process_group_initializer.py code stype “Arsmart123 2022-04-01 16:11:54 +0800
  • c5b488edf8 polish amp docstring (#616) ver217 2022-04-01 16:09:39 +0800
  • 74ead70e6a polish amp docstring ver217 2022-04-01 16:06:40 +0800
  • eda13056e2 polish jiaruifang 2022-04-01 15:51:12 +0800
  • f69507dd22 update rst (#615) ver217 2022-04-01 15:46:38 +0800
  • f9ee785eb8 update rst ver217 2022-04-01 15:35:59 +0800
  • 640be94c36 polish code jiaruifang 2022-04-01 15:27:14 +0800
  • 538cf03079 polish jiaruifang 2022-04-01 15:23:12 +0800
  • 5525436a19 [zero] initialize a stateful tensor manager jiaruifang 2022-04-01 15:18:49 +0800
  • 93f14d2a33 [zero] test zero tensor utils (#609) FredHuang99 2022-04-01 15:16:59 +0800
  • cbc1a97bba fix format (#613) Shawn-Kong 2022-03-31 23:57:19 -0700
  • 0ef8819c67 polish docstring of zero (#612) ver217 2022-04-01 14:50:56 +0800
  • 2500daa7a9 polish docstring of zero ver217 2022-04-01 14:47:58 +0800
  • 0958525d87 Update test_tensor_move.py FredHuang99 2022-04-01 14:41:07 +0800
  • f398372e99 fix format evin K 2022-03-31 23:29:07 -0700
  • a16f76dcad fix format (#611) Yuer867 2022-04-01 14:19:27 +0800
  • 02b187c14f [zero] add sampling time for memstats collector (#610) LuGY 2022-04-01 14:03:00 +0800
  • 23e4e96228 fix typo lclgy 2022-04-01 13:59:43 +0800
  • 71c472090c fix format Yuer867 2022-04-01 13:56:31 +0800
  • dbe651b2f7 [zero] add sampling time for memstats collector lclgy 2022-04-01 13:51:17 +0800
  • d18c2ec496 fix format (#607) xyupeng 2022-04-01 13:31:06 +0800
  • 62187e3121 fix format (#608) xuqifan897 2022-03-31 22:30:39 -0700
  • cd2395d0da Update test_tensor_move.py FredHuang99 2022-04-01 13:11:53 +0800
  • 90b475adf6 fix format Qifan Xu 2022-03-31 21:44:02 -0700
  • 9bee119104 [hotfix] fix sharded optim zero grad (#604) ver217 2022-04-01 12:41:20 +0800
  • bc0a07c486 fix format xypeng 2022-04-01 12:40:30 +0800
  • d2fbbdec67 polish comments ver217 2022-04-01 12:01:04 +0800
  • bc45d9af52 fix sharded optim zero grad ver217 2022-04-01 11:52:08 +0800
  • 1bcec0496e fix format xypeng 2022-04-01 11:51:07 +0800
  • 297b8baae2 [model checkpoint] add gloo groups for cpu tensor communication (#589) アマデウス 2022-04-01 10:15:52 +0800
  • 54e688b623 moved ensure_path_exists to utils.common (#591) アマデウス 2022-04-01 09:46:33 +0800
  • e956d93ac2 [refactor] memory utils (#577) Jiarui Fang 2022-04-01 09:22:33 +0800
  • 20e35faf17 Merge branch 'main' into jiaruifang/memory_utils Jiarui Fang 2022-04-01 09:22:26 +0800
  • d6fd4a0f73 Added unit tests for checkpoint save/load zbian 2022-03-31 23:33:22 +0800
  • ed1fcc223e updated checkpoint hook zbian 2022-03-31 23:31:03 +0800
  • 657f1b371a updated saving/loading for 3d layers zbian 2022-03-31 23:29:20 +0800
  • bceefe9181 updated saving/loading for 2.5d layers zbian 2022-03-31 23:28:04 +0800
  • 5baeb5a4b8 updated saving/loading for 2d layers zbian 2022-03-31 23:26:40 +0800
  • 9e5057dd72 updated saving/loading for 1d layers zbian 2022-03-31 23:25:04 +0800
  • 9722e4149d reworked unified layers for ease of save/load states zbian 2022-03-31 23:20:34 +0800
  • 131fd8f831 updated checkpoint save/load utils zbian 2022-03-31 23:17:52 +0800
  • 08643b02ba moved ensure_path_exists to utils.common zbian 2022-03-31 23:13:53 +0800
  • e08cd35251 updated communication ops for cpu tensors zbian 2022-03-31 23:09:13 +0800