Commit Graph

  • d9262da635 update hf model: add rope config and add qkv x54-729 2023-12-12 12:19:20 +0800
  • d6eeacfeb2 bug lijiaxing 2023-12-12 10:36:04 +0800
  • cc5b15349d fix(metric): add metric dtype control (#533) Pryest 2023-12-11 19:36:31 +0800
  • fdce50a000 fix default behavior Pryest 2023-12-11 17:27:09 +0800
  • c7db6db066 fix the bug so that the sequence parallel norm is all-reduced when overlap is False yingtongxiong 2023-12-11 17:36:33 +0800
  • 347370a58a fix demo config to avoid implicity Pryest 2023-12-11 16:25:33 +0800
  • 6c0ff4820f feat(model): support llama model with checkpoint loading (#532) jiaxingli 2023-12-11 16:25:24 +0800
  • 649af64c59 fix(metric): add metric dtype control Pryest 2023-12-11 16:16:21 +0800
  • b63b8e58bd modeling lijiaxing 2023-12-11 15:43:59 +0800
  • a83b02acf4 modeling lijiaxing 2023-12-11 15:31:46 +0800
  • e57ca246d9 importerror lijiaxing 2023-12-11 13:53:48 +0800
  • 472671688f importerror lijiaxing 2023-12-11 13:38:37 +0800
  • 4b7fa26d80 support hf llama lijiaxing 2023-12-08 20:13:34 +0800
  • 41edd074a6 support hf llama lijiaxing 2023-12-08 16:43:56 +0800
  • 6def66fb07 support hf llama lijiaxing 2023-12-08 16:08:15 +0800
  • 9d824d66ec support hf llama lijiaxing 2023-12-08 12:43:16 +0800
  • 5c0925cd6c feat(metrics): make float32 logits off by default 877825076@qq.com 2023-12-08 00:46:53 +0800
  • 66e4a8a847 auto resume lijiaxing 2023-12-07 19:42:03 +0800
  • 81ffb3d824 fix(test): fix type_ids unpack bug (#530) Guoteng 2023-12-07 18:47:19 +0800
  • bbc1a01fe5 fix: update ci for type_ids unpack bug fix 877825076@qq.com 2023-12-07 13:17:02 +0800
  • 68159c22a4 auto resume lijiaxing 2023-12-07 10:24:52 +0800
  • 3f49409681 Merge branch 'develop' of https://github.com/InternLM/InternLM into storage_multipart_upload lijiaxing 2023-12-07 10:23:05 +0800
  • 1da080a58e auto resume lijiaxing 2023-12-07 10:19:48 +0800
  • 828033aed5 fix(storage): unify the name of ak & sk (#527) jiaxingli 2023-12-06 15:31:44 +0800
  • d0d39fa3ef storage lijiaxing 2023-12-06 15:24:57 +0800
  • ff62cf2a7c storage lijiaxing 2023-12-06 15:15:54 +0800
  • 809ad9ebc8 fix the type_ids when micro_num=1 and use_flash_attn=False (#516) ytxiong 2023-12-06 14:38:28 +0800
  • 112c34ae09 feat(grad_norm): vocab grad norm profiling (#519) jiaopenglong 2023-12-06 13:52:42 +0800
  • 9fc252f40e add output embedding tf32 option (#523) jiaopenglong 2023-12-06 13:50:59 +0800
  • c581cc4c02 fix(model): add IS_SEQUENCE_PARALLEL check for norm module (#528) ytxiong 2023-12-06 12:06:22 +0800
  • 16f8ec2354 fix the spell bug and move the sequence judge to training_internlm yingtongxiong 2023-12-06 12:03:23 +0800
  • bffb515d30 fix lint yingtongxiong 2023-12-06 11:03:10 +0800
  • a9d5ad1b5f replace the named_children by named_modules yingtongxiong 2023-12-06 11:01:07 +0800
  • 2b28923949 remove comments yingtongxiong 2023-12-06 10:35:40 +0800
  • e6c0d7bf62 fix lint yingtongxiong 2023-12-05 21:03:00 +0800
  • 62d193c763 add IS_SEQUENCE_PARALLEL check for norm module yingtongxiong 2023-12-05 20:58:26 +0800
  • a34c31c08e change ak sk name lijiaxing 2023-12-05 17:44:56 +0800
  • c3a636ba0c change ak sk name lijiaxing 2023-12-05 15:07:52 +0800
  • 3410362f4c change ak sk name lijiaxing 2023-12-05 13:57:05 +0800
  • 5c2c247e21 Merge branch 'storage_multipart_upload' of https://github.com/li126com/InternLM into storage_multipart_upload lijiaxing 2023-12-05 12:21:30 +0800
  • 4128f1dbe6 change ak sk name lijiaxing 2023-12-05 12:15:33 +0800
  • 72cb7d6869 fix ci test_pipeline JiaoPL 2023-12-05 12:10:41 +0800
  • 5b101f2377 Merge branch 'develop' of https://github.com/InternLM/InternLM into storage_multipart_upload lijiaxing 2023-12-05 11:56:04 +0800
  • 843653de05 Merge branch 'develop' into feat/vocab_grad_norm JiaoPL 2023-12-05 11:53:14 +0800
  • 2dbbab7418 fix test_checkpoint (#526) jiaxingli 2023-12-04 15:38:13 +0800
  • 3b322618a4 fix test_checkpoint lijiaxing 2023-12-04 15:08:57 +0800
  • 1738bee002 feat(storage): use multipart upload when using oss (#520) jiaxingli 2023-12-01 17:05:58 +0800
  • 66bffffe5c add unit test case (#524) kkscilife 2023-12-01 16:12:39 +0800
  • 66d6efd004 overlap gating further Wenwen Qu 2023-11-23 17:46:32 +0800
  • d74ad7cca7 change assert condition for tutel Wenwen Qu 2023-11-17 18:58:52 +0800
  • d20aa41d86 implement overlap moe forward Wenwen Qu 2023-11-16 19:43:47 +0800
  • 3443ab1f5b merge operand if noisy_gate_policy is not used Wenwen Qu 2023-11-28 16:17:49 +0800
  • 95263fa1d0 merge operands in topk gating Wenwen Qu 2023-11-28 14:52:50 +0800
  • 0b6a75c334 add unit test case wangmengke 2023-12-01 14:54:41 +0800
  • cdb8cfc929 Merge branch 'develop' into storage_multipart_upload jiaxingli 2023-12-01 14:17:37 +0800
  • c7e83fd611 storage lijiaxing 2023-12-01 14:14:55 +0800
  • 03e53871a7 storage lijiaxing 2023-12-01 14:08:45 +0800
  • b7f721fffb storage lijiaxing 2023-12-01 11:14:34 +0800
  • 3b7fb97e04 storage lijiaxing 2023-12-01 11:10:04 +0800
  • b3be333aa2 fix(ci): fix test model ckpt ci test (#518) Guoteng 2023-11-30 19:16:35 +0800
  • 4467f827d1 add output embedding tf32 option JiaoPL 2023-11-30 16:49:00 +0800
  • b79d5ea7ae test(workflow): add workflow for loss test and change trigger event (#513) kkscilife 2023-11-30 11:04:07 +0800
  • 7f7d9d9a2c assign rank ali li126com 2023-11-29 12:05:06 +0000
  • 90bf9adac4 assign rank ali li126com 2023-11-29 11:54:59 +0000
  • fb3006de1e assign rank ali li126com 2023-11-29 11:49:51 +0000
  • 06cdcc3654 upload lijiaxing 2023-11-29 11:08:40 +0800
  • 83ebebd5bc add grad_norm profiling interval && refactor save grad norm JiaoPL 2023-11-28 20:41:29 +0800
  • 757e19e01a 1. fix(config): rampup_batch_size defalut value BC. (#515) Guoteng 2023-11-28 19:33:46 +0800
  • 4a9c3c73ce optimize trigger event wangmengke 2023-11-28 17:21:29 +0800
  • 4e4fb52898 multipart upload lijiaxing 2023-11-28 15:37:26 +0800
  • 4eed07a3c3 compute vocab grad norm && save pt JiaoPL 2023-11-28 12:13:23 +0800
  • f37c8442f3 fix(ci): fix test model ckpt ci test 877825076@qq.com 2023-11-27 16:39:39 +0800
  • 9780c44917 fix comments gaoyang07 2023-11-25 23:34:18 +0800
  • fdbdfcff34 remove micro_bsz gaoyang07 2023-11-25 22:44:20 +0800
  • 06e8301861 name (#514) jiaxingli 2023-11-24 18:24:54 +0800
  • acf8fb9712 1. fix(config): rampup_batch_size defalut value BC. 2. fix(config): standardize config parameter access. 3. feat(launch): add warmup_process_group 4. feat(memory): add cuda_memory_analyze 877825076@qq.com 2023-11-24 16:45:18 +0800
  • 19157361e0 fix the type_ids when micro_num=1 and use_flash_attn=False yingtongxiong 2023-11-24 16:44:17 +0800
  • 05d0a8d821 name lijiaxing 2023-11-24 15:55:50 +0800
  • 6549ebebdf change trigger event wangmengke 2023-11-24 15:33:38 +0800
  • c5ea82b074 add workflow for loss test wangmengke 2023-11-24 15:22:19 +0800
  • b59641715a Feat(QA): Check loss when swapping micro_num and micro_bsz && Check grad norm (#510) jiaxingli 2023-11-24 12:05:14 +0800
  • 0d3811c029 feat(model): add rope_base interface (#512) Shuo Zhang 2023-11-23 16:30:14 +0800
  • b12dd9621f feat(model): add rope_base interface Shuo Zhang 2023-11-23 15:34:50 +0800
  • 4ed3388a2f check grad norm lijiaxing 2023-11-23 14:20:48 +0800
  • ed1d9c3b7c check grad norm lijiaxing 2023-11-23 14:16:48 +0800
  • 61346c24f6 check loss lijiaxing 2023-11-22 10:29:32 +0800
  • 35aa093afe Merge branch 'develop' of https://github.com/InternLM/InternLM into improve_unitest lijiaxing 2023-11-22 10:28:42 +0800
  • 7776693373 feat(doc): add GPU memory info for 7B & 20B models (#507) jiaxingli 2023-11-21 19:20:02 +0800
  • f5aea7e08c fix(timeout): larger timeout (#495) jiaopenglong 2023-11-21 19:19:22 +0800
  • 972b4f02c0 update timeout thresholds jiaopenglong 2023-11-21 17:40:13 +0800
  • 18a17a434c doc fix lijiaxing 2023-11-21 17:28:22 +0800
  • 4aa7c21a76 doc fix lijiaxing 2023-11-21 17:14:53 +0800
  • 8bd85a6f5e Merge branch 'develop' of https://github.com/InternLM/InternLM into improve_unitest lijiaxing 2023-11-17 16:56:16 +0800
  • f47ec9a34c memory_test lijiaxing 2023-11-17 16:53:17 +0800
  • eba2b859fc feat(seed): set global seed for every model initialization (#496) v0.2.1dev20231121 jiaxingli 2023-11-17 14:42:50 +0800
  • 679ed3c8ca test(workflow): add model init test (#504) kkscilife 2023-11-17 09:59:34 +0800
  • 0bfc86205e feat(train): support_rampup_batch_size and fix bugs (#493) Guoteng 2023-11-16 19:51:01 +0800
  • 4a6987d5e7 unitest_only_forward (#484) jiaxingli 2023-11-16 15:30:57 +0800
  • 569988ac57 reduce timeout wangmengke 2023-11-16 15:21:04 +0800
  • 81a02014d1 add model init test wangmengke 2023-11-16 15:15:20 +0800