Commit Graph

  • 610e011133 Merge branch 'feature_add_fsdp3' of https://github.com/zaglc/InternLM into feature_add_fsdp3 huangting4201 2023-10-08 17:16:06 +0800
  • 132a841d42 modify args_sanity_check for fsdp with pipeline and fsdp with moe zaglc 2023-10-08 16:27:14 +0800
  • 36b687c882 Merge branch 'feature_add_fsdp3' of https://github.com/zaglc/InternLM into feature_add_fsdp3 huangting4201 2023-10-08 16:11:03 +0800
  • eb14dae005 fix conflicts zaglc 2023-10-08 15:49:47 +0800
  • 7d52276c13 add support for FSDP with tp zaglc 2023-10-08 15:33:31 +0800
  • c19b88a3fa feat: support volc tos li126com 2023-10-08 15:07:02 +0800
  • bf475b6940 debug yingtongxiong 2023-10-08 13:20:29 +0800
  • 787e0e0940 Recover use_dynamic_ntk_rope. Pryest 2023-10-07 20:59:47 +0800
  • 4a714966fc Merge branch 'develop' into fix/inference_forward Pryest 2023-10-07 20:18:48 +0800
  • e5bf40e38f Fix errant inference_forward. Pryest 2023-10-07 20:13:30 +0800
  • 337f1efe96 fix(monitor):make MonitorTracker thread to daemon 877825076@qq.com 2023-10-07 15:19:43 +0800
  • 8b65e2e3c4 fix(doc): fix huggingface url (#392) Guoteng 2023-10-07 15:03:44 +0800
  • c907d8bfa3 fix(doc): fix huggingface url 877825076@qq.com 2023-10-07 15:00:42 +0800
  • e5a2909af0 Merge remote-tracking branch 'upstream/develop' into feat/deepspeed_sp merge upstream/develop yingtongxiong 2023-10-07 14:04:00 +0800
  • 10aa63f0e1 support optimized sp yingtongxiong 2023-10-07 14:03:47 +0800
  • 1d7d774ecf fix(init): allow resume_tb_folder is an empty string 877825076@qq.com 2023-10-07 13:28:07 +0800
  • 11a4a8bb44 restore logic for empty fp32 group Qu Wenwen 2023-10-07 13:27:13 +0800
  • e1ecfa51ec Merge remote-tracking branch 'upstream/develop' into bugs/fix_mixed_precision_with_pipeline Qu Wenwen 2023-10-07 12:08:23 +0800
  • 45e31b84a7 test pp li126com 2023-09-28 18:12:50 +0800
  • 870dd7ddc6 modify code submit haijunlv 2023-09-28 14:09:26 +0800
  • 79d7c392a6 fix bugs for pipeline Wenwen Qu 2023-09-28 13:44:14 +0800
  • 4f9e8cd70d Doc(config): add auto_resume annotation into example config (#380) Guoteng 2023-09-28 13:39:02 +0800
  • 375240e039 feat(moe): add local data parallel support for experts (#376) Wenwen Qu 2023-09-28 13:38:02 +0800
  • 6ff13c126b Update 7B_sft.py Guoteng 2023-09-27 22:07:53 +0800
  • 5757a88ca4 Update 7B_sft.py Guoteng 2023-09-27 21:38:45 +0800
  • 3d8690bf8b update auto_resume 7B_sft.py Guoteng 2023-09-27 21:37:43 +0800
  • fd7138af38 doc(config): add auto_resume related comments 877825076@qq.com 2023-09-27 21:20:39 +0800
  • 5ab0dc8dc2 pp test li126com 2023-09-27 21:19:05 +0800
  • c8242572f2 fix the moe loss as none for panel_metrics (#379) Ryan (张磊) 2023-09-27 20:29:50 +0800
  • c3dbb35f19 fix the moe loss as none for panel_metrics zhanglei 2023-09-27 20:13:15 +0800
  • f13aea905b add moe unit test for e2e zhanglei 2023-09-27 19:58:46 +0800
  • eb4db08477 add moe unit test zhanglei 2023-09-27 19:48:27 +0800
  • 6195ea724f do not set ep size from config Wenwen Qu 2023-09-27 19:22:51 +0800
  • c4c43bf157 fix model checkpoint for local dp mode of expert Wenwen Qu 2023-09-27 19:00:36 +0800
  • 2b863bd099 add local data parallel support for experts Wenwen Qu 2023-09-27 18:39:15 +0800
  • 80d4744c42 merge upstream/develop into feature_add_moe_data Wenwen Qu 2023-09-27 18:26:10 +0800
  • 7e505f3c59 refactor code for sync_model_param() Wenwen Qu 2023-09-27 18:06:42 +0800
  • 00478761f7 update 7B-sft.py Wenwen Qu 2023-09-27 17:57:30 +0800
  • 9a1bd616d0 Merge branch 'feature_add_moe_data' of https://github.com/blankde/InternLM into feature_add_moe_data Wenwen Qu 2023-09-27 17:55:05 +0800
  • e2b7a7fa89 set default expert parallel size Wenwen Qu 2023-09-27 17:51:58 +0800
  • 80f1eb9a36 more wrap zaglc 2023-09-27 17:35:28 +0800
  • e34e7307c9 docs(doc): add tf32 docs (#374) ytxiong 2023-09-27 15:55:44 +0800
  • 136d55ec30 feat(moe): add moe module (#182) Wenwen Qu 2023-09-27 15:54:53 +0800
  • f5caa1c048 Merge branch 'feature_add_moe' into feature_add_moe_data Wenwen Qu 2023-09-27 15:01:35 +0800
  • f96764868d change condition for compatibility Wenwen Qu 2023-09-27 12:36:03 +0800
  • ae063b225d merge develop yingtongxiong 2023-09-27 11:57:48 +0800
  • 07038d1224 docs(doc/code-docs): update document image for InternLM parallel architecture (#373) Season 2023-09-27 11:50:22 +0800
  • 70dd578895 modify the gitignore yingtongxiong 2023-09-27 11:45:41 +0800
  • 43683b2975 add english doc yingtongxiong 2023-09-27 11:37:55 +0800
  • 630c713ef5 Merge branch 'feature_add_moe' into feature_add_moe_data Wenwen Qu 2023-09-27 11:30:15 +0800
  • 591b4edb1d update moe config file Wenwen Qu 2023-09-27 11:28:36 +0800
  • 5677bd7aab update english translation in readthedocs zigzagcai 2023-09-27 11:19:11 +0800
  • 59b7530129 fix(model/modeling_internlm.py): fix checkpoint=False runtime error huangting4201 2023-09-27 11:18:04 +0800
  • e57a99a810 add docs for tf32 in mix precision yingtongxiong 2023-09-27 11:17:03 +0800
  • 2a70262ceb fix merge bugs Wenwen Qu 2023-09-27 11:17:03 +0800
  • 52d65f80e6 docs(doc/code-docs): remove fuzzy translation in sphinx files zigzagcai 2023-09-27 11:10:12 +0800
  • b1f85462eb docs(doc/imgs): update image for internlm parallel architecture zigzagcai 2023-09-27 10:38:37 +0800
  • 27930174ae merge develop yingtongxiong 2023-09-27 10:38:49 +0800
  • 7d52e223a8 fix bug mwiacx 2023-09-27 09:36:50 +0800
  • b2134f4dd4 update find_subset_with_target_sum function mwiacx 2023-09-27 09:22:45 +0800
  • 8a63cb51ef modify moe config file Wenwen Qu 2023-09-26 21:01:06 +0800
  • 54af6ba297 merge develop into feature_add_moe Wenwen Qu 2023-09-26 21:00:22 +0800
  • 655e9dae40 Feat(norm)/support fused precision (#319) Wenwen Qu 2023-09-26 20:39:55 +0800
  • c703938fb3 change ckpt name zaglc 2023-09-26 19:16:16 +0800
  • 83bd11f2b2 fix(context/parallel_context.py): only log warning for fsdp huangting4201 2023-09-26 18:59:54 +0800
  • 69ff9f2f5c pp test li126com 2023-09-26 17:42:16 +0800
  • 96171d5f28 fix bug for loading ckpts when zero1 < dp_size zaglc 2023-09-26 17:36:59 +0800
  • 7df4643c89 update mixed_precision.po Wenwen Qu 2023-09-26 17:09:38 +0800
  • 966aed32b7 add english docs yingtongxiong 2023-09-26 17:01:17 +0800
  • 96b20cd43f doc(usage): add dynamic ntk into doc (#367) YWMditto 2023-09-26 16:58:46 +0800
  • 4f40b43b09 Update mixed_precision.po Wenwen Qu 2023-09-26 16:57:47 +0800
  • 056996f8b3 fix(fsdp_optimizer.py): wait grad async huangting4201 2023-09-26 16:54:29 +0800
  • 9d0c41e85b Update mixed_precision.rst Wenwen Qu 2023-09-26 16:45:11 +0800
  • 79adccc0a5 add zh docs for tf32 yingtongxiong 2023-09-26 16:33:36 +0800
  • fbcd509ff9 test pp li126com 2023-09-26 16:11:56 +0800
  • c5a7e76ada fix(train.py): fix ci lint error huangting4201 2023-09-26 15:51:18 +0800
  • 344f543c4c reformat docs Wenwen Qu 2023-09-26 15:49:12 +0800
  • f3f2511e74 feat(solver/optimizer): add new file fsdp_optimizer.py huangting4201 2023-09-26 15:46:47 +0800
  • 7d2d9fc2f0 add doc for En. version Wenwen Qu 2023-09-26 15:09:39 +0800
  • 74815f5f87 add long text generation in doc/usage.md YWMditto 2023-09-26 14:33:10 +0800
  • 1f26b6d88a add long text generation in doc/usage.md YWMditto 2023-09-26 14:30:17 +0800
  • cab875c41e add long text generation in doc/usage.md YWMditto 2023-09-26 14:20:35 +0800
  • 55ae973cf0 update moe config file Wenwen Qu 2023-09-26 12:17:39 +0800
  • 75774e0b5e Merge branch 'feature_add_moe' of https://github.com/blankde/InternLM into feature_add_moe Wenwen Qu 2023-09-26 11:52:52 +0800
  • 3c8fee01b2 add compatible code for old version Wenwen Qu 2023-09-26 11:51:34 +0800
  • 033e646191 feat: add pp test li126com 2023-09-25 20:50:39 +0800
  • c1e30cff2c feat(numa): bind numa if possible (#320) jiaxingli 2023-09-25 19:34:52 +0800
  • 95e800e10b add doc for fused precision Wenwen Qu 2023-09-25 19:30:07 +0800
  • 9284303a6d doc(monitor): add light monitoring doc (#352) jiaopenglong 2023-09-25 19:28:09 +0800
  • c139a05a94 try_bind_numa should not raise exception 877825076@qq.com 2023-09-25 17:50:02 +0800
  • 847cc819dd fix(monitor): add volc and aliyun jobid (#338) jiaopenglong 2023-09-25 17:58:32 +0800
  • 83070d3454 Merge branch 'develop' into feat/numa jiaxingli 2023-09-25 17:33:49 +0800
  • 064965527b fix(config): monitor config key error when args_check is False (#362) jiaopenglong 2023-09-25 17:30:36 +0800
  • 6b7ca1c6b3 fix load ckpt bug2 zaglc 2023-09-25 16:11:50 +0800
  • 5b62a3957a fix load ckpt bug zaglc 2023-09-25 16:08:40 +0800
  • 1986116527 add moe module to `__init__.py` zhanglei 2023-09-25 15:36:35 +0800
  • 26a7397752 fix(storage): fix try_get_storage_backend (#359) Guoteng 2023-09-25 15:16:25 +0800
  • cbb26d9136 unit tests for fused precision Wenwen Qu 2023-09-25 14:50:54 +0800
  • 0565e17b8d fix monitor config key error when args_check is False JiaoPL 2023-09-25 14:16:53 +0800
  • 64d4159f89 merge develop JiaoPL 2023-09-25 14:10:19 +0800