Commit Graph

  • c423f1159b add moe_type to model config Wenwen Qu 2024-01-09 15:56:59 +0800
  • dcfdab6aaf refactor code for all2all to support output_splits Wenwen Qu 2024-01-09 15:37:26 +0800
  • 09a2b5ba50 Add meta instruction in chat x54-729 2024-01-09 15:37:07 +0800
  • fe0c342f9d get moe setting from gpc Wenwen Qu 2024-01-09 15:26:13 +0800
  • 91480c5b63 fix(pipeline): avoid allreduce for dense model (#570) Wenwen Qu 2024-01-09 10:34:22 +0800
  • f5226b5152 refactor code Wenwen Qu 2024-01-08 16:23:53 +0800
  • 41f8283a3e refactor code Wenwen Qu 2024-01-08 16:03:55 +0800
  • c3854f924a refactor code Wenwen Qu 2024-01-08 14:33:19 +0800
  • fdd60691d3 move all2all to utils Wenwen Qu 2024-01-08 13:16:17 +0800
  • 820fa582cf Merge branch 'clean-new-main' into 'new-main' zhangwenwei 2024-01-08 03:43:45 +0000
  • 61aa7354ff [Refactor]: refactor with pure documentations and examples zhangwenwei 2024-01-08 03:43:45 +0000
  • f5e65c0f30 Update train.py Season 2024-01-06 23:03:19 +0800
  • f2704f7aef re-lint lijiaxing 2024-01-05 17:04:42 +0800
  • 184b5bff39 avoid allreduce when num_expert=1 Wenwen Qu 2024-01-04 15:10:21 +0800
  • 4e6db4af0f avoid allreduce for dense model in pp Wenwen Qu 2024-01-04 13:35:56 +0800
  • 7e2db90df1 Update web_demo.py XHr 2024-01-04 13:32:50 +0800
  • dd9d69edbf Update web_demo.py XHr 2024-01-04 13:11:40 +0800
  • 0b00fd331d web XHr 2024-01-04 13:06:44 +0800
  • 07c98c4a39 remove suffix for gate key Wenwen Qu 2024-01-04 10:51:56 +0800
  • 196514d87f refactor code Wenwen Qu 2024-01-03 17:39:37 +0800
  • 695d76eb31 update model x54-729 2024-01-03 15:21:37 +0800
  • 9d400c262a update rope sin&cos x54-729 2024-01-01 19:23:31 +0800
  • 5539f9db50 fix when resuming lr_scheduler without loading optimizer (#565) v0.2.1dev20240102 Yang Gao 2023-12-29 20:22:39 +0800
  • 220953d7e5 fix(metrics): remove redundant cuda memory in metric calculations (#557) Guoteng 2023-12-29 20:21:24 +0800
  • f8eaf618af revert worker_init_fn since we move gc disable after dataloader launching zigzagcai 2023-12-29 19:34:38 +0800
  • e9208728cb fix 877825076@qq.com 2023-12-29 16:47:19 +0800
  • c39d758a8a feat(logger): add tensorboard key value buffer (#549) Guoteng 2023-12-29 16:23:47 +0800
  • fc60986ed0 fix(utils): fix split cuda memory leak 877825076@qq.com 2023-12-29 16:15:35 +0800
  • 483bd706dd fix when resuming lr_scheduler without loading optimizer gaoyang07 2023-12-29 15:15:16 +0800
  • 1d217eb94e fix single fwd 877825076@qq.com 2023-12-29 13:41:45 +0800
  • aaaf4d7b0e fix(chat): fix stream_chat in modeling_internlm(hf) to avoid decode error (#560) djsaber 2023-12-29 13:03:44 +0800
  • 2a9228e91f fix e2etest zigzagcai 2023-12-29 11:01:05 +0800
  • c437ffbfc9 fix 877825076@qq.com 2023-12-28 20:03:17 +0800
  • 456a6953f8 fix 877825076@qq.com 2023-12-28 19:59:44 +0800
  • 83989b57ae fix 877825076@qq.com 2023-12-28 19:44:35 +0800
  • 06ececeb00 fix 877825076@qq.com 2023-12-28 19:34:19 +0800
  • 70e84a21f8 feat: avoid calling item() in fwd/bwd 877825076@qq.com 2023-12-28 19:12:02 +0800
  • e7686e7fb8 Update modeling_internlm.py djsaber 2023-12-28 13:36:12 +0800
  • acff1a00c9 diag all reduce lijiaxing 2023-12-28 11:06:38 +0800
  • ac7f45232b support sequence parallel Qu Wenwen 2023-12-27 12:03:30 +0800
  • 508711cc97 Update modeling_internlm.py djsaber 2023-12-26 16:52:06 +0800
  • 97e7d03d09 fixed the issue that the HF model spontaneously conducted multiple rounds of Q&A and stream_chat method generates garbled characters daijun1 2023-12-26 16:32:33 +0800
  • 48e25fd849 fix(metrics): remove redundant cuda memory in metric calculations 877825076@qq.com 2023-12-25 22:28:10 +0800
  • ac7509389b fix(tools): set add_eos_token=True in tokenizer.py (#555) x54-729 2023-12-22 21:57:14 +0800
  • 7cbdb6e1f5 update README & unittest x54-729 2023-12-22 21:13:09 +0800
  • 8c40539f6f add bos&eos in tools/tokenizer x54-729 2023-12-22 21:08:26 +0800
  • cb922d44e2 fix(readme): fix deprecated model path in code examples (#554) Yining Li 2023-12-22 20:56:27 +0800
  • 4de0233a20 fix deprecated model path ly015 2023-12-22 20:46:29 +0800
  • fc1f05c265 [Doc] update deployment guide based on lmdeploy v0.1.0 (#551) Lyu Han 2023-12-21 11:06:19 +0800
  • 2c55bd5768 redo lijiaxing 2023-12-20 17:23:45 +0800
  • 55c7dd513d redo lijiaxing 2023-12-20 17:22:21 +0800
  • d418eba094 fix(model): add ckpt_type constraint when loading ckpts (#542) jiaxingli 2023-12-20 16:43:27 +0800
  • a58bf853db change into reserved (#550) kkscilife 2023-12-20 14:41:09 +0800
  • 57b07d540b update deployment guide based on lmdeploy v0.1.0 lvhan028 2023-12-20 14:37:25 +0800
  • 2f156d931e change into reserved kkscilife 2023-12-20 14:00:26 +0800
  • 872cbd1479 fix bug lijiaxing 2023-12-19 20:17:29 +0800
  • d3ca22cf3d no overlap for save ckpt lijiaxing 2023-12-19 17:49:49 +0800
  • 9bf24d9768 no overlap for save ckpt lijiaxing 2023-12-19 17:45:55 +0800
  • 662391b211 fix 877825076@qq.com 2023-12-19 17:11:31 +0800
  • 3bc936e00d feat(logger): add tensorboard key value buffer 877825076@qq.com 2023-12-19 16:59:57 +0800
  • df0acdee43 fix pylint zigzagcai 2023-12-19 15:40:18 +0800
  • 78400c21b8 move manual gc before train loop starts zigzagcai 2023-12-19 15:29:55 +0800
  • eae9b97ab2 wgt reformat 877825076@qq.com 2023-12-19 00:02:00 +0800
  • d9c9f7c9ee fix lijiaxing 2023-12-18 21:37:17 +0800
  • de53b17506 fix token grad norm with tp (#547) jiaopenglong 2023-12-18 18:33:28 +0800
  • 513ebb9c3a fix(moe): fix moe zero mode bug (#548) Wenwen Qu 2023-12-18 14:39:42 +0800
  • 35778efff3 update moe config to fit training on 8 GPU Qu Wenwen 2023-12-18 14:02:33 +0800
  • c801336732 fix moe zero mode bugs Qu Wenwen 2023-12-18 13:43:45 +0800
  • 06565951f3 fix token grad norm with tp JiaoPL 2023-12-16 20:50:24 +0800
  • 2afeebe5b0 fix pylint zigzagcai 2023-12-15 19:58:22 +0800
  • 1fea658561 fix pylint zigzagcai 2023-12-15 18:52:54 +0800
  • 55ef29df80 auto save lijiaxing 2023-12-15 18:29:04 +0800
  • 81f51fd0ff auto save lijiaxing 2023-12-15 18:21:17 +0800
  • 2dc8ddd582 fix oom issue in dataloaders zigzagcai 2023-12-15 14:08:40 +0800
  • 51dd3da03e optimize model ckpt and reduce checkpointing overhead zigzagcai 2023-12-15 11:42:04 +0800
  • 68d6abc64a doc(readme): update 7b/20b chat model information (#537) Yining Li 2023-12-14 17:46:03 +0800
  • 6fa125b7cf fix readme ly015 2023-12-14 17:41:02 +0800
  • 7d67795f7e add assert lijiaxing 2023-12-14 15:01:04 +0800
  • cb89111010 add valid_pack_mode into example config 877825076@qq.com 2023-12-14 14:43:35 +0800
  • cd91e92bd7 add valid_pack_mode 877825076@qq.com 2023-12-14 14:40:12 +0800
  • 136aa7c5a5 feat(eval): unify evaluation 877825076@qq.com 2023-12-14 14:07:40 +0800
  • 6ad1afd2c4 Merge branch 'develop' of https://github.com/InternLM/InternLM into hf_llama lijiaxing 2023-12-14 12:24:19 +0800
  • f68f34234d overlap_param lijiaxing 2023-12-13 19:02:19 +0800
  • bbb5651582 fix(model): change model_type `LLAMA` to `LLAMA2` (#539) jiaxingli 2023-12-13 17:24:45 +0800
  • 5fd492655e fix bug lijiaxing 2023-12-13 15:14:36 +0800
  • d7555e8216 Merge branch 'develop' of https://github.com/InternLM/InternLM into hf_llama lijiaxing 2023-12-13 14:54:37 +0800
  • 5ecb6aa712 fix(pp): fix no-packed dataset load micro batch error (#538) Guoteng 2023-12-13 14:48:32 +0800
  • 7cc343dafc fix based on comment 877825076@qq.com 2023-12-13 12:15:22 +0800
  • 39a2fb5677 fix(pp): fix no-packed dataset load micro batch error 877825076@qq.com 2023-12-13 01:08:49 +0800
  • 430e559364 update 7b evaluation results ly015 2023-12-12 16:39:13 +0800
  • 432bd5ee9f fix the bug so that the sequence parallel norm is all-reduced when overlap is False (#534) ytxiong 2023-12-12 16:22:39 +0800
  • d4a81fad5d modifications by pre-commit hook ly015 2023-12-12 16:17:56 +0800
  • 4187bfbfe8 update chat model information in README ly015 2023-12-12 15:41:57 +0800
  • d904730be7 feat(ckpt): support auto resume in Volc and Ali (#529) jiaxingli 2023-12-12 13:27:24 +0800
  • b5f2a3ead4 update convert script x54-729 2023-12-12 12:23:57 +0800
  • d9262da635 update hf model: add rope config and add qkv x54-729 2023-12-12 12:19:20 +0800
  • d6eeacfeb2 bug lijiaxing 2023-12-12 10:36:04 +0800
  • cc5b15349d fix(metric): add metric dtype control (#533) Pryest 2023-12-11 19:36:31 +0800
  • fdce50a000 fix default behavior Pryest 2023-12-11 17:27:09 +0800
  • c7db6db066 fix the bug so that the sequence parallel norm is all-reduced when overlap is False yingtongxiong 2023-12-11 17:36:33 +0800