Commit Graph

227 Commits (internlm2-reward)

Author SHA1 Message Date
liukuikun d163169143
[Docs] chat format (#595)
* [Docs] chat format

* Update chat_format.md
2024-01-17 12:22:09 +08:00
Songyang Zhang 3ebe24d92c
[Doc] Update Evaluation (#588)
* [Doc] Update Evaluation

* Update performance

* Update performance

* Update README.md

* Update README_zh-CN.md

---------

Co-authored-by: Kai Chen <chenkaidev@gmail.com>
2024-01-17 12:20:42 +08:00
Range King 13cd9d9b21
[Docs] Fix typos in README (#594) 2024-01-17 11:41:08 +08:00
Wenwei Zhang 468982bc76
[Doc]: Resolve comments in documentation (#587)
* fix typos and try pass lint

* fix wrong path in CI

* fix wrong path in readme

* update lint doc

* update doc

* update doc
2024-01-17 10:47:06 +08:00
Lyu Han c40b34798c
[Doc]: update deployment guide (#591) 2024-01-17 10:32:43 +08:00
fly2tomato 2ae6225891
[Doc]: add openaoe docs (#586)
* add openaoe docs

add openaoe docs

* Update openaoe.md

* Update openaoe_zh_cn.md

* Update openaoe.md
2024-01-17 10:25:28 +08:00
vansin 830944d061
Update README.md (#589) 2024-01-17 10:18:37 +08:00
Wenwei Zhang dbec726c62
Update main branch and docs (#585)
* [Refactor]: refactor with pure documentations and examples

* update model information

* update model information

* Check-in lmdeploy user guide

* Update chat format doc

* update cn doc

* clean doc
2024-01-17 09:46:11 +08:00
djsaber aaaf4d7b0e
fix(chat): fix stream_chat in modeling_internlm(hf) to avoid decode error (#560)
* fixed the issue that the HF model spontaneously conducted multiple rounds of Q&A and stream_chat method generates garbled characters

Signed-off-by: daijun1 <daijun1@eccom.com.cn>

* Update modeling_internlm.py

fixed the issue that the HF model spontaneously conducted multiple rounds of Q&A and stream_chat method generates garbled characters

* Update modeling_internlm.py

Correct spelling mistakes: chche -> cache

---------

Signed-off-by: daijun1 <daijun1@eccom.com.cn>
Co-authored-by: daijun1 <daijun1@eccom.com.cn>
2023-12-29 13:03:44 +08:00
x54-729 ac7509389b
fix(tools): set add_eos_token=True in tokenizer.py (#555) 2023-12-22 21:57:14 +08:00
Yining Li cb922d44e2
fix(readme): fix deprecated model path in code examples (#554) 2023-12-22 20:56:27 +08:00
Lyu Han fc1f05c265
[Doc] update deployment guide based on lmdeploy v0.1.0 (#551) 2023-12-21 11:06:19 +08:00
Yining Li 68d6abc64a
doc(readme): update 7b/20b chat model information (#537)
* update chat model information in README

* modifications by pre-commit hook

* update 7b evaluation results

* fix readme
2023-12-14 17:46:03 +08:00
Pryest 3028f07cb7
fix(readme): update README with original weight download link (#460)
* update README with original weight download link.

* add extra info for original model weights.

* edit typo and polish
2023-11-01 14:49:48 +08:00
vansin 2c6cfde332
fix(web_demo): remove <eoh> in user prompt (#440)
delete <eoh>
2023-10-27 22:44:30 +08:00
ytxiong 42e5f6f8a9
fix(optimizer):broadcast main (#452)
* add broadcast synchronize

* add synchronize
2023-10-26 17:54:48 +08:00
ytxiong f653e5af01
add broadcast synchronize (#451) 2023-10-26 17:38:51 +08:00
x54-729 7b1b892084
fix(tools): fix InternLMTokenizer to fit transformers==4.34.0 2023-10-23 18:35:30 +08:00
Shuo Zhang e611817442
fix(doc): add 20b releasing info to readme (#330)
* fix(eval): StreamingDataset does not have an __len__ method.

* doc(readme): update readme

* update readme

* update readme

* update readme

* update readme

* update readme

* update readme
2023-09-20 16:46:45 +08:00
Shuo Zhang 5e5d160685
fix(readme): fix readme about 20B releasing (#329)
* fix(eval): StreamingDataset does not have an __len__ method.

* doc(readme): update readme

* update readme

* update readme

* update readme

* update readme

* update readme
2023-09-20 16:26:43 +08:00
Shuo Zhang 2a09ebd5c1
doc(readme): update readme, add 20B releasing info (#328)
* fix(eval): StreamingDataset does not have an __len__ method.

* doc(readme): update readme

* update readme
2023-09-20 16:04:43 +08:00
kkscilife bfefc4ea3c
test(ci_scripts): move ci env (#317)
* change partition and runner label

* change rm action to mv

* use spot

* use rsync to move test files

* remove *

* remove *

* change into llm_s partition

---------

Co-authored-by: wangmengke <wangmengke@pjlab.org.cn>
2023-09-19 14:52:32 +08:00
huangting4201 2710fa7343
Merge develop to main (#314)
* feat: add unitest for model (#300)

* feat: add unitest for model

* feat:add model test

* Merge main to develop (#309)

* fix(chat): fix stream_chat to return generator (#123)

* fix(configs/7B_sft.py): model dtype float16 to bfloat16 (#302)

* fix(convert2hf.py): fix the rotary_emb.inv_freq KeyError (#299)

---------

Co-authored-by: yingtongxiong <974106207@qq.com>
Co-authored-by: zhjunqin <zhjunqin@users.noreply.github.com>
Co-authored-by: jiangtann <39088437+jiangtann@users.noreply.github.com>

* docs(doc/code-docs): add figure for training docs (#307)

* add training image for docs

* docs(doc/code-docs): add training img for en doc

* docs(doc/code-docs): fix en docs for initialize

* docs(doc/code-docs): update conf file for readthedocs

* docs(doc/code-docs): fix typos

* docs(doc/code-docs): fix typos for readthedocs

* docs(doc/code-docs): minor typo fix for readthedocs

* docs(doc/code-docs): fix readthedocs conf file

* docs(doc/code-docs): update training image

* docs(doc/code-docs): fix typos

* docs(doc/code-docs): update training image

* docs(doc/code-docs): move training image to section initialize

* docs(doc/code-docs): fix lint

* add badge about readthedocs status

* Merge main to develop (#312)

* fix(chat): fix stream_chat to return generator (#123)

* fix(configs/7B_sft.py): model dtype float16 to bfloat16 (#302)

* fix(convert2hf.py): fix the rotary_emb.inv_freq KeyError (#299)

* docs(doc/code-docs): update quickstart usage (#301)

* docs(usage.md): update usage.md

* docs(doc/code-docs): update en usage

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>

* docs(doc/code-docs): update en usage

---------

Co-authored-by: yingtongxiong <974106207@qq.com>
Co-authored-by: zhjunqin <zhjunqin@users.noreply.github.com>
Co-authored-by: jiangtann <39088437+jiangtann@users.noreply.github.com>
Co-authored-by: huangting4201 <huangting3@sensetime.com>

* feat: more tgs (#310)

* feat:more tgs

* feat:add more tgs

* feat:more tgs

* feat: add optimizer_unitest (#303)

* feat: add optimizer_unitest

* feat: add optimizer test

* feat: add optimizer test

* feat:add optimizer test

* final change

* feat:add optimizer test

* feat:add optimizer test

* feat:add optimizer test

---------

Co-authored-by: jiaxingli <43110891+li126com@users.noreply.github.com>
Co-authored-by: yingtongxiong <974106207@qq.com>
Co-authored-by: zhjunqin <zhjunqin@users.noreply.github.com>
Co-authored-by: jiangtann <39088437+jiangtann@users.noreply.github.com>
Co-authored-by: Season <caizheng@pjlab.org.cn>
Co-authored-by: huangting4201 <huangting3@sensetime.com>
2023-09-15 19:12:38 +08:00
huangting4201 42802a2b31
docs(doc/code-docs): update quickstart usage (#301)
* docs(usage.md): update usage.md

* docs(doc/code-docs): update en usage

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>
2023-09-15 15:29:58 +08:00
jiangtann 09e71cebf3
fix(convert2hf.py): fix the rotary_emb.inv_freq KeyError (#299) 2023-09-11 20:17:11 +08:00
huangting4201 e354410bd2
fix(configs/7B_sft.py): model dtype float16 to bfloat16 (#302) 2023-09-11 20:06:22 +08:00
zhjunqin 8420115b5e
fix(chat): fix stream_chat to return generator (#123) 2023-09-10 23:46:45 +08:00
yingtongxiong 2ec20707d0
Merge remote-tracking branch 'origin/develop' 2023-09-08 20:42:55 +08:00
Guoteng 85e39aae67
fix(ckpt): fix snapshot none load error and remove file lock (#298) 2023-09-08 20:41:53 +08:00
yingtongxiong 9481df976f
Merge remote-tracking branch 'origin/develop' 2023-09-08 17:58:04 +08:00
Sun Peng 1ee31ff9b1
feat: add runtime diag (#297)
* feat: add runtime diag

* add diag_outlier_ratio

---------

Co-authored-by: yingtongxiong <974106207@qq.com>
2023-09-08 17:56:46 +08:00
Season 06807a6fd5
docs(doc/code-docs): refine profiler docs (#295)
* add detailed profiler guide

* added torch profiler detailed docs

* add english docs for profiler page

* docs(code-docs/source/profiler.rst): resize profiler trace image

* docs(code-docs/source/profiler.rst): fix typo

* docs(doc/imgs/torch_profiler_trace.png): update trace image
2023-09-08 16:58:36 +08:00
Sun Peng 0423426c4c
fix: fix the bug to do bcast in a stream (#294)
* fix: fix the bug to do bcast in a stream

* fix: fix the bug to do bcast in a stream

---------

Co-authored-by: yingtongxiong <974106207@qq.com>
2023-09-08 13:53:40 +08:00
yingtongxiong 0c276d8de2
Merge remote-tracking branch 'origin/main' into develop 2023-09-08 10:19:54 +08:00
Sun Peng b7a8af8133
Feat/sync grad use async op (#277)
* fix/broadcast should not in commu stream

* fix/broadcast should not in commu stream

* feat: support allreduce grad using async op

* fix bug of async op

* use reduceop.avg

* use torch flat

* delete unused stream

* delete unused stream

* feat: overlap allreduce with memcpy

---------

Co-authored-by: yingtongxiong <974106207@qq.com>
2023-09-07 21:51:30 +08:00
jiaopenglong 7c99e01ca7
fix(monitor): add alert switch and refactor monitor config (#285)
* add monitor switch

* add switch to light monitor

* fix alert_address is empty

* fix light monitor heartbeat

* init light_monitor on rank_log only

* add comments to the monitoring config

* optimize config
2023-09-07 21:49:05 +08:00
Guoteng 37b8c6684e
feat(utils): add timeout wrapper for key functions (#286) 2023-09-07 17:26:17 +08:00
huangting4201 671c752de6
docs(doc/code-docs): support zh cn readthedocs (#289)
* feat(code-docs): test auto doc

* feat(code-docs): test auto doc

* feat(code-docs): test auto doc

* feat(code-docs): test auto doc

* docs(doc/code-docs): add zh_CN structure

* docs(doc/code-docs): test install.md

* docs(doc/code-docs): source file to zh

* docs(doc/code-docs): update source files

* docs(doc/code-docs): add locales en

* docs(doc/code-docs): add locales en install

* docs(doc/code-docs): add locales en example

* docs(doc/code-docs): update en checkpoint

* add en translation for parallel.rst docs

* add en translation for profiler.po docs

* docs(doc/code-docs): update en monitor

* add en translation for monitor, qa, training docs

* add en translation for quickstart docs

* docs(doc/code-docs): update monitor.po and usage.po

* docs(doc/code-docs): fix typos

* docs(doc/code-docs): update en parallel

* docs(doc/code-docs): update en parallel

* docs(doc/code-docs): update en usage

* docs(doc/code-docs): update en profiler

* docs(doc/code-docs): update en initialize

* docs(doc/code-docs): update en initialize

* docs(doc/code-docs): update en initialize

* docs(doc/code-docs): update en initialize

---------

Co-authored-by: zigzagcai <caizheng@pjlab.org.cn>
2023-09-07 16:11:08 +08:00
Season b6d909d43e
docs(*): add documentation and reST files for readthedocs (#272)
* add initial reST files for readthedocs

* fix typos

* docs refine and minor fix

* add references for parallel training section

* fix reST format

* fix reST format

* fix reST format

* add comments for trainer API

* add link to step-by-step quickstart guide

* docs(code-docs/source/parallel.rst): add paper link url

* docs(code-docs/source/parallel.rst): add paper link url

* use MyST to render markdown

* docs(code-docs/source/initialize.rst): update model init

* add requirements for myst-parser

* reuse install and usage markdown

* docs(code-docs/source/index.rst): add example and q&a

* docs(doc/code-docs/*): docs refine

* docs(code-docs/source/parallel.rst): update docs for zero config

* docs(code-docs/source/example.rst): fix typos for example.rst

* docs(code-docs/source/example.rst): refine docs

* docs(code-docs/source/example): update example

* docs(code-docs/source/example): delete useless example

* docs(code-docs/source/*): fix image display issue

* docs(code-docs/source/parallel.rst): add docs for communication overlap

* docs(code-docs/source/conf.py): update conf.py

* docs(code-docs/source/example): update example 30B demo

* docs(code-docs/source/parallel.rst): update pipeline parallel

* docs(code-docs/source/parallel.rst): update pipeline parallel

* docs(code-docs/source/parallel.rst): update pipeline parallel

* docs(code-docs/source/parallel.rst): update pipeline parallel

* docs(code-docs/source/parallel.rst): update ZeRO1.5

* docs(code-docs/source/parallel.rst): update ZeRO1.5

* docs(code-docs/source): fix word spelling error

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>
2023-09-06 15:36:03 +08:00
Wenwen Qu 7f687bf4b3
fix(core/context): use dummy mode to generate random numbers in model construction (#266)
* change mode to dummy in model construction and restore to data when done

* add comments

* move set_mode(.DATA) to initialize_model(.)
2023-09-06 14:34:11 +08:00
Guoteng ff181bc5f8
fix(ckpt): fix checkpoint reload bug (#282)
1. fix only_load tuple convert bug.
2. fix reload_zero_fp32_buff copy bug
2023-09-06 04:05:04 +08:00
Guoteng 8acf823a04
fix(storage): fix and refactor storage api (#281) 2023-09-06 01:15:09 +08:00
jiaopenglong 8d8d811e10
feat(monitor): add light monitor (#275)
* add light monitor

* filter key of metrics dict

* test no light_monitor case

* mv init_light_monitor to initialize_distributed_env
2023-09-05 19:24:01 +08:00
ytxiong 9445faf5be
fix(model): set tensor parallel attribute for mlp (#271)
* set is_tensor_parallel attribute for mlp

* fix lint
2023-09-05 19:03:02 +08:00
yingtongxiong 0fb8d4141f
Merge remote-tracking branch 'origin/main' into develop 2023-09-05 17:50:35 +08:00
Sun Peng 7f61505fa0
fix/broadcast should not in commu stream (#276)
* fix/broadcast should not in commu stream

* fix/broadcast should not in commu stream

---------

Co-authored-by: yingtongxiong <974106207@qq.com>
2023-09-05 17:47:50 +08:00
yingtongxiong 3f07d414e7
Merge branch 'develop' of github.com:InternLM/InternLM into develop 2023-09-05 17:46:27 +08:00
yingtongxiong 0e62d41137
Merge branch 'main' into develop 2023-09-05 17:45:26 +08:00
Guoteng f6e007f95b
feat(ckpt): fix checkpoint bugs and add feature enhancements. (#259)
* fix(ckpt): ckpt bug fix and api refactor
1. fix latest ckpt query bug
2. add ckpt unit test
3. fix storage manager boto3/local client get_fns bug
4. fix only model load case zero fp32 buffer overwrite model weights bug.
5. add ckpt_type and add zero reload ci-test

* fix(ckpt): fix ckpt and trainer bug

* fix and refactor

* fix base on comment

* feat: add legacy api
2023-09-05 17:40:48 +08:00
Shuo Zhang 5238f15e2d
fix(eval): no need to check length of valid_dl when using streaming dataset (#274)
* fix(eval): StreamingDataset does not have an __len__ method.

* fix(eval): StreamingDataset has no len method
2023-09-04 23:14:07 +08:00