Commit Graph

  • 69e0d63b33 Update passkey_retrieval.py tpoisonooo 2023-10-19 16:30:22 +0800
  • d2cc5f61e3 feat(tools): add passkey retrieval test tpoisonooo 2023-10-19 16:18:42 +0800
  • 3ea46324dd fix: unitest (#424) jiaxingli 2023-10-19 15:19:40 +0800
  • 19cddd5060 fix: unitest li126com 2023-10-19 14:41:25 +0800
  • 4742271154 add memory pool yingtongxiong 2023-10-19 13:21:33 +0800
  • 83c47d07d1 fix is_no_pp_or_last_stage logic Wenwen Qu 2023-10-19 10:59:54 +0800
  • 2c5395fdfd Doc(moe): add documentation for moe training (#411) Wenwen Qu 2023-10-19 10:01:12 +0800
  • 3ea94f2e2a fix(utils): disable bench_net in gputest.py (#421) Guoteng 2023-10-19 10:00:57 +0800
  • 4b5bdedff2 feat(monitor): send exception to light monitor (#420) jiaopenglong 2023-10-18 21:00:21 +0800
  • 5d0151d7b0 update trainer_result in ci JiaoPL 2023-10-18 19:37:15 +0800
  • 6480e03949 refactor code for assert Wenwen Qu 2023-10-18 19:22:33 +0800
  • e3aff2c23e update try_import_send_exception JiaoPL 2023-10-18 19:19:55 +0800
  • 30f610b1fa Test(pp): test pipeline parallel (#413) jiaxingli 2023-10-18 17:53:08 +0800
  • c0d9063a8d change code comments Wenwen Qu 2023-10-18 17:37:20 +0800
  • e1db83899b restore moe config file Qu Wenwen 2023-10-18 14:08:48 +0800
  • 3421d1197a Merge 'upstream/develop' into doc/add_moe__doc Qu Wenwen 2023-10-18 14:06:49 +0800
  • 12f897f553 fix interleave type assert bug Wenwen Qu 2023-10-18 13:56:42 +0800
  • e3d128230b fix(utils): disable bench_net in gputest.py 877825076@qq.com 2023-10-18 12:26:24 +0800
  • bf6dbf07fa add share embedding weight support for moe Wenwen Qu 2023-10-18 11:39:04 +0800
  • a5aeab2a3f memory profiling test yingtongxiong 2023-10-17 19:54:21 +0800
  • aa5e34d815 compatible with old ckpt (#418) Wenwen Qu 2023-10-17 17:25:36 +0800
  • 16ef7b7889 add test yingtongxiong 2023-10-17 17:16:39 +0800
  • 2538a19927 fix InternLMTokenizer to fit transformers==4.34.0 x54-729 2023-10-17 16:54:51 +0800
  • 5abe519c4c remove full weight for block 0 yingtongxiong 2023-10-17 16:37:06 +0800
  • 9f08c95541 compatible with old ckpt Qu Wenwen 2023-10-17 16:22:32 +0800
  • 44f5c51747 send exception to light monitor JiaoPL 2023-10-17 15:49:17 +0800
  • 5c38cb6409 add head overlap yingtongxiong 2023-10-17 15:38:24 +0800
  • a5c6e457b9 Merge branch 'feat/fstp' of https://github.com/yingtongxiong/InternLM into feat/fstp yingtongxiong 2023-10-17 15:17:03 +0800
  • 6408b944c2 support fine grained yingtongxiong 2023-10-17 15:14:39 +0800
  • b51cf4ebc3 Merge branch 'feat/fstp' of github.com:yingtongxiong/InternLM into feat/fstp chenxun.p 2023-10-17 15:10:27 +0800
  • 6682f5d92a fix reduce scatter async bug chenxun.p 2023-10-17 15:10:07 +0800
  • eeef07934a fix(moe): fix moe compatibility for fsdp and memory profiling (#417) Wenwen Qu 2023-10-17 14:13:48 +0800
  • 666dabd0a8 update moe config Qu Wenwen 2023-10-17 11:36:44 +0800
  • 4e99a7fdbc feat(train/training_internlm.py): remove abnormal tgs when calculating avg tgs huangting4201 2023-10-17 11:30:44 +0800
  • 74d6c71ad9 fix moe compatibility for fsdp and memory profiling Qu Wenwen 2023-10-17 11:26:29 +0800
  • 229cc5c68c impl reduce scatter async chenxun.p 2023-10-17 11:15:54 +0800
  • d1af0d6aee feat(model/linear.py): block-grained backward huangting4201 2023-10-17 10:13:56 +0800
  • 0d1fa037dd feat(model/linear.py): set block 0 full weight huangting4201 2023-10-16 20:13:59 +0800
  • 6ce78a4e09 fix layer grad_norm with pp JiaoPL 2023-10-16 19:43:30 +0800
  • 82204eea59 support hybrid overlap yingtongxiong 2023-10-16 16:35:14 +0800
  • 7920168179 fix set layer name JiaoPL 2023-10-14 22:45:35 +0800
  • 7d68509c4f set layer name to parameters after init_model JiaoPL 2023-10-14 22:32:10 +0800
  • 37e0c86e5a fix(init): allow resume_tb_folder is an empty string (#391) Guoteng 2023-10-13 16:46:14 +0800
  • 71a0388b87 feat(storage): support volc oss ckpt saving (#397) jiaxingli 2023-10-13 16:44:29 +0800
  • 646f1b45fa rm debug log JiaoPL 2023-10-13 12:25:46 +0800
  • f2358b9432 Merge branch 'develop' into feat/layer_grad_norm JiaoPL 2023-10-13 12:12:24 +0800
  • 641ee14bbf update layer norm to tensorboard JiaoPL 2023-10-13 12:07:58 +0800
  • d0f0c22cac feat(model/linear.py): change pre backward from wqkv to block huangting4201 2023-10-13 11:10:23 +0800
  • a94f429a67 compute layer norms and replace total_norm with it JiaoPL 2023-10-12 21:25:30 +0800
  • d0b1346993 feat(model/linear.py): support block allgather overlap huangting4201 2023-10-12 19:42:08 +0800
  • 816ecf8e04 fix moe and zero1 check in args_sanity_check Qu Wenwen 2023-10-12 10:56:59 +0800
  • 93bb5c2760 add doc for moe Qu Wenwen 2023-10-12 10:42:16 +0800
  • 5fd5a8a32b support fine-grained overlap yingtongxiong 2023-10-11 17:36:41 +0800
  • 792b066f15 communication overlap yingtongxiong 2023-10-11 10:57:12 +0800
  • 9a731b6e9b fix(optimizer/fsdp_optimizer.py): fsdp process empty params group (#408) huangting4201 2023-10-10 20:06:04 +0800
  • a63d7773db fix(optimizer/fsdp_optimizer.py): fsdp process empty params group huangting4201 2023-10-10 19:59:53 +0800
  • f6ff8e61c6 Merge remote-tracking branch 'upstream/develop' into develop huangting4201 2023-10-10 19:53:18 +0800
  • c94be64fd2 merge origin yingtongxiong 2023-10-10 17:13:46 +0800
  • 0fac845c36 overlap grad_input computation and grad_weight reduce_scatter yingtongxiong 2023-10-10 17:06:13 +0800
  • 5fb6d99c11 feat(configs/7B_sft.py): update parallel config comment huangting4201 2023-10-10 11:45:11 +0800
  • db637542a6 fix lint yingtongxiong 2023-10-09 22:19:21 +0800
  • dd67ab948d merge develop yingtongxiong 2023-10-09 21:40:02 +0800
  • 1b7935dd98 merge upstream develop yingtongxiong 2023-10-09 21:35:52 +0800
  • a8dea6313f fix the ci incompatible in config yingtongxiong 2023-10-09 21:33:26 +0800
  • b3645b0244 fix(model): fix errant inference_forward (#396) Pryest 2023-10-09 21:29:11 +0800
  • 66eba48c9f Fit to flash attention 1.0.5. Pryest 2023-10-09 21:15:40 +0800
  • b38ba5dad2 Fit to flash attention 1.0.5. Pryest 2023-10-09 21:03:16 +0800
  • 007e58a4af merge upstream develop yingtongxiong 2023-10-09 20:54:26 +0800
  • a3580acb6c Fit to flash attention 1.0 Pryest 2023-10-09 20:46:17 +0800
  • a35ce4c888 Fit to flash attention 1.0 Pryest 2023-10-09 20:43:21 +0800
  • f191853bf4 fix lint yingtongxiong 2023-10-09 20:39:57 +0800
  • 78353e12cf Fix bugs. Pryest 2023-10-09 20:27:03 +0800
  • 29df765f65 refactor code yingtongxiong 2023-10-09 20:23:32 +0800
  • 5d39c332fe restore train.py yingtongxiong 2023-10-09 20:08:49 +0800
  • ef9e7cc622 modify the config yingtongxiong 2023-10-09 20:05:39 +0800
  • 144731c35c fix evaluation bug in pp yingtongxiong 2023-10-09 20:04:27 +0800
  • a075153adf feat(train): add fsdp training option (#293) zaglc 2023-10-09 18:59:31 +0800
  • 45c846f7df feat(configs/7B_sft.py): adapt to old version config huangting4201 2023-10-09 18:28:41 +0800
  • 54e561665e remove useless code for no-pp yingtongxiong 2023-10-09 18:08:15 +0800
  • 0fa1083780 Merge remote-tracking branch 'upstream/develop' into feat/fstp yingtongxiong 2023-10-09 18:06:57 +0800
  • 949431f228 modify the config yingtongxiong 2023-10-09 18:06:22 +0800
  • 21c1a7fa47 support evaluation with fstp yingtongxiong 2023-10-09 18:01:06 +0800
  • 582ee000bd feat(moe):support zero for expert local dp (#404) Wenwen Qu 2023-10-09 17:45:26 +0800
  • edd7f9e8e1 feat(configs/7B_sft.py): move fsdp config to parallel zero1 huangting4201 2023-10-09 17:39:52 +0800
  • 189a313da6 support fstp and refactor code yingtongxiong 2023-10-09 17:26:20 +0800
  • e8fcbb1ad5 fix above codes: treat optim.zero_world_size and optim.zero_local_rank as list in model_checkpoint.py and test_model_checkpoint.py; add overlap and zero check for moe in args_sanity_check() Qu Wenwen 2023-10-09 16:05:43 +0800
  • 67fad5c894 feat: support volc oss li126com 2023-10-09 14:52:14 +0800
  • bd809a61f2 fix(internlm/model): reset dropout_selective_checkpoint=True huangting4201 2023-10-09 14:47:10 +0800
  • 916647c0a1 fix(pipeline): fix bugs for pipeline when enable mixed precision (#382) Wenwen Qu 2023-10-09 14:01:15 +0800
  • 9aef11e89c make seed in different tensor rank different (#405) ytxiong 2023-10-09 13:53:52 +0800
  • c69481daef merge upstream develop yingtongxiong 2023-10-09 13:32:41 +0800
  • 0e26f52a89 make seed in different tensor rank different yingtongxiong 2023-10-09 13:26:08 +0800
  • 856f88e97b move optim.dtype to each param group Qu Wenwen 2023-10-09 12:39:03 +0800
  • c018e9216f support zero for expert local dp Qu Wenwen 2023-10-09 11:20:01 +0800
  • 4ebe6715bd restore logic for empty fp32 group Qu Wenwen 2023-10-09 11:20:01 +0800
  • 5bca32e4dc fix(internlm/train/training_internlm.py): update wrap class and fix lint error huangting4201 2023-10-09 11:11:04 +0800
  • b444264e89 doc:gpu num li126com 2023-10-08 20:46:13 +0800
  • 2e94870967 fix(internlm/train/training_internlm.py): remove set IS_TENSOR_PARALLEL attr huangting4201 2023-10-08 20:22:40 +0800
  • 1b71b19e23 fix(internlm/utils/parallel.py): fix circular import huangting4201 2023-10-08 17:23:29 +0800
  • bd4af3a31f modify the all2all yingtongxiong 2023-10-08 17:21:17 +0800