InternLM/internlm/initialize
huangting4201 025ca55dfe
test(tests/test_training): add training e2e tests for loss spike and loss accuracy (#304)
* tests(test_training): add test case for loss accuracy

* tests(test_training): update test cases

* ci(.github/workflows/e2e_test.yaml): remove pull submodule

* ci(.github/workflows/e2e_test.yaml): update ci env and remove useless env var

* test(tests/test_training): add 16 GPUs test cases

* test(tests/test_training): fix training_16GPU_8DP2PP test case error

* test(tests/test_training): add new case for interleaved pp

* test(tests/test_training): remove redundant code

* test(tests/test_training): update ci job timeout minutes to 30m

* feat(initialize/launch.py): check num_chunks and interleaved_overlap

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>
2023-09-19 14:55:40 +08:00
legacy                 feat(ckpt): fix checkpoint bugs and add feature enhancements. (#259)  2023-09-05 17:40:48 +08:00
__init__.py            Merge develop to main (#233)  2023-08-24 22:03:04 +08:00
initialize_tensor.py   feat(model): implement uniform_init for tensor. (#252)  2023-09-01 01:12:53 +08:00
initialize_trainer.py  docs(*): add documentation and reST files for readthedocs (#272)  2023-09-06 15:36:03 +08:00
launch.py              test(tests/test_training): add training e2e tests for loss spike and loss accuracy (#304)  2023-09-19 14:55:40 +08:00