10 Commits (fd932cfc09314ba7ec320c0bb415e0689ebf6b54)

Author SHA1 Message Date
zhanglei fd932cfc09 refactor 2023-09-22 15:52:37 +08:00
zhanglei ccdaf8ec45 fix the moe_loss for ci and val 2023-09-22 15:45:36 +08:00
huangting4201 1ed36754df
feat(.github/workflows): update ci e2e tests and add ci unit tests (#324)
* feat(.github/workflows/e2e_test.yaml): update e2e yaml

* feat(.github/workflows/e2e_test.yaml): update e2e yaml

* test e2e

* test e2e

* test e2e

* test e2e

* test e2e

* fix(ci): test ci

* fix(ci): test ci

* fix(ci): test ci

* fix(ci): test ci

* fix(ci): test ci

* fix(ci): add weekly tests

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>
2023-09-22 14:07:14 +08:00
huangting4201 025ca55dfe
test(tests/test_training): add training e2e tests for loss spike and loss accuracy (#304)
* tests(test_training): add test case for loss accuracy

* tests(test_training): update test cases

* ci(.github/workflows/e2e_test.yaml): remove pull submodule

* ci(.github/workflows/e2e_test.yaml): update ci env and remove useless env var

* test(tests/test_training): add 16 GPUs test cases

* test(tests/test_training): fix training_16GPU_8DP2PP test case error

* test(tests/test_training): add new case for interleaved pp

* test(tests/test_training): remove redundant code

* test(tests/test_training): update ci job timeout minutes to 30m

* feat(initialize/launch.py): check num_chunks and interleaved_overlap

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>
2023-09-19 14:55:40 +08:00
jiaxingli ab513e1ddd
feat: add optimizer_unitest (#303)
* feat: add optimizer_unitest

* feat: add optimizer test

* feat: add optimizer test

* feat:add optimizer test

* final change

* feat:add optimizer test

* feat:add optimizer test

* feat:add optimizer test
2023-09-15 18:56:56 +08:00
jiaxingli 882a07011c
feat: add unit test for model (#300)
* feat: add unit test for model

* feat:add model test
2023-09-14 13:18:34 +08:00
Guoteng 85e39aae67
fix(ckpt): fix snapshot none load error and remove file lock (#298) 2023-09-08 20:41:53 +08:00
Guoteng 37b8c6684e
feat(utils): add timeout wrapper for key functions (#286) 2023-09-07 17:26:17 +08:00
Guoteng 8acf823a04
fix(storage): fix and refactor storage api (#281) 2023-09-06 01:15:09 +08:00
Guoteng f6e007f95b
feat(ckpt): fix checkpoint bugs and add feature enhancements. (#259)
* fix(ckpt): ckpt bug fix and api refactor
1. fix latest ckpt query bug
2. add ckpt unit test
3. fix storage manager boto3/local client get_fns bug
4. fix only model load case zero fp32 buffer overwrite model weights bug.
5. add ckpt_type and add zero reload ci-test

* fix(ckpt): fix ckpt and trainer bug

* fix and refactor

* fix based on comment

* feat: add legacy api
2023-09-05 17:40:48 +08:00