* add fused precision support for norm
* refactor code
* change the granularity of hook
* fix bugs when self.model is a ModuleList
* add dtype condition for post hook
* refactor code for split group
* refactor code for pre/post hook
* refactor code for split group
* remove fp32 hook for norm
* unit tests for fused precision
* add doc for fused precision
* add doc for English version
* reformat docs
* Update mixed_precision.rst
* Update mixed_precision.po
* update mixed_precision.po
* fix(storage): fix try_get_storage_backend
* fix typo and print info only on the log rank
---------
Co-authored-by: gaoyang07 <Gary1546308416AL@gmail.com>
* feat(.github/workflows/e2e_test.yaml): update e2e yaml
* test e2e
* fix(ci): test ci
* fix(ci): add weekly tests
---------
Co-authored-by: huangting4201 <huangting3@sensetime.com>
* tests(test_training): add test case for loss accuracy
* tests(test_training): update test cases
* ci(.github/workflows/e2e_test.yaml): remove pull submodule
* ci(.github/workflows/e2e_test.yaml): update ci env and remove useless env var
* test(tests/test_training): add 16 GPUs test cases
* test(tests/test_training): fix training_16GPU_8DP2PP test case error
* test(tests/test_training): add new case for interleaved pp
* test(tests/test_training): remove redundant code
* test(tests/test_training): update ci job timeout to 30 minutes
* feat(initialize/launch.py): check num_chunks and interleaved_overlap
---------
Co-authored-by: huangting4201 <huangting3@sensetime.com>
* fix(ckpt): ckpt bug fix and api refactor
1. fix latest ckpt query bug
2. add ckpt unit test
3. fix storage manager boto3/local client get_fns bug
4. fix bug where the zero fp32 buffer overwrites model weights when only the model is loaded
5. add ckpt_type and add a zero reload CI test
* fix(ckpt): fix ckpt and trainer bug
* fix and refactor
* fix based on comments
* feat: add legacy api