Commit Graph

5 Commits (4b40fbd7430c85c65a93d5a6adaf22888b91c9dc)

Author SHA1 Message Date
アマデウス 28b515d610
[model checkpoint] updated checkpoint hook (#598) 2022-04-01 16:53:03 +08:00
アマデウス 0fedef4f3c
Layer integration (#83)
* integrated parallel layers for ease of building models

* integrated 2.5d layers

* cleaned codes and unit tests

* added log metric by step hook; updated imagenet benchmark; fixed some bugs

* reworked initialization; cleaned codes

Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-27 15:04:32 +08:00
Frank Lee 9a0466534c
update markdown docs (english) (#60) 2021-12-10 14:37:33 +08:00
Frank Lee 3defa32aee
Support TP-compatible Torch AMP and Update trainer API (#27)
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
2021-11-18 19:45:06 +08:00
zbian 404ecbdcc6 Migrated project 2021-10-28 18:21:23 +02:00