Commit Graph

3 Commits (15ff4133624605e417a3f6297c65fd2de57137fd)

Author SHA1 Message Date
jiaopenglong 9284303a6d
doc(monitor): add light monitoring doc (#352)
* add light monitoring doc

* update light monitoring doc

* update light monitoring doc

* update light monitoring doc

* update light monitoring doc continue

* update light monitoring doc continue

* update monitor config doc

* update monitor config doc continue

* update monitor config doc continue
2023-09-25 19:28:09 +08:00
jiaopenglong 8d8d811e10
feat(monitor): add light monitor (#275)
* add light monitor

* filter key of metrics dict

* test no light_monitor case

* mv init_light_monitor to initialize_distributed_env
2023-09-05 19:24:01 +08:00
huangting4201 ff0fa7659f
feat(monitor): support monitor and alert (#175)
* feat(monitor): support monitor and alert

* feat(monitor.py): fix demo error

* feat(monitor.py): move cmd monitor args to config file

* feat(hybrid_zero_optim.py): if overflow occurs send alert msg

* feat(monitor.py): remove alert msg filter

* feat(monitor.py): optimize class MonitorTracker

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(train.py): update print to log

* style(ci): fix lint error

* fix(utils/evaluation.py): remove useless code

* fix(model/modeling_internlm.py): fix lint error

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>
2023-08-08 11:18:15 +08:00