Commit Graph

17 Commits (839847b7d78bce6af5dfe58d27b5ce2c74a3619b)

Author SHA1 Message Date
digger yu a9d1cadc49
fix typo with colossalai/trainer utils zero (#3908) 2023-06-07 16:08:37 +08:00
Jiarui Fang 7e24b9b9ee
[Gemini] clean no used MemTraceOp (#1970) 2022-11-17 13:41:54 +08:00
Jiarui Fang 4165eabb1e
[hotfix] remove potiential circle import (#1307)
* make it faster

* [hotfix] remove circle import
2022-07-14 13:44:26 +08:00
ver217 232142f402
[utils] refactor profiler (#837)
* add model data profiler

* add a subclass of torch.profiler.profile

* refactor folder structure

* remove redundant codes

* polish code

* use GeminiMemoryManager

* fix import path

* fix stm profiler ext

* polish comments

* remove useless file
2022-04-24 17:03:59 +08:00
ver217 369a288bf3
polish utils docstring (#620) 2022-04-01 16:36:47 +08:00
Liang Bowen 2c45efc398
html refactor (#555) 2022-03-31 11:36:56 +08:00
Jie Zhu 73d36618a6
[profiler] add MemProfiler (#356)
* add memory trainer hook

* fix bug

* add memory trainer hook

* fix import bug

* fix import bug

* add trainer hook

* fix #370 git log bug

* modify `to_tensorboard` function to support better output

* remove useless output

* change the name of `MemProfiler`

* complete memory profiler

* replace error with warning

* finish trainer hook

* modify interface of MemProfiler

* modify `__init__.py` in profiler

* remove unnecessary pass statement

* add usage to doc string

* add usage to trainer hook

* new location to store temp data file
2022-03-29 12:48:34 +08:00
1SAA 907ac4a2dc
fixed error when no collective communication in CommProfiler 2022-03-14 17:21:00 +08:00
HELSON dfd0363f68
polished output format for communication profiler and pcie profiler (#404)
fixed typing error
2022-03-14 16:07:45 +08:00
HELSON 7c079d9c33
[hotfix] fixed bugs in ShardStrategy and PcieProfiler (#394) 2022-03-11 18:12:46 +08:00
HELSON 8c18eb0998
[profiler] Fixed bugs in CommProfiler and PcieProfiler (#377) 2022-03-11 15:50:28 +08:00
HELSON 1ed7c24c02
Added PCIE profiler to dectect data transmission (#373) 2022-03-11 15:50:28 +08:00
HELSON 534e0bb118
Fixed import bug for no-tensorboard environment (#354) 2022-03-11 15:50:28 +08:00
HELSON c57e089824
[profile] added example for ProfilerContext (#349) 2022-03-11 15:50:28 +08:00
HELSON 425bb0df3f
Added Profiler Context to manage all profilers (#340) 2022-03-11 15:50:28 +08:00
HELSON 4f26fabe4f
fixed strings in profiler outputs (#325) 2022-03-11 15:50:28 +08:00
1SAA 73bff11288
Added profiler communication operations
Fixed bug for learning rate scheduler
2022-03-11 15:50:28 +08:00