アマデウス
|
acae68eb04
|
[model checkpoint] updated checkpoint save/load utils (#592)
|
3 years ago |
ver217
|
369a288bf3
|
polish utils docstring (#620)
|
3 years ago |
LuGY
|
02b187c14f
|
[zero] add sampling time for memstats collector (#610)
|
3 years ago |
アマデウス
|
54e688b623
|
moved ensure_path_exists to utils.common (#591)
|
3 years ago |
Jiarui Fang
|
e956d93ac2
|
[refactor] memory utils (#577)
|
3 years ago |
HELSON
|
e6d50ec107
|
[zero] adapt zero for unsharded parameters (#561)
* support existing sharded and unsharded parameters in zero
* add unit test for moe-zero model init
* polish moe gradient handler
|
3 years ago |
ver217
|
7c6c427db1
|
[zero] trace states of fp16/32 grad and fp32 param (#571)
|
3 years ago |
Jiarui Fang
|
7675366fce
|
[polish] rename col_attr -> colo_attr (#558)
|
3 years ago |
Liang Bowen
|
2c45efc398
|
html refactor (#555)
|
3 years ago |
Jiarui Fang
|
d1211148a7
|
[utils] update colo tensor moving APIs (#553)
|
3 years ago |
Jiarui Fang
|
107b99ddb1
|
[zero] dump memory stats for sharded model (#548)
|
3 years ago |
Liang Bowen
|
ec5086c49c
|
Refactored docstring to google style
|
3 years ago |
Jiarui Fang
|
53b1b6e340
|
[zero] non model data tracing (#545)
|
3 years ago |
Jie Zhu
|
73d36618a6
|
[profiler] add MemProfiler (#356)
* add memory trainer hook
* fix bug
* add memory trainer hook
* fix import bug
* fix import bug
* add trainer hook
* fix #370 git log bug
* modify `to_tensorboard` function to support better output
* remove useless output
* change the name of `MemProfiler`
* complete memory profiler
* replace error with warning
* finish trainer hook
* modify interface of MemProfiler
* modify `__init__.py` in profiler
* remove unnecessary pass statement
* add usage to doc string
* add usage to trainer hook
* new location to store temp data file
|
3 years ago |
Jiarui Fang
|
c11ff81b15
|
[zero] get memory usage of sharded optim v2. (#542)
|
3 years ago |
Jiarui Fang
|
705f56107c
|
[zero] refactor model data tracing (#537)
|
3 years ago |
Jiarui Fang
|
05e33b2578
|
[zero] fix grad offload (#528)
* [zero] fix grad offload
* polish code
|
3 years ago |
Jiarui Fang
|
8d8c5407c0
|
[zero] refactor model data tracing (#522)
|
3 years ago |
Jiarui Fang
|
920c5889a7
|
[zero] add colo move inline (#521)
|
3 years ago |
Jiarui Fang
|
0bebda6ea5
|
[zero] fix init device bug in zero init context unittest (#516)
|
3 years ago |
Jiarui Fang
|
7ef3507ace
|
[zero] show model data cuda memory usage after zero context init. (#515)
|
3 years ago |
Jiarui Fang
|
9330be0f3c
|
[memory] set cuda mem frac (#506)
|
3 years ago |
Jiarui Fang
|
0035b7be07
|
[memory] add model data tensor moving api (#503)
|
3 years ago |
Jiarui Fang
|
a445e118cf
|
[polish] polish singleton and global context (#500)
|
3 years ago |
HELSON
|
f24b5ed201
|
[MOE] remove old MoE legacy (#493)
|
3 years ago |
Jiarui Fang
|
b334822163
|
[zero] polish sharded param name (#484)
* [zero] polish sharded param name
* polish code
* polish
* polish code
* polish
* polish
* polish
|
3 years ago |
Jiarui Fang
|
65c0f380c2
|
[format] polish name format for MOE (#481)
|
3 years ago |
HELSON
|
7544347145
|
[MOE] add unit test for MOE experts layout, gradient handler and kernel (#469)
|
3 years ago |
HELSON
|
aff9d354f7
|
[MOE] polish moe_env (#467)
|
3 years ago |
HELSON
|
84fd7c1d4d
|
add moe context, moe utilities and refactor gradient handler (#455)
|
3 years ago |
Frank Lee
|
b72b8445c6
|
optimized context test time consumption (#446)
|
3 years ago |
Jiarui Fang
|
496cbb0760
|
[hotfix] fix initialize bug with zero (#442)
|
3 years ago |
Frank Lee
|
b03b3ae99c
|
fixed mem monitor device (#433)
|
3 years ago |
Jiarui Fang
|
adebb3e041
|
[zero] cuda margin space for OS (#418)
|
3 years ago |
Jiarui Fang
|
56bb412e72
|
[polish] use GLOBAL_MODEL_DATA_TRACER (#417)
|
3 years ago |
Jiarui Fang
|
21dc54e019
|
[zero] memtracer to record cuda memory usage of model data and overall system (#395)
|
3 years ago |
LuGY
|
a9c27be42e
|
Added tensor detector (#393)
* Added tensor detector
* Added the - states
* Allowed change include_cpu when detect()
|
3 years ago |
1SAA
|
907ac4a2dc
|
fixed error when no collective communication in CommProfiler
|
3 years ago |
HELSON
|
dfd0363f68
|
polished output format for communication profiler and pcie profiler (#404)
fixed typing error
|
3 years ago |
HELSON
|
7c079d9c33
|
[hotfix] fixed bugs in ShardStrategy and PcieProfiler (#394)
|
3 years ago |
LuGY
|
de46450461
|
Added activation offload (#331)
* Added activation offload
* Fixed the import bug, used the pytest
|
3 years ago |
HELSON
|
8c18eb0998
|
[profiler] Fixed bugs in CommProfiler and PcieProfiler (#377)
|
3 years ago |
Jiarui Fang
|
b5f43acee3
|
[zero] find missing code (#378)
|
3 years ago |
HELSON
|
1ed7c24c02
|
Added PCIE profiler to detect data transmission (#373)
|
3 years ago |
jiaruifang
|
d9217e1960
|
Revert "[zero] bucketized tensor cpu gpu copy (#368)"
This reverts commit bef05489b6.
|
3 years ago |
Jiarui Fang
|
00670c870e
|
[zero] bucketized tensor cpu gpu copy (#368)
|
3 years ago |
Jiarui Fang
|
ea2872073f
|
[zero] global model data memory tracer (#360)
|
3 years ago |
HELSON
|
534e0bb118
|
Fixed import bug for no-tensorboard environment (#354)
|
3 years ago |
HELSON
|
c57e089824
|
[profile] added example for ProfilerContext (#349)
|
3 years ago |
Jiarui Fang
|
10e2826426
|
move async memory to an individual directory (#345)
|
3 years ago |