Jiarui Fang
107b99ddb1
[zero] dump memory stats for sharded model ( #548 )
3 years ago
Liang Bowen
ec5086c49c
Refactored docstring to google style
3 years ago
Jiarui Fang
53b1b6e340
[zero] non model data tracing ( #545 )
3 years ago
Jie Zhu
73d36618a6
[profiler] add MemProfiler ( #356 )
...
* add memory trainer hook
* fix bug
* add memory trainer hook
* fix import bug
* fix import bug
* add trainer hook
* fix #370 git log bug
* modify `to_tensorboard` function to support better output
* remove useless output
* change the name of `MemProfiler`
* complete memory profiler
* replace error with warning
* finish trainer hook
* modify interface of MemProfiler
* modify `__init__.py` in profiler
* remove unnecessary pass statement
* add usage to doc string
* add usage to trainer hook
* new location to store temp data file
3 years ago
Jiarui Fang
c11ff81b15
[zero] get memory usage of sharded optim v2. ( #542 )
3 years ago
Jiarui Fang
705f56107c
[zero] refactor model data tracing ( #537 )
3 years ago
Jiarui Fang
05e33b2578
[zero] fix grad offload ( #528 )
...
* [zero] fix grad offload
* polish code
3 years ago
Jiarui Fang
8d8c5407c0
[zero] refactor model data tracing ( #522 )
3 years ago
Jiarui Fang
920c5889a7
[zero] add colo move inline ( #521 )
3 years ago
Jiarui Fang
0bebda6ea5
[zero] fix init device bug in zero init context unittest ( #516 )
3 years ago
Jiarui Fang
7ef3507ace
[zero] show model data cuda memory usage after zero context init. ( #515 )
3 years ago
Jiarui Fang
9330be0f3c
[memory] set cuda mem frac ( #506 )
3 years ago
Jiarui Fang
0035b7be07
[memory] add model data tensor moving api ( #503 )
3 years ago
Jiarui Fang
a445e118cf
[polish] polish singleton and global context ( #500 )
3 years ago
HELSON
f24b5ed201
[MOE] remove old MoE legacy ( #493 )
3 years ago
Jiarui Fang
b334822163
[zero] polish sharded param name ( #484 )
...
* [zero] polish sharded param name
* polish code
* polish
* polish code
* polish
* polish
* polish
3 years ago
Jiarui Fang
65c0f380c2
[format] polish name format for MOE ( #481 )
3 years ago
HELSON
7544347145
[MOE] add unit test for MOE experts layout, gradient handler and kernel ( #469 )
3 years ago
HELSON
aff9d354f7
[MOE] polish moe_env ( #467 )
3 years ago
HELSON
84fd7c1d4d
add moe context, moe utilities and refactor gradient handler ( #455 )
3 years ago
Frank Lee
b72b8445c6
optimized context test time consumption ( #446 )
3 years ago
Jiarui Fang
496cbb0760
[hotfix] fix initialize bug with zero ( #442 )
3 years ago
Frank Lee
b03b3ae99c
fixed mem monitor device ( #433 )
...
fixed mem monitor device
3 years ago
Jiarui Fang
adebb3e041
[zero] cuda margin space for OS ( #418 )
3 years ago
Jiarui Fang
56bb412e72
[polish] use GLOBAL_MODEL_DATA_TRACER ( #417 )
3 years ago
Jiarui Fang
21dc54e019
[zero] memtracer to record cuda memory usage of model data and overall system ( #395 )
3 years ago
LuGY
a9c27be42e
Added tensor detector ( #393 )
...
* Added tensor detector
* Added the - states
* Allowed change include_cpu when detect()
3 years ago
1SAA
907ac4a2dc
fixed error when no collective communication in CommProfiler
3 years ago
HELSON
dfd0363f68
polished output format for communication profiler and pcie profiler ( #404 )
...
fixed typing error
3 years ago
HELSON
7c079d9c33
[hotfix] fixed bugs in ShardStrategy and PcieProfiler ( #394 )
3 years ago
LuGY
de46450461
Added activation offload ( #331 )
...
* Added activation offload
* Fixed the import bug, used the pytest
3 years ago
HELSON
8c18eb0998
[profiler] Fixed bugs in CommProfiler and PcieProfiler ( #377 )
3 years ago
Jiarui Fang
b5f43acee3
[zero] find miss code ( #378 )
3 years ago
HELSON
1ed7c24c02
Added PCIE profiler to detect data transmission ( #373 )
3 years ago
jiaruifang
d9217e1960
Revert "[zero] bucketized tensor cpu gpu copy ( #368 )"
...
This reverts commit bef05489b6.
3 years ago
Jiarui Fang
00670c870e
[zero] bucketized tensor cpu gpu copy ( #368 )
3 years ago
Jiarui Fang
ea2872073f
[zero] global model data memory tracer ( #360 )
3 years ago
HELSON
534e0bb118
Fixed import bug for no-tensorboard environment ( #354 )
3 years ago
HELSON
c57e089824
[profile] added example for ProfilerContext ( #349 )
3 years ago
Jiarui Fang
10e2826426
move async memory to an individual directory ( #345 )
3 years ago
HELSON
425bb0df3f
Added Profiler Context to manage all profilers ( #340 )
3 years ago
HELSON
4f26fabe4f
fixed strings in profiler outputs ( #325 )
3 years ago
1SAA
73bff11288
Added profiler communication operations
...
Fixed bug for learning rate scheduler
3 years ago
Jie Zhu
d344689274
[profiler] primary memory tracer
3 years ago
Jiarui Fang
5a560a060a
Feature/zero ( #279 )
...
* add zero1 (#209 )
* add zero1
* add test zero1
* update zero stage 1 develop (#212 )
* Implement naive zero3 (#240 )
* naive zero3 works well
* add zero3 param manager
* add TODOs in comments
* add gather full param ctx
* fix sub module streams
* add offload
* fix bugs of hook and add unit tests
* fix bugs of hook and add unit tests (#252 )
* add gather full param ctx
* fix sub module streams
* add offload
* fix bugs of hook and add unit tests
* polish code and add state dict hook
* fix bug
* update unit test
* refactor reconstructed zero code
* clip_grad support zero3 and add unit test
* add unit test for Zero3ParameterManager
* [WIP] initialize the shard param class
* [WIP] Yet another sharded model implementation (#274 )
* [WIP] initialize the shard param class
* [WIP] Yet another implementation of shardModel. Using a better hook method.
* torch.concat -> torch.cat
* fix test_zero_level_1.py::test_zero_level_1 unitest
* remove deepspeed implementation and refactor for the reconstructed zero module
* polish zero dp unittests
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
3 years ago
Frank Lee
3a1a9820b0
fixed mkdir conflict and align yapf config with flake ( #220 )
3 years ago
アマデウス
9ee197d0e9
moved env variables to global variables; ( #215 )
...
added branch context;
added vocab parallel layers;
moved split_batch from load_batch to tensor parallel embedding layers;
updated gpt model;
updated unit test cases;
fixed few collective communicator bugs
3 years ago
Frank Lee
812357d63c
fixed utils docstring and add example to readme ( #200 )
3 years ago
HELSON
0f8c7f9804
Fixed docstring in colossalai ( #171 )
3 years ago
Frank Lee
e2089c5c15
adapted for sequence parallel ( #163 )
3 years ago