Jiarui Fang
d5e3e3ec01
[example] update gpt example for larger model scale ( #2211 )
2 years ago
Jiarui Fang
b87496a66b
[hotfix] fix auto policy of test_sharded_optim_v2 ( #2157 )
2 years ago
Jiarui Fang
ee287620f0
[Gemini] revert ZeROInitCtx related tracer ( #2138 )
2 years ago
Jiarui Fang
c89c66a858
[Gemini] update API of the chunkmemstatscollector. ( #2129 )
2 years ago
Jiarui Fang
2938edf446
[Gemini] update the non model data record method in runtime memory tracer ( #2128 )
2 years ago
Jiarui Fang
8fac837679
[Gemini] update non model data calculation method ( #2126 )
2 years ago
Jiarui Fang
5efda69735
[Gemini] hotfix the unittest bugs ( #2125 )
2 years ago
Jiarui Fang
05bb28aacf
[Gemini] mapping of preop timestep and param ( #2124 )
2 years ago
Jiarui Fang
9214d1fe28
[Gemini] chunk init using runtime visited param order ( #2115 )
2 years ago
Jiarui Fang
8afc001f4f
[Gemini] chunk init use OrderedParamGenerator ( #2110 )
2 years ago
Jiarui Fang
70a8556946
[gemini] get the param visited order during runtime ( #2108 )
2 years ago
Jiarui Fang
85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) ( #2099 )
2 years ago
Jiarui Fang
4b055351b0
[Gemini] make RuntimeMemTracer work correctly ( #2096 )
2 years ago
Jiarui Fang
1fca5d79ea
[Gemini] remove GLOBAL_MODEL_DATA_TRACER ( #2091 )
2 years ago
Jiarui Fang
28e55c2530
[Gemini] remove GLOBAL_CUDA_MEM_INFO ( #2090 )
2 years ago
Jiarui Fang
25abae6d7f
[Gemini] use MemStats in Runtime Memory tracer ( #2088 )
2 years ago
Jiarui Fang
33f4412102
[Gemini] use MemStats to store the tracing data. Separate it from Collector. ( #2084 )
2 years ago
Jiarui Fang
1f99205827
[Gemini] remove static tracer ( #2083 )
2 years ago
Jiarui Fang
b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook ( #2080 )
2 years ago
Jiarui Fang
a7adad9ccb
[Gemini] rename hooks related to runtime mem tracer ( #2076 )
2 years ago
Jiarui Fang
223332ff7e
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer ( #2073 )
2 years ago
Jiarui Fang
9f828ef36f
[Gemini] remove not used MemtracerWrapper ( #2072 )
2 years ago
Zihao
38ea4ba1bd
[Gemini] fix grad unreleased issue and param recovery issue ( #2052 )
2 years ago
Zihao
6a9158f1fa
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook ( #2040 )
2 years ago
Zihao
95c4532fff
[Gemini] paramWrapper paramTracerHook unittest ( #2030 )
2 years ago
Jiarui Fang
0b0d8f9e17
[hotfix] revert bug PRs ( #2016 )
2 years ago
Zihao
0160a62a3c
[Gemini] param_tracer_wrapper and test case ( #2009 )
2 years ago
Jiarui Fang
3712ac7f90
[Gemini] add bert for MemtracerWrapper unittests ( #1982 )
2 years ago
Jiarui Fang
0529fcde06
[Gemini] independent runtime tracer ( #1974 )
2 years ago
Jiarui Fang
c4739a725a
[Gemini] polish memstats collector ( #1962 )
2 years ago
Zihao
20e255d4e8
MemStatsCollectorStatic ( #1765 )
2 years ago
HELSON
b28991dd0a
[feature] A new ZeRO implementation ( #1644 )
2 years ago
Jiarui Fang
c5d39215f6
Revert "[feature] new zero implementation ( #1623 )" ( #1643 )
This reverts commit 5be118f405.
2 years ago
HELSON
5be118f405
[feature] new zero implementation ( #1623 )
2 years ago
Jiarui Fang
372f791444
[refactor] move chunk and chunkmgr to directory gemini ( #1182 )
2 years ago
ver217
1f894e033f
[gemini] zero supports gemini ( #1093 )
* add placement policy
* add gemini mgr
* update mem stats collector
* update zero
* update zero optim
* fix bugs
* zero optim monitor os
* polish unit test
* polish unit test
* add assert
3 years ago
ver217
be01db37c8
[tensor] refactor chunk mgr and impl MemStatsCollectorV2 ( #1077 )
* polish chunk manager
* polish unit test
* impl add_extern_static_tensor for chunk mgr
* add mem stats collector v2
* polish code
* polish unit test
* polish code
* polish get chunks
3 years ago
ver217
d7e0303d1e
[zero] use GeminiMemoryManager when sampling model data ( #850 )
3 years ago
Jiarui Fang
4d9332b4c5
[refactor] moving memtracer to gemini ( #801 )
3 years ago