Commit Graph

39 Commits (31c78f2be3272a9a4062fe78eca34b3847a0c900)

Author SHA1 Message Date
Jiarui Fang d5e3e3ec01
[example] update gpt example for larger model scale (#2211)
2 years ago
Jiarui Fang b87496a66b
[hotfix] fix auto policy of test_sharded_optim_v2 (#2157)
2 years ago
Jiarui Fang ee287620f0
[Gemini] revert ZeROInitCtx related tracer (#2138)
2 years ago
Jiarui Fang c89c66a858
[Gemini] update API of the chunkmemstatscollector. (#2129)
2 years ago
Jiarui Fang 2938edf446
[Gemini] update the non model data record method in runtime memory tracer (#2128)
2 years ago
Jiarui Fang 8fac837679
[Gemini] update non model data calculation method (#2126)
2 years ago
Jiarui Fang 5efda69735
[Gemini] hotfix the unittest bugs (#2125)
2 years ago
Jiarui Fang 05bb28aacf
[Gemini] mapping of preop timestep and param (#2124)
2 years ago
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115)
2 years ago
Jiarui Fang 8afc001f4f
[Gemini] chunk init use OrderedParamGenerator (#2110)
2 years ago
Jiarui Fang 70a8556946
[gemini] get the param visited order during runtime (#2108)
2 years ago
Jiarui Fang 85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) (#2099)
2 years ago
Jiarui Fang 4b055351b0
[Gemini] make RuntimeMemTracer work correctly (#2096)
2 years ago
Jiarui Fang 1fca5d79ea
[Gemini] remove GLOBAL_MODEL_DATA_TRACER (#2091)
2 years ago
Jiarui Fang 28e55c2530
[Gemini] remove GLOBAL_CUDA_MEM_INFO (#2090)
2 years ago
Jiarui Fang 25abae6d7f
[Gemini] use MemStats in Runtime Memory tracer (#2088)
2 years ago
Jiarui Fang 33f4412102
[Gemini] use MemStats to store the tracing data. Seperate it from Collector. (#2084)
2 years ago
Jiarui Fang 1f99205827
[Gemini] remove static tracer (#2083)
2 years ago
Jiarui Fang b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook (#2080)
2 years ago
Jiarui Fang a7adad9ccb
[Gemini] rename hooks related to runtime mem tracer (#2076)
2 years ago
Jiarui Fang 223332ff7e
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073)
2 years ago
Jiarui Fang 9f828ef36f
[Gemini] remove not used MemtracerWrapper (#2072)
2 years ago
Zihao 38ea4ba1bd
[Gemini] fix grad unreleased issue and param recovery issue (#2052)
2 years ago
Zihao 6a9158f1fa
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040)
2 years ago
Zihao 95c4532fff
[Gemini] paramWrapper paramTracerHook unitest (#2030)
2 years ago
Jiarui Fang 0b0d8f9e17
[hotfix] revert bug PRs (#2016)
2 years ago
Zihao 0160a62a3c
[Gemini] param_tracer_wrapper and test case (#2009)
2 years ago
Jiarui Fang 3712ac7f90
[Gemini] add bert for MemtracerWrapper unintests (#1982)
2 years ago
Jiarui Fang 0529fcde06
[Gemini] independent runtime tracer (#1974)
2 years ago
Jiarui Fang c4739a725a
[Gemini] polish memstats collector (#1962)
2 years ago
Zihao 20e255d4e8
MemStatsCollectorStatic (#1765)
2 years ago
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644)
2 years ago
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
2 years ago
HELSON 5be118f405
[feature] new zero implementation (#1623)
2 years ago
Jiarui Fang 372f791444
[refactor] move chunk and chunkmgr to directory gemini (#1182)
2 years ago
ver217 1f894e033f
[gemini] zero supports gemini (#1093)
3 years ago
ver217 be01db37c8
[tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077)
3 years ago
ver217 d7e0303d1e
[zero] use GeminiMemoryManager when sampling model data (#850)
3 years ago
Jiarui Fang 4d9332b4c5
[refactor] moving memtracer to gemini (#801)
3 years ago