Commit Graph

95 Commits (d344313533de84ebd6876e0da86303218a954a4f)

Author SHA1 Message Date
Shawn-Kong 1712da2800
[NFC] polish colossalai/gemini/gemini_context.py code style (#2690)
2 years ago
HELSON 8213f89fd2
[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671)
2 years ago
ver217 5b1854309a
[hotfix] fix zero ddp warmup check (#2545)
2 years ago
HELSON 707b11d4a0
[gemini] update ddp strict mode (#2518)
2 years ago
HELSON 2bfeb24308
[zero] add warning for ignored parameters (#2446)
2 years ago
HELSON 7829aa094e
[ddp] add is_ddp_ignored (#2434)
2 years ago
HELSON bb4e9a311a
[zero] add inference mode and its unit test (#2418)
2 years ago
Jiarui Fang af32022f74
[Gemini] fix the convert_to_torch_module bug (#2269)
2 years ago
Jiarui Fang d5e3e3ec01
[example] update gpt example for larger model scale (#2211)
2 years ago
HELSON 2458659919
[zero] fix error for BEiT models (#2169)
2 years ago
Jiarui Fang b87496a66b
[hotfix] fix auto policy of test_sharded_optim_v2 (#2157)
2 years ago
Jiarui Fang ee287620f0
[Gemini] revert ZeROInitCtx related tracer (#2138)
2 years ago
Jiarui Fang c89c66a858
[Gemini] update API of the chunkmemstatscollector. (#2129)
2 years ago
Jiarui Fang 2938edf446
[Gemini] update the non model data record method in runtime memory tracer (#2128)
2 years ago
Jiarui Fang 8fac837679
[Gemini] update non model data calculation method (#2126)
2 years ago
Jiarui Fang 5efda69735
[Gemini] hotfix the unittest bugs (#2125)
2 years ago
Jiarui Fang 05bb28aacf
[Gemini] mapping of preop timestep and param (#2124)
2 years ago
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115)
2 years ago
Jiarui Fang e5aa8333e4
[NFC] update chunk manager API (#2119)
2 years ago
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116)
2 years ago
Jiarui Fang 8afc001f4f
[Gemini] chunk init use OrderedParamGenerator (#2110)
2 years ago
HELSON 63fbba3c19
[zero] add L2 gradient clipping for ZeRO (#2112)
2 years ago
Jiarui Fang 70a8556946
[gemini] get the param visited order during runtime (#2108)
2 years ago
Jiarui Fang 61f31c3cf0
[Gemini] NFC, polish search_chunk_configuration (#2107)
2 years ago
Jiarui Fang 85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) (#2099)
2 years ago
Jiarui Fang 4b055351b0
[Gemini] make RuntimeMemTracer work correctly (#2096)
2 years ago
Jiarui Fang 1fca5d79ea
[Gemini] remove GLOBAL_MODEL_DATA_TRACER (#2091)
2 years ago
Jiarui Fang 28e55c2530
[Gemini] remove GLOBAL_CUDA_MEM_INFO (#2090)
2 years ago
Jiarui Fang 25abae6d7f
[Gemini] use MemStats in Runtime Memory tracer (#2088)
2 years ago
Jiarui Fang 33f4412102
[Gemini] use MemStats to store the tracing data. Separate it from Collector. (#2084)
2 years ago
Jiarui Fang 1f99205827
[Gemini] remove static tracer (#2083)
2 years ago
Jiarui Fang b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook (#2080)
2 years ago
Jiarui Fang a7adad9ccb
[Gemini] rename hooks related to runtime mem tracer (#2076)
2 years ago
Jiarui Fang 223332ff7e
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073)
2 years ago
Jiarui Fang 9f828ef36f
[Gemini] remove not used MemtracerWrapper (#2072)
2 years ago
Zihao 38ea4ba1bd
[Gemini] fix grad unreleased issue and param recovery issue (#2052)
2 years ago
Zihao 6a9158f1fa
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040)
2 years ago
Jiarui Fang 28aa9a4294
[Gemini] more rigorous unit tests for run_fwd_bwd (#2034)
2 years ago
Zihao 95c4532fff
[Gemini] paramWrapper paramTracerHook unittest (#2030)
2 years ago
Jiarui Fang 8daf1b4db1
[Gemini] patch for supporting torch.add_ function for ColoTensor (#2003)
2 years ago
Zihao a719b89a41
[gemini] param_trace_hook (#2020)
2 years ago
Jiarui Fang 0b0d8f9e17
[hotfix] revert bug PRs (#2016)
2 years ago
Zihao aba3db464d
[Gemini] ParamMemHook (#2008)
2 years ago
Zihao 0160a62a3c
[Gemini] param_tracer_wrapper and test case (#2009)
2 years ago
Jiarui Fang 3712ac7f90
[Gemini] add bert for MemtracerWrapper unittests (#1982)
2 years ago
Jiarui Fang e481489aa6
[Gemini] MemtracerWrapper unittests (#1981)
2 years ago
Jiarui Fang 31922110ad
[Gemini] memory trace hook (#1978)
2 years ago
Jiarui Fang 0529fcde06
[Gemini] independent runtime tracer (#1974)
2 years ago
Jiarui Fang 7e24b9b9ee
[Gemini] clean no used MemTraceOp (#1970)
2 years ago
Jiarui Fang c4739a725a
[Gemini] polish memstats collector (#1962)
2 years ago