72 Commits (052b03e83f30f46a43f87e2c9739ab04f56b6460)

Author SHA1 Message Date
Frank Lee 1ad3a636b1
[test] fixed torchrec model test (#3167)
2 years ago
HELSON 56ddc9ca7a
[hotfix] add correct device for fake_param (#2796)
2 years ago
HELSON 8213f89fd2
[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671)
2 years ago
HELSON 707b11d4a0
[gemini] update ddp strict mode (#2518)
2 years ago
HELSON 5521af7877
[zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443)
2 years ago
HELSON bb4e9a311a
[zero] add inference mode and its unit test (#2418)
2 years ago
HELSON ea13a201bb
[polish] polish code for get_static_torch_model (#2405)
2 years ago
HELSON 48d33b1b17
[gemini] add get static torch model (#2356)
2 years ago
HELSON a3100bd50d
[testing] add beit model for unit testings (#2196)
2 years ago
HELSON 2458659919
[zero] fix error for BEiT models (#2169)
2 years ago
Jiarui Fang 27327a4c90
[example] add palm pytorch version (#2172)
2 years ago
Jiarui Fang 2827f41898
[Gemini] GeminiDPP convert to PyTorch Module. (#2151)
2 years ago
Jiarui Fang c89c66a858
[Gemini] update API of the chunkmemstatscollector. (#2129)
2 years ago
Jiarui Fang 2938edf446
[Gemini] update the non model data record method in runtime memory tracer (#2128)
2 years ago
Jiarui Fang deee317b0f
[Gemini] test step-tensor mapping using repeated_computed_layers.py (#2127)
2 years ago
Jiarui Fang 8fac837679
[Gemini] update non model data calculation method (#2126)
2 years ago
Jiarui Fang 5efda69735
[Gemini] hotfix the unittest bugs (#2125)
2 years ago
Jiarui Fang 05bb28aacf
[Gemini] mapping of preop timestep and param (#2124)
2 years ago
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115)
2 years ago
Jiarui Fang e5aa8333e4
[NFC] update chunk manager API (#2119)
2 years ago
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116)
2 years ago
HELSON 63fbba3c19
[zero] add L2 gradient clipping for ZeRO (#2112)
2 years ago
Jiarui Fang 70a8556946
[gemini] get the param visited order during runtime (#2108)
2 years ago
Jiarui Fang 85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) (#2099)
2 years ago
Jiarui Fang 978242326a
[Gemini] remove eval in gemini unittests! (#2092)
2 years ago
Jiarui Fang 1fca5d79ea
[Gemini] remove GLOBAL_MODEL_DATA_TRACER (#2091)
2 years ago
Jiarui Fang 25abae6d7f
[Gemini] use MemStats in Runtime Memory tracer (#2088)
2 years ago
Jiarui Fang 4f21c9e8d9
[Gemini] polish runtime tracer tests (#2077)
2 years ago
Jiarui Fang a7adad9ccb
[Gemini] rename hooks related to runtime mem tracer (#2076)
2 years ago
Jiarui Fang 40b7d55bf3
[Gemini] add albert in test models. (#2075)
2 years ago
Jiarui Fang 616ed91ecd
[test] bert test in non-distributed way (#2074)
2 years ago
Jiarui Fang 223332ff7e
[Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073)
2 years ago
Jiarui Fang 9f828ef36f
[Gemini] remove not used MemtracerWrapper (#2072)
2 years ago
Zihao 38ea4ba1bd
[Gemini] fix grad unreleased issue and param recovery issue (#2052)
2 years ago
HELSON f6178728a0
[gemini] fix init bugs for modules (#2047)
2 years ago
Zihao 6a9158f1fa
[Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040)
2 years ago
Jiarui Fang 1e885329f4
[test] align model name with the file name. (#2045)
2 years ago
Jiarui Fang 31c644027b
[hotfix] hotfix Gemini for no leaf modules bug (#2043)
2 years ago
HELSON 384cd26314
[zero] fix testing parameters (#2042)
2 years ago
HELSON 17a3c685b0
[zero] fix unit-tests (#2039)
2 years ago
Jiarui Fang eb7742a4bb
[Gemini] more tests for Gemini (#2038)
2 years ago
HELSON 537e181705
[testing] fix testing models (#2036)
2 years ago
Jiarui Fang 96134e7be3
[hotfix] add bert test for gemini fwd bwd (#2035)
2 years ago
Jiarui Fang 28aa9a4294
[Gemini] more rigorous unit tests for run_fwd_bwd (#2034)
2 years ago
Zihao 95c4532fff
[Gemini] paramWrapper paramTracerHook unitest (#2030)
2 years ago
Jiarui Fang 8daf1b4db1
[Gemini] patch for supporting orch.add_ function for ColoTensor (#2003)
2 years ago
Jiarui Fang 2e9cbfca12
[Gemini] add unitests to check gemini correctness (#2015)
2 years ago
Jiarui Fang 0b0d8f9e17
[hotfix] revert bug PRs (#2016)
2 years ago
Zihao 0160a62a3c
[Gemini] param_tracer_wrapper and test case (#2009)
2 years ago
Jiarui Fang 3d907faede
[Gemini] add an inline_op_module to common test models and polish unitests. (#2004)
2 years ago