Commit Graph

39 Commits (4befaabace567589251a0c5ba2916a7fc891bcfa)

Author SHA1 Message Date
HELSON ea13a201bb
[polish] polish code for get_static_torch_model (#2405)
2 years ago
HELSON 48d33b1b17
[gemini] add get static torch model (#2356)
2 years ago
HELSON a3100bd50d
[testing] add beit model for unit testings (#2196)
2 years ago
HELSON 2458659919
[zero] fix error for BEiT models (#2169)
2 years ago
Jiarui Fang 27327a4c90
[example] add palm pytorch version (#2172)
2 years ago
Jiarui Fang 2827f41898
[Gemini] GeminiDPP convert to PyTorch Module. (#2151)
2 years ago
Jiarui Fang c89c66a858
[Gemini] update API of the chunkmemstatscollector. (#2129)
2 years ago
Jiarui Fang 2938edf446
[Gemini] update the non model data record method in runtime memory tracer (#2128)
2 years ago
Jiarui Fang deee317b0f
[Gemini] test step-tensor mapping using repeated_computed_layers.py (#2127)
2 years ago
Jiarui Fang 8fac837679
[Gemini] update non model data calculation method (#2126)
2 years ago
Jiarui Fang 5efda69735
[Gemini] hotfix the unittest bugs (#2125)
2 years ago
Jiarui Fang 05bb28aacf
[Gemini] mapping of preop timestep and param (#2124)
2 years ago
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115)
2 years ago
Jiarui Fang e5aa8333e4
[NFC] update chunk manager API (#2119)
2 years ago
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116)
2 years ago
HELSON 63fbba3c19
[zero] add L2 gradient clipping for ZeRO (#2112)
2 years ago
Jiarui Fang 85efb7ac2e
[Gemini] gemini use the runtime memory tracer (RMT) (#2099)
2 years ago
Jiarui Fang 978242326a
[Gemini] remove eval in gemini unittests! (#2092)
2 years ago
Jiarui Fang 40b7d55bf3
[Gemini] add albert in test models. (#2075)
2 years ago
HELSON f6178728a0
[gemini] fix init bugs for modules (#2047)
2 years ago
Jiarui Fang 1e885329f4
[test] align model name with the file name. (#2045)
2 years ago
Jiarui Fang 31c644027b
[hotfix] hotfix Gemini for no leaf modules bug (#2043)
2 years ago
HELSON 384cd26314
[zero] fix testing parameters (#2042)
2 years ago
HELSON 17a3c685b0
[zero] fix unit-tests (#2039)
2 years ago
Jiarui Fang eb7742a4bb
[Gemini] more tests for Gemini (#2038)
2 years ago
HELSON 537e181705
[testing] fix testing models (#2036)
2 years ago
Jiarui Fang 96134e7be3
[hotfix] add bert test for gemini fwd bwd (#2035)
2 years ago
Jiarui Fang 28aa9a4294
[Gemini] more rigorous unit tests for run_fwd_bwd (#2034)
2 years ago
Jiarui Fang 2e9cbfca12
[Gemini] add unitests to check gemini correctness (#2015)
2 years ago
Jiarui Fang f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960)
2 years ago
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
2 years ago
HELSON f69f9bf223
[zero] add chunk init function for users (#1729)
2 years ago
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
2 years ago
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644)
2 years ago
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
2 years ago
HELSON 5be118f405
[feature] new zero implementation (#1623)
2 years ago
HELSON b80340168e
[zero] add chunk_managerV2 for all-gather chunk (#1441)
2 years ago
HELSON 9056677b13
[zero] add chunk size searching algorithm for parameters in different groups (#1436)
2 years ago
HELSON 039b7ed3bc
[polish] add update directory in gemini; rename AgChunk to ChunkV2 (#1432)
2 years ago