101 Commits (c3d5fa3bac85baa07e30e2978a7517034ba7e0aa)

Author SHA1 Message Date
digger yu 1878749753
[nfc] fix typo colossalai/nn (#3887)
2 years ago
digger yu 9265f2d4d7
[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779)
2 years ago
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424)
2 years ago
junxu c52edcf0eb
Rename class method of ZeroDDP (#2692)
2 years ago
HELSON 8213f89fd2
[gemini] add fake_release_chunk for keep-gathered chunk in the inference mode (#2671)
2 years ago
ver217 5b1854309a
[hotfix] fix zero ddp warmup check (#2545)
2 years ago
HELSON a4ed9125ac
[hotfix] fix lightning error (#2529)
2 years ago
HELSON 66dfcf5281
[gemini] update the gpt example (#2527)
2 years ago
HELSON b528eea0f0
[zero] add zero wrappers (#2523)
2 years ago
HELSON 707b11d4a0
[gemini] update ddp strict mode (#2518)
2 years ago
HELSON 2d1a7dfe5f
[zero] add strict ddp mode (#2508)
2 years ago
HELSON 5521af7877
[zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443)
2 years ago
HELSON 7829aa094e
[ddp] add is_ddp_ignored (#2434)
2 years ago
HELSON bb4e9a311a
[zero] add inference mode and its unit test (#2418)
2 years ago
HELSON ea13a201bb
[polish] polish code for get_static_torch_model (#2405)
2 years ago
eric8607242 9880fd2cd8
Fix state_dict key missing issue of the ZeroDDP (#2363)
2 years ago
HELSON 48d33b1b17
[gemini] add get static torch model (#2356)
2 years ago
Jiarui Fang af32022f74
[Gemini] fix the convert_to_torch_module bug (#2269)
2 years ago
HELSON 2458659919
[zero] fix error for BEiT models (#2169)
2 years ago
Jiarui Fang 2827f41898
[Gemini] GeminiDPP convert to PyTorch Module. (#2151)
2 years ago
Jiarui Fang 9214d1fe28
[Gemini] chunk init using runtime visited param order (#2115)
2 years ago
Jiarui Fang e5aa8333e4
[NFC] update chunk manager API (#2119)
2 years ago
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116)
2 years ago
HELSON 63fbba3c19
[zero] add L2 gradient clipping for ZeRO (#2112)
2 years ago
Jiarui Fang 1f99205827
[Gemini] remove static tracer (#2083)
2 years ago
Jiarui Fang b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook (#2080)
2 years ago
HELSON e37f3db40c
[gemini] add arguments (#2046)
2 years ago
Jiarui Fang 96134e7be3
[hotfix] add bert test for gemini fwd bwd (#2035)
2 years ago
Jiarui Fang 8daf1b4db1
[Gemini] patch for supporting orch.add_ function for ColoTensor (#2003)
2 years ago
Jiarui Fang cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook (#1972)
2 years ago
Jiarui Fang f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960)
2 years ago
Jiarui Fang cd5a0d56fa
[Gemini] make gemini usage simple (#1821)
2 years ago
Zihao 20e255d4e8
MemStatsCollectorStatic (#1765)
2 years ago
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
2 years ago
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705)
2 years ago
Jiarui Fang 21962e1593
[embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699)
2 years ago
Jiarui Fang 363fc2861a
[embeddings] more detailed timer (#1692)
2 years ago
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644)
2 years ago
Jiarui Fang c638bec028
[embedding] polish async copy (#1657)
2 years ago
Jiarui Fang 988570e4a6
[embedding] add more detail profiling (#1656)
2 years ago
Jiarui Fang e1f97fd2b8
[embedding] print profiling results (#1654)
2 years ago
Jiarui Fang 04443605a5
[embedding] non-blocking cpu-gpu copy (#1647)
2 years ago
CsRic 0767f67a0f
[embedding] isolate cache_op from forward (#1645)
2 years ago
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643)
2 years ago
HELSON 5be118f405
[feature] new zero implementation (#1623)
2 years ago
Jiarui Fang e57df80325
[embeddings] cache option (#1635)
2 years ago
Jiarui Fang 38c68b5b9a
[embedding] rollback for better FAW performance (#1625)
2 years ago
Jiarui Fang 504ff1d101
[embeddings] use cache_ratio instead of cuda_row_num (#1611)
2 years ago
Jiarui Fang a19eb80998
[embedding] updates some default parameters
2 years ago
CsRic f3403ff98e
[embeddings] add already_split_along_rank flag for tablewise mode (#1584)
2 years ago