327 Commits (457a0de79fd2d3602eba0ac78e606acb6401fc60)

Author SHA1 Message Date
digger yu 9265f2d4d7
[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) 2 years ago
jiangmingyan 307894f74d
[booster] gemini plugin support shard checkpoint (#3610) 2 years ago
YH a22407cc02
[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. (#3173) 2 years ago
Hongxin Liu 50793b35f4
[gemini] accelerate inference (#3641) 2 years ago
Hongxin Liu 4b3240cb59
[booster] add low level zero plugin (#3594) 2 years ago
digger-yu b9a8dff7e5
[doc] Fix typo under colossalai and doc(#3618) 2 years ago
Hongxin Liu 12eff9eb4c
[gemini] state dict supports fp16 (#3590) 2 years ago
Hongxin Liu f313babd11
[gemini] support save state dict in shards (#3581) 2 years ago
YH d329c294ec
Add docstr for zero3 chunk search utils (#3572) 2 years ago
Hongxin Liu 173dad0562
[misc] add verbose arg for zero and op builder (#3552) 2 years ago
Hongxin Liu 152239bbfa
[gemini] gemini supports lazy init (#3379) 2 years ago
YH bcf0cbcbe7
[doc] Add docs for clip args in zero optim (#3504) 2 years ago
ver217 573af84184
[example] update examples related to zero/gemini (#3431) 2 years ago
ver217 26b7aac0be
[zero] reorganize zero/gemini folder structure (#3424) 2 years ago
YH 80aed29cd3
[zero] Refactor ZeroContextConfig class using dataclass (#3186) 2 years ago
YH 9d644ff09f
Fix docstr for zero statedict (#3185) 2 years ago
ver217 823f3b9cf4
[doc] add deepspeed citation and copyright (#2996) 2 years ago
YH 7b13f7db18
[zero] trivial zero optimizer refactoring (#2869) 2 years ago
Boyuan Yao 8e3f66a0d1
[zero] fix wrong import (#2777) 2 years ago
Nikita Shulga 01066152f1
Don't use `torch._six` (#2775) 2 years ago
YH ae86a29e23
Refact method of grad store (#2687) 2 years ago
HELSON df4f020ee3
[zero1&2] only append parameters with gradients (#2681) 2 years ago
HELSON b528eea0f0
[zero] add zero wrappers (#2523) 2 years ago
HELSON 077a5cdde4
[zero] fix gradient clipping in hybrid parallelism (#2521) 2 years ago
HELSON d565a24849
[zero] add unit testings for hybrid parallelism (#2486) 2 years ago
HELSON a5dc4253c6
[zero] polish low level optimizer (#2473) 2 years ago
Jiarui Fang 867c8c2d3a
[zero] low level optim supports ProcessGroup (#2464) 2 years ago
HELSON 7829aa094e
[ddp] add is_ddp_ignored (#2434) 2 years ago
HELSON 62c38e3330
[zero] polish low level zero optimizer (#2275) 2 years ago
HELSON a7d95b7024
[example] add zero1, zero2 example in GPT examples (#2146) 2 years ago
Jiarui Fang c89c66a858
[Gemini] update API of the chunkmemstatscollector. (#2129) 2 years ago
Jiarui Fang 2938edf446
[Gemini] update the non model data record method in runtime memory tracer (#2128) 2 years ago
Jiarui Fang e99edfcb51
[NFC] polish comments for Chunk class (#2116) 2 years ago
Jiarui Fang 33f4412102
[Gemini] use MemStats to store the tracing data. Seperate it from Collector. (#2084) 2 years ago
Jiarui Fang b3b89865e2
[Gemini] ParamOpHook -> ColoParamOpHook (#2080) 2 years ago
HELSON a1ce02d740
[zero] test gradient accumulation (#1964) 2 years ago
Jiarui Fang cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook (#1972) 2 years ago
Jiarui Fang c4739a725a
[Gemini] polish memstats collector (#1962) 2 years ago
Jiarui Fang f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960) 2 years ago
HELSON 7066dfbf82
[zero] fix memory leak for zero2 (#1955) 2 years ago
HELSON 6e51d296f0
[zero] migrate zero1&2 (#1878) 2 years ago
Zihao 20e255d4e8
MemStatsCollectorStatic (#1765) 2 years ago
HELSON c6a1a62636
[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786) 2 years ago
CsRic ea961d8fd1
[NFC] polish colossalai/zero/sharded_param/__init__.py code style (#1717) 2 years ago
HELSON 1468e4bcfc
[zero] add constant placement policy (#1705) 2 years ago
HELSON b28991dd0a
[feature] A new ZeRO implementation (#1644) 2 years ago
Jiarui Fang c5d39215f6
Revert "[feature] new zero implementation (#1623)" (#1643) 2 years ago
HELSON 5be118f405
[feature] new zero implementation (#1623) 2 years ago
HELSON f7f2248771
[moe] fix MoE bugs (#1628) 2 years ago
ver217 c9e8ce67b8
fix move fp32 shards (#1604) 2 years ago