ColossalAI

Commit Graph

Author	SHA1	Message	Date
Jiarui Fang	c89c66a858	[Gemini] update API of the chunkmemstatscollector. (#2129)	2 years ago
Jiarui Fang	2938edf446	[Gemini] update the non model data record method in runtime memory tracer (#2128)	2 years ago
Jiarui Fang	9214d1fe28	[Gemini] chunk init using runtime visited param order (#2115)	2 years ago
Jiarui Fang	85efb7ac2e	[Gemini] gemini use the runtime memory tracer (RMT) (#2099)	2 years ago
Jiarui Fang	1f99205827	[Gemini] remove static tracer (#2083)	2 years ago
Jiarui Fang	c4739a725a	[Gemini] polish memstats collector (#1962)	2 years ago
Zihao	20e255d4e8	MemStatsCollectorStatic (#1765)	2 years ago
Jiarui Fang	c248800359	[kernel] skip tests of flash_attn and triton when they are not available (#1798)	2 years ago
HELSON	c6a1a62636	[hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786) * [hotfix] fix zero's incompatibility with checkpoint in torch-1.12 * [zero] add cpu shard init * [zero] add tiny example test * [colo_tensor] fix bugs for torch-1.11	2 years ago
HELSON	1468e4bcfc	[zero] add constant placement policy (#1705) * fixes memory leak when parameter is in fp16 in ZeroDDP init. * bans chunk releasement in CUDA. Only when a chunk is about to offload, it is allowed to release. * adds a constant placement policy. With it, users can allocate a reserved caching memory space for parameters.	2 years ago
HELSON	b28991dd0a	[feature] A new ZeRO implementation (#1644)	2 years ago
Jiarui Fang	c5d39215f6	Revert "[feature] new zero implementation (#1623)" (#1643) This reverts commit `5be118f405`.	2 years ago
HELSON	5be118f405	[feature] new zero implementation (#1623)	2 years ago
ver217	d068af81a3	[doc] update rst and docstring (#1351) * update rst * add zero docstr * fix docstr * remove fx.tracer.meta_patch * fix docstr * fix docstr * update fx rst * fix fx docstr * remove useless rst	2 years ago
Jiarui Fang	372f791444	[refactor] move chunk and chunkmgr to directory gemini (#1182)	2 years ago
ver217	54aabb8da4	[gemini] refactor gemini mgr (#1151) * refactor gemini mgr * update __init__	2 years ago
ver217	7d14b473f0	[gemini] gemini mgr supports "cpu" placement policy (#1118) * update gemini mgr * update chunk * add docstr * polish placement policy * update test chunk * update test zero * polish unit test * remove useless unit test	2 years ago
Frank Lee	14e5b11d7f	[zero] fixed api consistency (#1098)	2 years ago
ver217	1f894e033f	[gemini] zero supports gemini (#1093) * add placement policy * add gemini mgr * update mem stats collector * update zero * update zero optim * fix bugs * zero optim monitor os * polish unit test * polish unit test * add assert	2 years ago

19 Commits (ce3c4eca7bc2c5b148dfe5db1ddb702558af4831)