32a45cd7ef  YuliangLiu0306  [pipelinable] use pipelinable to support GPT model (#903)  (3 years ago)
    * [CLI] add CLI launcher
    * Revert "[CLI] add CLI launcher"
      This reverts commit df7e6506d4.
    * [pipelinable] use pipelinable to support GPT model
    * fix a bug caused by ShardedModel
    * polish
    * fix front func list
11f54c7b6b  Frank Lee  [doc] improved docstring and assertion messages for the engine module (#871)  (3 years ago)
681addb512  Jiarui Fang  [refactor] moving grad acc logic to engine (#804)  (3 years ago)
4d9332b4c5  Jiarui Fang  [refactor] moving memtracer to gemini (#801)  (3 years ago)
84c6700b2a  HELSON  [zero] refactor memstats_collector (#746)  (3 years ago)
4d90a7b513  Jiarui Fang  [refactor] zero directory (#724)  (3 years ago)
193dc8dacb  Jiarui Fang  [refactor] refactor the memory utils (#715)  (3 years ago)
ee112fe1da  HELSON  [zero] adapt zero hooks for unsharded module (#699)  (3 years ago)
3c9cd5bb5e  ver217  [zero] stateful tensor manager (#687)  (3 years ago)
    * [WIP] stateful tensor manager
    * add eviction strategy
    * polish code
    * polish code
    * polish comment
    * add unit test
    * fix sampler bug
    * polish code
    * fix max sampling cnt resetting bug
    * fix sampler bug
    * polish code
    * fix bug
    * fix unit test
    Co-authored-by: jiaruifang <fangjiarui123@gmail.com>
0ed7042f42  YuliangLiu0306  [pipeline] refactor pipeline (#679)  (3 years ago)
    * refactor pipeline: put runtime schedule into engine
    * add type hint for schedule Optional[BaseSchedule]
    * preprocess schedule during engine initialization
    * infer pipeline schedule params from config
ad1e7ab2b2  RichardoLuo  [NFC] polish <colossalai/engine/_base_engine.py> code style (#631)  (3 years ago)
    Co-authored-by: RichardoLuo <14049555596@qq.com>
f2da21a827  doubleHU  fix format (#586)  (3 years ago)
ffad81e1d1  fanjinfucool  fix format (#585)  (3 years ago)
    Co-authored-by: fanjifu <FAN>
d2dc6049b5  Maruyama_Aya  fix format (#580)  (3 years ago)
cfb41297ff  yuxuan-lou  fix format (#573)  (3 years ago)
ade05a5d83  YuliangLiu0306  [refactor] pipeline, put runtime schedule into engine (#627)  (3 years ago)
e956d93ac2  Jiarui Fang  [refactor] memory utils (#577)  (3 years ago)
e6d50ec107  HELSON  [zero] adapt zero for unsharded parameters (#561)  (3 years ago)
    * support existing sharded and unsharded parameters in zero
    * add unit test for moe-zero model init
    * polish moe gradient handler
7675366fce  Jiarui Fang  [polish] rename col_attr -> colo_attr (#558)  (3 years ago)
014bac0c49  ver217  [zero] hijack p.grad in sharded model (#554)  (3 years ago)
    * hijack p.grad in sharded model
    * polish comments
    * polish comments
f552b11294  Jiarui Fang  [zero] label state for param fp16 and grad (#551)  (3 years ago)
214da761d4  Jiarui Fang  [zero] add stateful tensor (#549)  (3 years ago)
ec5086c49c  Liang Bowen  refactored docstrings to Google style  (3 years ago)
73d36618a6  Jie Zhu  [profiler] add MemProfiler (#356)  (3 years ago)
    * add memory trainer hook
    * fix bug
    * add memory trainer hook
    * fix import bug
    * fix import bug
    * add trainer hook
    * fix #370 git log bug
    * modify `to_tensorboard` function to support better output
    * remove useless output
    * change the name of `MemProfiler`
    * complete memory profiler
    * replace error with warning
    * finish trainer hook
    * modify interface of MemProfiler
    * modify `__init__.py` in profiler
    * remove unnecessary pass statement
    * add usage to docstring
    * add usage to trainer hook
    * new location to store temp data file
a30e2b4c24  HELSON  [zero] adapt for no-leaf module in zero (#535)  (3 years ago)
    * only process module's own parameters in Zero context
    * add zero hooks for all modules that contain parameters
    * gather parameters only belonging to module itself
705f56107c  Jiarui Fang  [zero] refactor model data tracing (#537)  (3 years ago)
4d322b79da  Jiarui Fang  [refactor] remove old zero code (#517)  (3 years ago)
920c5889a7  Jiarui Fang  [zero] add colo move inline (#521)  (3 years ago)
a445e118cf  Jiarui Fang  [polish] polish singleton and global context (#500)  (3 years ago)
b334822163  Jiarui Fang  [zero] polish sharded param name (#484)  (3 years ago)
    * [zero] polish sharded param name
    * polish code
    * polish
    * polish code
    * polish
    * polish
    * polish
65c0f380c2  Jiarui Fang  [format] polish name format for MOE (#481)  (3 years ago)
8d3250d74b  ver217  [zero] ZeRO supports pipeline parallel (#477)  (3 years ago)
aff9d354f7  HELSON  [MOE] polish moe_env (#467)  (3 years ago)
84fd7c1d4d  HELSON  add moe context, moe utilities and refactor gradient handler (#455)  (3 years ago)
a241f61b34  ver217  [zero] update initialize for ZeRO (#458)  (3 years ago)
    * polish code
    * shard strategy receives pg in shard() / gather()
    * update zero engine
    * polish code
9506a8beb2  ver217  use double buffer to handle grad  (3 years ago)
56bb412e72  Jiarui Fang  [polish] use GLOBAL_MODEL_DATA_TRACER (#417)  (3 years ago)
21dc54e019  Jiarui Fang  [zero] memtracer to record cuda memory usage of model data and overall system (#395)  (3 years ago)
88804aee49  ver217  add bucket tensor shard strategy  (3 years ago)
54ee8d1254  Xu Kai  fix format colossalai/engine/paramhooks/ (#350)  (3 years ago)
3b88eb2259  yuxuan-lou  Flake8 code restyle  (3 years ago)
44e4891f57  Jiarui Fang  [zero] able to place params on cpu after zero init context (#365)  (3 years ago)
    * place params on cpu after zero init context
    * polish code
10e2826426  Jiarui Fang  move async memory to an individual directory (#345)  (3 years ago)
6a3188167c  Frank Lee  set criterion as optional in colossalai initialize (#336)  (3 years ago)
3213554cc2  Jie Zhu  [profiler] add adaptive sampling to memory profiler (#330)  (3 years ago)
    * fix merge conflict; modify unit test; remove unnecessary log info; reformat file
    * remove unused module
    * remove unnecessary sync function
    * change docstring style from Google to Sphinx
1388671699  ver217  [zero] update sharded model v2 using sharded param v2 (#323)  (3 years ago)
11bddb6e55  Jiarui Fang  [zero] update zero context init with the updated test utils (#327)  (3 years ago)
36f9a74ab2  ver217  fix sharded param hook and unit test  (3 years ago)
001ca624dd  ver217  impl shard optim v2 and add unit test  (3 years ago)
d344689274  Jie Zhu  [profiler] primary memory tracer  (3 years ago)