ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	d921ce8391	[shardformer] support inplace sharding (#4251 ) * [shardformer] embedding support inplace sharding * [shardformer] linear support inplace sharding * [shardformer] layernorm support inplace sharding * [shardformer] qkv support inplace sharding * [test] update shardformer layer test * [shardformer] fix shared param sharding * [shardformer] fix bert policy * [shardformer] fix bloom policy * [shardformer] fix llama policy * [shardformer] fix opt policy * [shardformer] fix t5 policy * [shardformer] fix fused qkv linear * [shardformer] fix bugs * force sync * [test] fix bugs * [test] fix transformer version	2023-08-15 23:25:14 +08:00
Baizhou Zhang	208ac8f2ba	[pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224 ) * * fix typehint & docstring in sharder.py * * update pipeline forward for GPT2Model * * add test for pipeline forward of GPT2Model * * add cache cleaning in gpt2 test * * change assert to raise command	2023-08-15 23:25:14 +08:00
Jianghai	1094e0f0d3	[pipeline] Bert pipeline for shardformer and its tests (#4197 ) * add pipeline forward * complete pipeline forward check * fix bert forward without pipeline * fix comments * discard useless line * add todo * clean prints * fix distribute layers	2023-08-15 23:25:14 +08:00
Hongxin Liu	890774b2fb	[shardformer] support lazy init (#4202 ) * [shardformer] support lazy init * [shardformer] linear support lazy init * [shardformer] embedding support lazy init * [shardformer] norm support lazy init * [shardformer] fused linear support lazy init * [test] update shardformer test layer * [test] shardformer with lazy init fit ddp * [lazy] hotfix deepcopy of param * [shardformer] fix bert policy and update test * [shardformer] fix bloom policy and update test * [shardformer] fix opt policy and update test * [shardformer] fix t5 policy and update test * [shardformer] fix gpt2 policy and update test * [shardformer] fix llama policy and update test	2023-08-15 23:25:14 +08:00
ver217	d35bd7d0e6	[shardformer] fix type hint	2023-08-15 23:25:14 +08:00
ver217	1ed3f8a24f	[shardformer] rename policy file name	2023-08-15 23:25:14 +08:00
ver217	b0b8ad2823	[pipeline] update shardformer docstring	2023-08-15 23:25:14 +08:00
ver217	59f6f573f1	[pipeline] update shardformer policy	2023-08-15 23:25:14 +08:00
digger yu	2ac24040eb	fix some typo colossalai/shardformer (#4160 )	2023-07-04 17:53:39 +08:00
Frank Lee	1fb0d95df0	[shardformer] made tensor parallelism configurable (#4144 ) * [shardformer] made tensor parallelism configurable * polish code	2023-07-04 16:05:01 +08:00
Frank Lee	74257cb446	[shardformer] refactored some doc and api (#4137 ) * [shardformer] refactored some doc and api * polish code	2023-07-04 16:05:01 +08:00
jiangmingyan	7f9b30335b	[shardformer] write an shardformer example with bert finetuning (#4126 ) * [shardformer] add benchmark of shardformer * [shardformer] add benchmark of shardformer	2023-07-04 16:05:01 +08:00
Frank Lee	ae035d305d	[shardformer] added embedding gradient check (#4124 )	2023-07-04 16:05:01 +08:00
Frank Lee	44a190e6ac	[shardformer] import huggingface implicitly (#4101 )	2023-07-04 16:05:01 +08:00
Frank Lee	6a88bae4ec	[shardformer] integrate with data parallelism (#4103 )	2023-07-04 16:05:01 +08:00
Frank Lee	f3b6aaa6b7	[shardformer] supported fused normalization (#4112 )	2023-07-04 16:05:01 +08:00
Frank Lee	b1c2901530	[shardformer] supported bloom model (#4098 )	2023-07-04 16:05:01 +08:00
FoolPlayer	92f6791095	[shardformer] Add layernorm (#4072 ) * add layernorm to bert * add layernorm test * add layernorm test with load state dict * add use_mixedfusedLN in shard config * refactor policy to support fused_layernorm	2023-07-04 16:05:01 +08:00
Frank Lee	f22ddacef0	[shardformer] refactored the shardformer layer structure (#4053 )	2023-07-04 16:05:01 +08:00
Frank Lee	58df720570	[shardformer] adapted T5 and LLaMa test to use kit (#4049 ) * [shardformer] adapted T5 and LLaMa test to use kit * polish code	2023-07-04 16:05:01 +08:00
Frank Lee	d857f3dbba	[shardformer] supported T5 and its variants (#4045 )	2023-07-04 16:05:01 +08:00
Frank Lee	c1d5453e9f	[shardformer] adapted llama to the new API (#4036 )	2023-07-04 16:05:01 +08:00
FoolPlayer	74d176c8d8	[shardformer] fix bert and gpt downstream with new api (#4024 ) * fix bert downstream with new api * remove comment line	2023-07-04 16:05:01 +08:00
FoolPlayer	df018fc305	support bert with new api	2023-07-04 16:05:01 +08:00
FoolPlayer	dfca9678fa	integrate with dist layer (#4011 )	2023-07-04 16:05:01 +08:00
FoolPlayer	d3bc530849	[shardformer] Refactor shardformer api (#4001 ) * fix an error in readme * simplify code * refactor shardformer * add todo * remove slicer * resolve code review	2023-07-04 16:05:01 +08:00
FoolPlayer	f7774ec0f3	[Shardformer] Downstream bert (#3979 ) * add dist dropout in model * update docstring and bert policy with dropout * refactor basepolicy and sharded, update bert * update format * update gpt2 policy * update bert policy * remove unused code * update readme for new policy usage * add downstream model of bert * remove unused code	2023-07-04 16:05:01 +08:00
wukong1992	c1c672d0f0	[shardformer] shardformer support t5 model (#3994 ) test t5	2023-07-04 16:05:01 +08:00
FoolPlayer	45927d5527	[shardformer] Add dropout layer in shard model and refactor policy api (#3949 ) * add dist dropout in model * update docstring and bert policy with dropout * refactor basepolicy and sharded, update bert * update format * update gpt2 policy * update bert policy * remove unused code * update readme for new policy usage	2023-07-04 16:05:01 +08:00
FoolPlayer	a73130482d	[shardformer] Unit test (#3928 ) * fix bug in slicer, add slicer unit test * add dropout test * use pid as dropout seed * updata dropout test with local pattern * ad todo	2023-07-04 16:05:01 +08:00
FoolPlayer	f1cb5ac6bf	[shardformer] Align bert value (#3907 ) * add bert align test, fix dist loss bug * forward and backward align * add ignore index * add shardformer CI * add gather_output optional for user in shardconfig * update readme with optional gather_ouput * add dist crossentropy loss test, remove unused files * remove unused file * remove unused file * rename the file * polish code	2023-07-04 16:05:01 +08:00
FoolPlayer	79f8d5d54b	[shardformer] add gpt2 policy and modify shard and slicer to support (#3883 ) * add gpt2 policy and modify shard and slicer to support * remove unused code * polish code	2023-07-04 16:05:01 +08:00
FoolPlayer	ab8a47f830	[shardformer] add Dropout layer support different dropout pattern (#3856 ) * add dropout layer, add dropout test * modify seed manager as context manager * add a copy of col_nn.layer * add dist_crossentropy loss; separate module test * polish the code * fix dist crossentropy loss	2023-07-04 16:05:01 +08:00
Frank Lee	4972e1f40e	[shardformer] refactored the user api (#3828 ) * [shardformer] refactored the user api * polish code	2023-07-04 16:05:01 +08:00
FoolPlayer	8cc11235c0	[shardformer]: Feature/shardformer, add some docstring and readme (#3816 ) * init shardformer code structure * add implement of sharder (inject and replace) * add implement of replace layer to colossal layer * separate different layer policy, add some notion * implement 1d and 2d slicer, can tell col or row * fix bug when slicing and inject model * fix some bug; add inference test example * add share weight and train example * add train * add docstring and readme * add docstring for other files * pre-commit	2023-07-04 16:05:01 +08:00
FoolPlayer	8d68de767d	[shardformer] init shardformer code structure (#3731 ) * init shardformer code structure * add implement of sharder (inject and replace) * add implement of replace layer to colossal layer * separate different layer policy, add some notion * implement 1d and 2d slicer, can tell col or row * fix bug when slicing and inject model * fix some bug; add inference test example	2023-07-04 16:05:01 +08:00
Frank Lee	ddcf58cacf	Revert "[sync] sync feature/shardformer with develop"	2023-06-09 09:41:27 +08:00
FoolPlayer	ef1537759c	[shardformer] add gpt2 policy and modify shard and slicer to support (#3883 ) * add gpt2 policy and modify shard and slicer to support * remove unused code * polish code	2023-06-08 15:01:34 +08:00
FoolPlayer	21a3915c98	[shardformer] add Dropout layer support different dropout pattern (#3856 ) * add dropout layer, add dropout test * modify seed manager as context manager * add a copy of col_nn.layer * add dist_crossentropy loss; separate module test * polish the code * fix dist crossentropy loss	2023-06-08 15:01:34 +08:00
Frank Lee	537a52b7a2	[shardformer] refactored the user api (#3828 ) * [shardformer] refactored the user api * polish code	2023-06-08 15:01:34 +08:00
FoolPlayer	58f6432416	[shardformer]: Feature/shardformer, add some docstring and readme (#3816 ) * init shardformer code structure * add implement of sharder (inject and replace) * add implement of replace layer to colossal layer * separate different layer policy, add some notion * implement 1d and 2d slicer, can tell col or row * fix bug when slicing and inject model * fix some bug; add inference test example * add share weight and train example * add train * add docstring and readme * add docstring for other files * pre-commit	2023-06-08 15:01:34 +08:00
FoolPlayer	6a69b44dfc	[shardformer] init shardformer code structure (#3731 ) * init shardformer code structure * add implement of sharder (inject and replace) * add implement of replace layer to colossal layer * separate different layer policy, add some notion * implement 1d and 2d slicer, can tell col or row * fix bug when slicing and inject model * fix some bug; add inference test example	2023-06-08 15:01:34 +08:00

42 Commits (b3f5d7a3ba01fdd015866162608348fe480f1d55)