YuliangLiu0306 | 2eca4cd376 | [DTensor] refactor dtensor with new components (#3089) | 2 years ago
  * [DTensor] refactor dtensor with new components
  * polish
YuliangLiu0306 | 8e4e8601b7 | [DTensor] implement layout converter (#3055) | 2 years ago
  * [DTensor] refactor LayoutConverter for DTensor
  * polish code
  * polish docstring
YuliangLiu0306 | 29386a54e6 | [DTensor] refactor CommSpec (#3034) | 2 years ago
YuliangLiu0306 | 4269196c79 | [hotfix] skip auto checkpointing tests (#3029) | 2 years ago
  * [hotfix] skip auto checkpointing tests
  * fix test name issue
YuliangLiu0306 | cd2b0eaa8d | [DTensor] refactor sharding spec (#2987) | 2 years ago
  * [autoparallel] refactor sharding spec
  * rename function name
YuliangLiu0306 | e414e4092b | [DTensor] implementation of dtensor (#2946) | 2 years ago
  * [DTensor] implementation of dtensor
  * test layout convert
  * polish
HELSON | 707b11d4a0 | [gemini] update ddp strict mode (#2518) | 2 years ago
  * [zero] add strict ddp mode for chunk init
  * [gemini] update gpt example
HELSON | 2d1a7dfe5f | [zero] add strict ddp mode (#2508) | 2 years ago
  * [zero] add strict ddp mode
  * [polish] add comments for strict ddp mode
  * [zero] fix test error
HELSON | d565a24849 | [zero] add unit tests for hybrid parallelism (#2486) | 2 years ago
HELSON | ea13a201bb | [polish] polish code for get_static_torch_model (#2405) | 2 years ago
  * [gemini] polish code
  * [testing] remove code
  * [gemini] make more robust
HELSON | a3100bd50d | [testing] add beit model for unit tests (#2196) | 2 years ago
  * [testing] add beit model
  * [beit] fix bugs
  * [beit] fix bugs
  * [testing] fix bugs
Jiarui Fang | 1f99205827 | [Gemini] remove static tracer (#2083) | 2 years ago
Jiarui Fang | 2e9cbfca12 | [Gemini] add unit tests to check gemini correctness (#2015) | 2 years ago
Genghan Zhang | d655eea515 | [autoparallel] mix gather (#1977) | 2 years ago
  * Add mix-gather
  * Add comments
  * Add comments
  * Polish comments
  * Change the global rank assumption
  * Add tests
  * Add two-step tests
  * Fix 10 and 01
  * Skip test because of the number of GPUs
Jiarui Fang | f7e276fa71 | [Gemini] add GeminiAdamOptimizer (#1960) | 2 years ago
Jiarui Fang | 52c6ad26e0 | [ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) | 2 years ago
Jiarui Fang | 9f4fb3f28a | [ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) | 2 years ago
Jiarui Fang | 3ce4463fe6 | [utils] remove lazy_memory_allocate from ColoInitContext (#1844) | 2 years ago
YuliangLiu0306 | 980ed21723 | [autoparallel] shard param and buffer as expected (#1753) | 2 years ago
  * [autoparallel] shard param and buffer as expected
  * fix unit test issue
Frank Lee | eee84908d4 | [autoparallel] handled illegal sharding strategy (#1728) | 2 years ago
  * [autoparallel] handled illegal sharding strategy
  * polish code
HELSON | f69f9bf223 | [zero] add chunk init function for users (#1729) | 2 years ago
  * add chunk manager init function
  * fix unit tests
  * add comment
  * add flush=True
HELSON | b28991dd0a | [feature] A new ZeRO implementation (#1644) | 2 years ago
YuliangLiu0306 | 3f068d1409 | [autoparallel] update CommSpec (#1667) | 2 years ago
Frank Lee | 154d3ef432 | [fix] fixed the collective pattern name for consistency (#1649) | 2 years ago
  * [fix] fixed the collective pattern name for consistency
  * polish code
Jiarui Fang | c5d39215f6 | Revert "[feature] new zero implementation (#1623)" (#1643) | 2 years ago
  This reverts commit 5be118f405.
HELSON | 5be118f405 | [feature] new zero implementation (#1623) | 2 years ago
YuliangLiu0306 | 702dbc5288 | [tensor] use communication autograd func (#1617) | 2 years ago
  * [tensor] use communication autograd func
  * change all-to-all comm spec info
  * rename pattern and distinguish fwd/bwd
  * polish code
YuliangLiu0306 | 4b03c25f85 | [tensor] add 1D device mesh (#1492) | 2 years ago
YuliangLiu0306 | b73fb7a077 | [tensor] support runtime ShardingSpec apply (#1453) | 2 years ago
  * [tensor] support runtime ShardingSpec apply
  * polish code
  * polish code
YuliangLiu0306 | 0f3042363c | [tensor] shape consistency generate transform path and communication cost (#1435) | 2 years ago
  * [tensor] shape consistency output transform path and communication cost
  * polish code
Frank Lee | ae1b58cd16 | [tensor] added linear implementation for the new sharding spec (#1416) | 2 years ago
  * [tensor] added linear implementation for the new sharding spec
  * polish code
Jiarui Fang | 89c434a0a6 | [polish] add test_ops directory (#1431) | 2 years ago
Jiarui Fang | 10b3df65c8 | [FAW] move coloparam setting in test code. (#1429) | 2 years ago
Jiarui Fang | cb98cf5558 | [FAW] parallel FreqAwareEmbedding (#1424) | 2 years ago
YuliangLiu0306 | 33f0744d51 | [tensor] add shape consistency feature to support auto spec transform (#1418) | 2 years ago
  * [tensor] add shape consistency feature to support auto sharding spec transform.
  * [tensor] remove unused argument in simulator, add doc string for target pair.
Jiarui Fang | d209aff684 | Add FreqAwareEmbeddingBag (#1421) | 2 years ago
Jiarui Fang | 504419d261 | [FAW] add cache manager for the cached embedding (#1419) | 2 years ago
YuliangLiu0306 | 7c96055c68 | [tensor] build sharding spec to replace distspec in future. (#1405) | 2 years ago
HELSON | 87775a0682 | [colotensor] use cpu memory to store state_dict (#1367) | 2 years ago
HELSON | 4417804129 | [unit test] add megatron init test in zero_optim (#1358) | 2 years ago
HELSON | 7a065dc9f6 | [hotfix] fix megatron_init in test_gpt2.py (#1357) | 2 years ago
HELSON | 7a8702c06d | [colotensor] add Tensor.view op and its unit test (#1343) | 2 years ago
  [colotensor] add megatron initialization for gpt2
HELSON | bf5066fba7 | [refactor] refactor ColoTensor's unit tests (#1340) | 2 years ago
ver217 | 0c51ff2c13 | [hotfix] ZeroDDP use new process group (#1333) | 2 years ago
  * process group supports getting ranks in group
  * chunk mgr receives a process group
  * update unit test
  * fix unit tests
HELSON | d49708ae43 | [hotfix] fix ddp for unit test test_gpt2 (#1326) | 2 years ago
HELSON | 1b41686461 | [hotfix] fix unit test test_module_spec (#1321) | 2 years ago
Jiarui Fang | 85f933b58b | [Optimizer] Remove useless ColoOptimizer (#1312) | 2 years ago
Jiarui Fang | 9f10524313 | [Optimizer] polish the init method of ColoOptimizer (#1310) | 2 years ago
HELSON | 36086927e1 | [hotfix] fix ColoTensor GPT2 unit test (#1309) | 2 years ago
HELSON | 260a55804a | [hotfix] fix shape error in backward when using ColoTensor (#1298) | 2 years ago