Commit Graph

17 Commits (8c90d4df545957fca06d0ef8201ba9f0a40d06b7)

Author SHA1 Message Date
Jiarui Fang a590ed0ba3
[zero] improve the accuracy of get_memory_usage of sharded param (#538) 2022-03-28 16:19:19 +08:00
Jiarui Fang 37cb70feec
[zero] get memory usage for sharded param (#536) 2022-03-28 15:01:21 +08:00
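The two entries above add and then refine memory accounting for sharded parameters. A minimal sketch of what such a helper could look like, assuming the parameter tracks a payload tensor and an optional gradient (names and signature are illustrative, not ColossalAI's actual API):

```python
from typing import Optional, Tuple

import torch

def get_memory_usage(payload: torch.Tensor,
                     grad: Optional[torch.Tensor] = None) -> Tuple[int, int]:
    """Return (cuda_bytes, cpu_bytes) held by a sharded param's tensors.

    Hypothetical helper, for illustration only.
    """
    cuda_mem, cpu_mem = 0, 0
    for t in (payload, grad):
        if t is None:
            continue
        nbytes = t.numel() * t.element_size()  # actual bytes, not element count
        if t.is_cuda:
            cuda_mem += nbytes
        else:
            cpu_mem += nbytes
    return cuda_mem, cpu_mem
```

Counting `numel() * element_size()` per device, rather than assuming a fixed element width, is the kind of refinement the "accuracy" fix in #538 points at.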
Frank Lee 3601b2bad0
[test] fixed rerun_on_exception and adapted test cases (#487) 2022-03-25 17:25:12 +08:00
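`rerun_on_exception` is a decorator for flaky distributed tests. A sketch of the pattern it implements, assuming it reruns the wrapped test when a matching exception is raised (parameter names are illustrative, not necessarily the decorator's exact signature):

```python
import functools

def rerun_on_exception(exception_type=Exception, max_try: int = 3):
    """Rerun a flaky test up to max_try times on exception_type."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_try + 1):
                try:
                    return func(*args, **kwargs)
                except exception_type:
                    if attempt == max_try:
                        raise  # out of retries, surface the failure
        return wrapper
    return decorator
```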
Jiarui Fang b334822163
[zero] polish sharded param name (#484)
* [zero] polish sharded param name

* polish code

* polish

* polish code

* polish

* polish

* polish
2022-03-22 14:36:16 +08:00
ver217 a241f61b34
[zero] Update initialize for ZeRO (#458)
* polish code

* shard strategy receives the process group (pg) in shard() / gather() (see the sketch after this entry)

* update zero engine

* polish code
2022-03-18 16:18:31 +08:00
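A sketch of the `shard()` / `gather()` interface change the bullet above describes, where both methods accept an explicit process group instead of assuming the default one. This is a sketch under those assumptions, not ColossalAI's exact implementation; it also assumes each tensor's numel divides evenly by the world size:

```python
from typing import List, Optional

import torch
import torch.distributed as dist

class TensorShardStrategy:
    """Illustrative shard strategy whose methods take a process group."""

    def shard(self, tensors: List[torch.Tensor],
              process_group: Optional[dist.ProcessGroup] = None) -> List[torch.Tensor]:
        world_size = dist.get_world_size(process_group)
        rank = dist.get_rank(process_group)
        # Each rank keeps only its own chunk of the flattened tensor.
        return [t.flatten().chunk(world_size)[rank].clone() for t in tensors]

    def gather(self, shards: List[torch.Tensor],
               process_group: Optional[dist.ProcessGroup] = None) -> List[torch.Tensor]:
        world_size = dist.get_world_size(process_group)
        gathered = []
        for shard in shards:
            # all_gather collects every rank's shard; concatenation restores
            # the full flattened tensor.
            bufs = [torch.empty_like(shard) for _ in range(world_size)]
            dist.all_gather(bufs, shard, group=process_group)
            gathered.append(torch.cat(bufs))
        return gathered
```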
Frank Lee f27d801a13
[test] optimized zero data parallel test (#452) 2022-03-18 11:35:54 +08:00
ver217 54fd37f0e0
polish unit test 2022-03-14 15:06:02 +08:00
Frank Lee 526a318032
[unit test] Refactored test cases with component func (#339)
* refactored test with component func

* fixed bug
2022-03-11 15:50:28 +08:00
ver217 1388671699
[zero] Update sharded model v2 using sharded param v2 (#323) 2022-03-11 15:50:28 +08:00
jiaruifang 799d105bb4
using pytest parametrize 2022-03-11 15:50:28 +08:00
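The entry above moves a test suite to pytest's `parametrize` mechanism, which expands one test body into a case per argument combination. A minimal example (test name and values are hypothetical):

```python
import pytest

# Each value expands into its own test case; stacking decorators yields
# the cross product (here 3 x 2 = 6 cases).
@pytest.mark.parametrize("world_size", [1, 2, 4])
@pytest.mark.parametrize("enable_offload", [True, False])
def test_shard_param(world_size, enable_offload):
    assert world_size in (1, 2, 4)  # placeholder body for illustration
```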
Jiarui Fang 11bddb6e55
[zero] update zero context init with the updated test utils (#327) 2022-03-11 15:50:28 +08:00
Jiarui Fang 90d3aef62c
[zero] yet another improved sharded param (#311) 2022-03-11 15:50:28 +08:00
Jiarui Fang c9e7d9582d
[zero] polish shard strategy (#310)
* init shard param from shape tuple

* add more unit tests for shard param

* add set_payload method for ShardedParam

* [zero] add sharded tensor class

* polish code

* add shard strategy

* move shard and gather logic from shard tensor to shard strategy (see the ShardedTensor sketch after this entry)

* polish code
2022-03-11 15:50:28 +08:00
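The refactor in #310 separates data from policy: the sharded tensor only owns its payload, while a strategy object decides how to split and gather it. A minimal sketch of such a wrapper, including the `set_payload` method the commit mentions (attribute names are illustrative):

```python
import torch

class ShardedTensor:
    """Minimal sketch: owns a payload, delegates shard/gather to a strategy."""

    def __init__(self, tensor: torch.Tensor):
        self._payload = tensor
        self.is_sharded = False  # flipped by the strategy after shard()

    @property
    def payload(self) -> torch.Tensor:
        return self._payload

    def set_payload(self, tensor: torch.Tensor) -> None:
        # Copy in place so existing references to the payload stay valid.
        self._payload.copy_(tensor)
```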
Jiarui Fang 74f77e314b
[zero] a shard strategy at tensor granularity (#307) 2022-03-11 15:50:28 +08:00
Jiarui Fang 80364c7686
[zero] sharded tensor (#305)
* init shard param from shape tuple

* add more unit tests for shard param

* add set_payload method for ShardedParam

* [zero] add sharded tensor class

* polish code
2022-03-11 15:50:28 +08:00
Jiarui Fang e17e92c54d
Polish sharded parameter (#297)
* init shard param from shape tuple

* add more unit tests for shard param

* add more unit tests to sharded param
2022-03-11 15:50:28 +08:00
Jiarui Fang 5a560a060a
Feature/zero (#279)
* add zero1 (#209)

* add zero1

* add test zero1

* update zero stage 1 develop (#212)

* Implement naive zero3 (#240)

* naive zero3 works well

* add zero3 param manager

* add TODOs in comments

* add gather full param ctx

* fix sub module streams

* add offload

* fix bugs of hook and add unit tests

* fix bugs of hook and add unit tests (#252)

* add gather full param ctx

* fix sub module streams

* add offload

* fix bugs of hook and add unit tests

* polish code and add state dict hook

* fix bug

* update unit test

* refactor reconstructed zero code

* clip_grad support zero3 and add unit test

* add unit test for Zero3ParameterManager

* [WIP] initialize the shard param class

* [WIP] Yet another sharded model implementation (#274)

* [WIP] initialize the shard param class

* [WIP] Yet another implementation of the sharded model, using a better hook method (see the hook sketch after this entry)

* torch.concat -> torch.cat

* fix the test_zero_level_1.py::test_zero_level_1 unit test

* remove deepspeed implementation and refactor for the reconstructed zero module

* polish zero dp unit tests

Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
2022-03-11 15:50:28 +08:00
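The "better hook method" in #274 is the core ZeRO-3 trick: gather a submodule's full parameters just before its forward runs and re-shard them immediately after, so only one submodule's full weights are live at a time. A sketch of the pattern, assuming a strategy whose `gather`/`shard` mutate parameter storage in place (helper names are hypothetical):

```python
import torch.nn as nn

def register_zero3_forward_hooks(model: nn.Module, strategy, process_group=None):
    """Gather params right before each submodule's forward, re-shard after.

    Illustrative sketch; assumes strategy.gather/strategy.shard update the
    parameters' storage in place.
    """
    def pre_forward(module, inputs):
        strategy.gather(list(module.parameters(recurse=False)), process_group)

    def post_forward(module, inputs, output):
        strategy.shard(list(module.parameters(recurse=False)), process_group)

    for submodule in model.modules():
        submodule.register_forward_pre_hook(pre_forward)
        submodule.register_forward_hook(post_forward)
```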