ColossalAI

Commit Graph

Author	SHA1	Message	Date
Frank Lee	b03b3ae99c	fixed mem monitor device (#433) fixed mem monitor device	2022-03-16 15:25:02 +08:00
Frank Lee	14a7094243	fixed fp16 optimizer none grad bug (#432)	2022-03-16 14:35:46 +08:00
ver217	fce9432f08	sync before creating empty grad	2022-03-16 14:24:09 +08:00
ver217	ea6905a898	free param.grad	2022-03-16 14:24:09 +08:00
ver217	9506a8beb2	use double buffer to handle grad	2022-03-16 14:24:09 +08:00
Frank Lee	0f5f5dd556	fixed gpt attention mask in pipeline (#430)	2022-03-16 14:23:43 +08:00
Jiarui Fang	f9c762df85	[test] merge zero optim tests (#428)	2022-03-16 12:22:45 +08:00
Frank Lee	f0d6e2208b	[polish] add license meta to setup.py (#427)	2022-03-16 12:05:56 +08:00
Jiarui Fang	5d7dc3525b	[hotfix] run cpu adam unittest in pytest (#424)	2022-03-16 10:39:55 +08:00
Jiarui Fang	54229cd33e	[log] better logging display with rich (#426) * better logger using rich * remove deepspeed in zero requirements	2022-03-16 09:51:15 +08:00
HELSON	3f70a2b12f	removed noisy function during evaluation of MoE router (#419)	2022-03-15 12:06:09 +08:00
Jiarui Fang	adebb3e041	[zero] cuda margin space for OS (#418)	2022-03-15 12:02:19 +08:00
Jiarui Fang	56bb412e72	[polish] use GLOBAL_MODEL_DATA_TRACER (#417)	2022-03-15 11:29:46 +08:00
Jiarui Fang	23ba3fc450	[zero] refactory ShardedOptimV2 init method (#416)	2022-03-15 10:45:55 +08:00
Frank Lee	e79ea44247	[fp16] refactored fp16 optimizer (#392)	2022-03-15 10:05:38 +08:00
Frank Lee	f8a0e7fb01	Merge pull request #412 from hpcaitech/develop merge develop to main	2022-03-14 22:48:56 +08:00
Jiarui Fang	21dc54e019	[zero] memtracer to record cuda memory usage of model data and overall system (#395)	2022-03-14 22:05:30 +08:00
Jiarui Fang	a37bf1bc42	[hotfix] rm test_tensor_detector.py (#413)	2022-03-14 21:39:48 +08:00
Jiarui Fang	370f567e7d	[zero] new interface for ShardedOptimv2 (#406)	2022-03-14 20:48:41 +08:00
LuGY	a9c27be42e	Added tensor detector (#393) * Added tensor detector * Added the - states * Allowed change include_cpu when detect()	2022-03-14 18:01:46 +08:00
Frank Lee	32296cf462	Merge pull request #409 from 1SAA/develop [hotfix] fixed error when no collective communication in CommProfiler	2022-03-14 17:43:45 +08:00
1SAA	907ac4a2dc	fixed error when no collective communication in CommProfiler	2022-03-14 17:21:00 +08:00
Frank Lee	62b08acc72	update hf badge link (#410)	2022-03-14 17:07:01 +08:00
Frank Lee	2fe68b359a	Merge pull request #403 from ver217/feature/shard-strategy [zero] Add bucket tensor shard strategy	2022-03-14 16:29:28 +08:00
Frank Lee	cf92a779dc	added huggingface badge (#407)	2022-03-14 16:23:02 +08:00
HELSON	dfd0363f68	polished output format for communication profiler and pcie profiler (#404) fixed typing error	2022-03-14 16:07:45 +08:00
ver217	63469c0f91	polish code	2022-03-14 15:48:55 +08:00
ver217	54fd37f0e0	polish unit test	2022-03-14 15:06:02 +08:00
ver217	88804aee49	add bucket tensor shard strategy	2022-03-14 14:48:32 +08:00
Frank Lee	aaead33cfe	Merge pull request #397 from hpcaitech/create-pull-request/patch-sync-submodule [Bot] Synchronize Submodule References	2022-03-14 10:11:06 +08:00
github-actions	6098bc4cce	Automated submodule synchronization	2022-03-14 00:01:12 +00:00
Frank Lee	6937f85004	Merge pull request #402 from oikosohn/oikosohn-patch-1 fix typo in CHANGE_LOG.md	2022-03-13 22:40:04 +08:00
sohn	ff4f5d7231	fix typo in CHANGE_LOG.md - fix typo, `Unifed` -> `Unified` below Added	2022-03-13 23:34:34 +09:00
Frank Lee	fc5101f24c	Merge pull request #401 from hpcaitech/develop	2022-03-13 11:09:17 +08:00
Frank Lee	fc2fd0abe5	Merge pull request #400 from hpcaitech/hotfix/readme fixed broken badge link	2022-03-13 09:12:59 +08:00
Frank Lee	6d3a4f51bf	fixed broken badge link	2022-03-13 09:11:48 +08:00
HELSON	7c079d9c33	[hotfix] fixed bugs in ShardStrategy and PcieProfiler (#394)	2022-03-11 18:12:46 +08:00
Frank Lee	1e4bf85cdb	fixed bug in activation checkpointing test (#387)	2022-03-11 15:50:28 +08:00
Jiarui Fang	3af13a2c3e	[zero] polish ShardedOptimV2 unittest (#385) * place params on cpu after zero init context * polish code * bucketzed cpu gpu tensor transter * find a bug in sharded optim unittest * add offload unittest for ShardedOptimV2. * polish code and make it more robust	2022-03-11 15:50:28 +08:00
binmakeswell	ce7b2c9ae3	update README and images path (#384)	2022-03-11 15:50:28 +08:00
ScalableEKNN	2fcd4f38ee	fix format (#379)	2022-03-11 15:50:28 +08:00
Jiang Zhuo	5a4a3b77d9	fix format (#376)	2022-03-11 15:50:28 +08:00
lucasliunju	ce886a9062	fix format (#374)	2022-03-11 15:50:28 +08:00
Frank Lee	526a318032	[unit test] Refactored test cases with component func (#339) * refactored test with component func * fixed bug	2022-03-11 15:50:28 +08:00
LuGY	de46450461	Added activation offload (#331) * Added activation offload * Fixed the import bug, used the pytest	2022-03-11 15:50:28 +08:00
Jiarui Fang	272ebfb57d	[bug] shard param during initializing the ShardedModelV2 (#381)	2022-03-11 15:50:28 +08:00
HELSON	8c18eb0998	[profiler] Fixed bugs in CommProfiler and PcieProfiler (#377)	2022-03-11 15:50:28 +08:00
Jiarui Fang	b5f43acee3	[zero] find miss code (#378)	2022-03-11 15:50:28 +08:00
Jiarui Fang	6b6002962a	[zero] zero init context collect numel of model (#375)	2022-03-11 15:50:28 +08:00
HELSON	1ed7c24c02	Added PCIE profiler to dectect data transmission (#373)	2022-03-11 15:50:28 +08:00


620 Commits (f28c0213769dbb2037cd08123ecbf1c8a3f5114b)