ColossalAI

Commit Graph

Author	SHA1	Message	Date
LuGY	105c5301c3	[zero]added hybrid adam, removed loss scale in adam (#527 ) * [zero]added hybrid adam, removed loss scale of adam * remove useless code	3 years ago
Jiarui Fang	8d8c5407c0	[zero] refactor model data tracing (#522 )	3 years ago
Frank Lee	3601b2bad0	[test] fixed rerun_on_exception and adapted test cases (#487 )	3 years ago
Jiarui Fang	4d322b79da	[refactor] remove old zero code (#517 )	3 years ago
LuGY	6a3f9fda83	[cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497 )	3 years ago
Jiarui Fang	920c5889a7	[zero] add colo move inline (#521 )	3 years ago
ver217	7be397ca9c	[log] polish disable_existing_loggers (#519 )	3 years ago
Jiarui Fang	0bebda6ea5	[zero] fix init device bug in zero init context unittest (#516 )	3 years ago
fastalgo	a513164379	Update README.md (#514 )	3 years ago
Jiarui Fang	7ef3507ace	[zero] show model data cuda memory usage after zero context init. (#515 )	3 years ago
ver217	a2e61d61d4	[zero] zero init ctx enable rm_torch_payload_on_the_fly (#512 ) * enable rm_torch_payload_on_the_fly * polish docstr	3 years ago
Jiarui Fang	81145208d1	[install] run with out rich (#513 )	3 years ago
HELSON	0f2d219162	[MOE] add MOEGPT model (#510 )	3 years ago
Jiarui Fang	bca0c49a9d	[zero] use colo model data api in optimv2 (#511 )	3 years ago
Jiarui Fang	9330be0f3c	[memory] set cuda mem frac (#506 )	3 years ago
Frank Lee	97933b6710	[devops] recover tsinghua pip source due to proxy issue (#509 )	3 years ago
Jiarui Fang	0035b7be07	[memory] add model data tensor moving api (#503 )	3 years ago
Frank Lee	65ad47c35c	[devops] remove tsinghua source for pip (#507 )	3 years ago
Frank Lee	44f7bcb277	[devops] remove tsinghua source for pip (#505 )	3 years ago
binmakeswell	af56c1d024	fix discussion button in issue template (#504 )	3 years ago
Jiarui Fang	a445e118cf	[polish] polish singleton and global context (#500 )	3 years ago
ver217	9ec1ce6ab1	[zero] sharded model support the reuse of fp16 shard (#495 ) * sharded model supports reuse fp16 shard * rename variable * polish code * polish code * polish code	3 years ago
HELSON	f24b5ed201	[MOE] remove old MoE legacy (#493 )	3 years ago
ver217	c4c02424f3	[zero] sharded model manages ophooks individually (#492 )	3 years ago
HELSON	c9023d4078	[MOE] support PR-MOE (#488 )	3 years ago
ver217	a9ecb4b244	[zero] polish sharded optimizer v2 (#490 )	3 years ago
ver217	62b0a8d644	[zero] sharded optim support hybrid cpu adam (#486 ) * sharded optim support hybrid cpu adam * update unit test * polish docstring	3 years ago
Jiarui Fang	b334822163	[zero] polish sharded param name (#484 ) * [zero] polish sharded param name * polish code * polish * polish code * polish * polsih * polish	3 years ago
ver217	9caa8b6481	docs get correct release version (#489 )	3 years ago
HELSON	d7ea63992b	[MOE] add FP32LinearGate for MOE in NaiveAMP context (#480 )	3 years ago
github-actions[bot]	353566c198	Automated submodule synchronization (#483 ) Co-authored-by: github-actions <github-actions@github.com>	3 years ago
Jiarui Fang	65c0f380c2	[format] polish name format for MOE (#481 )	3 years ago
ver217	8d3250d74b	[zero] ZeRO supports pipeline parallel (#477 )	3 years ago
Sze-qq	7f5e4592eb	Update Experiment result about Colossal-AI with ZeRO (#479 ) * [readme] add experimental visualisation regarding ColossalAI with ZeRO (#476) * Hotfix/readme (#478) * add experimental visualisation regarding ColossalAI with ZeRO * adjust newly-added figure size	3 years ago
Frank Lee	83a847d058	[test] added rerun on exception for testing (#475 ) * [test] added rerun on exception function * polish code	3 years ago
ver217	d70f43dd7a	embedding remove attn mask (#474 )	3 years ago
HELSON	7544347145	[MOE] add unitest for MOE experts layout, gradient handler and kernel (#469 )	3 years ago
ver217	1559c0df41	fix attn mask shape of gpt (#472 )	3 years ago
ver217	3cb3fc275e	zero init ctx receives a dp process group (#471 )	3 years ago
ver217	7e30068a22	[doc] update rst (#470 ) * update rst * remove empty rst	3 years ago
HELSON	aff9d354f7	[MOE] polish moe_env (#467 )	3 years ago
HELSON	bccbc15861	[MOE] changed parallelmode to dist process group (#460 )	3 years ago
Frank Lee	8f9617c313	[release] update version (#465 )	3 years ago
Frank Lee	2963565ff8	[test] fixed release workflow step (#464 )	3 years ago
Frank Lee	292590e0fa	[test] fixed release workflow condition (#463 )	3 years ago
Frank Lee	90bd97b9c0	[devops] fixed workflow bug (#462 )	3 years ago
ver217	304263c2ce	fix gpt attention mask (#461 )	3 years ago
ver217	fc8e6db005	[doc] Update docstring for ZeRO (#459 ) * polish sharded model docstr * polish sharded optim docstr * polish zero docstr * polish shard strategy docstr	3 years ago
HELSON	84fd7c1d4d	add moe context, moe utilities and refactor gradient handler (#455 )	3 years ago
Frank Lee	af185b5519	[test] fixed amp convergence comparison test (#454 )	3 years ago

3092 Commits (633e95b301336c4c237537f584882b3d8e5f4145)