1139 Commits (fix-setup)

Author SHA1 Message Date
Hongxin Liu 422544222f [pipeline] add stage manager (#4093) 1 year ago
Hongxin Liu 5e1a9d48dd [cluster] add process group mesh (#4039) 1 year ago
LuGY d86ddd9b29 [hotfix] fix unsafe async comm in zero (#4404) 1 year ago
flybird1111 458ae331ad [kernel] updated unittests for coloattention (#4389) 1 year ago
flybird1111 38b792aab2 [coloattention] fix import error (#4380) 1 year ago
flybird1111 25c57b9fb4 [fix] coloattention support flash attention 2 (#4347) 1 year ago
Hongxin Liu 16bf4c0221 [test] remove useless tests (#4359) 1 year ago
LuGY 1a49a5ea00 [zero] support shard optimizer state dict of zero (#4194) 1 year ago
LuGY dd7cc58299 [zero] add state dict for low level zero (#4179) 1 year ago
LuGY c668801d36 [zero] allow passing process group to zero12 (#4153) 1 year ago
LuGY 79cf1b5f33 [zero]support no_sync method for zero1 plugin (#4138) 1 year ago
LuGY c6ab96983a [zero] refactor low level zero for shard evenly (#4030) 1 year ago
Baizhou Zhang c6f6005990 [checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) 1 year ago
Hongxin Liu fc5cef2c79 [lazy] support init on cuda (#4269) 1 year ago
Cuiqing Li 4b977541a8 [Kernels] added triton-implemented of self attention for colossal-ai (#4241) 1 year ago
Baizhou Zhang 58913441a1 [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) 1 year ago
github-actions[bot] c77b3b19be [format] applied code formatting on changed files in pull request 4152 (#4157) 1 year ago
Frank Lee 1fb0d95df0 [shardformer] made tensor parallelism configurable (#4144) 1 year ago
Frank Lee 74257cb446 [shardformer] refactored some doc and api (#4137) 1 year ago
Frank Lee ae035d305d [shardformer] added embedding gradient check (#4124) 1 year ago
Frank Lee 6a88bae4ec [shardformer] integrate with data parallelism (#4103) 1 year ago
Frank Lee f3b6aaa6b7 [shardformer] supported fused normalization (#4112) 1 year ago
Frank Lee b1c2901530 [shardformer] supported bloom model (#4098) 1 year ago
Kun Lin 8af29ee47a [shardformer] support vision transformer (#4096) 1 year ago
jiangmingyan ac80937138 [shardformer] shardformer support opt models (#4091) 1 year ago
Frank Lee d33a44e8c3 [shardformer] refactored layernorm (#4086) 1 year ago
Frank Lee c4b1b65931 [test] fixed tests failed due to dtensor change (#4082) 1 year ago
FoolPlayer 92f6791095 [shardformer] Add layernorm (#4072) 1 year ago
Frank Lee 70c58cfd4f [shardformer] supported fused qkv checkpoint (#4073) 1 year ago
FoolPlayer 0803a61412 [shardformer] add linearconv1d test (#4067) 1 year ago
Frank Lee 8eb09a4c69 [shardformer] support module saving and loading (#4062) 1 year ago
FoolPlayer 7740c55c55 support kit use for bert/gpt test (#4055) 1 year ago
Frank Lee f22ddacef0 [shardformer] refactored the shardformer layer structure (#4053) 1 year ago
Frank Lee 58df720570 [shardformer] adapted T5 and LLaMa test to use kit (#4049) 1 year ago
FoolPlayer 4021b9a8a2 [shardformer] add gpt2 test and layer class refactor (#4041) 1 year ago
Frank Lee d857f3dbba [shardformer] supported T5 and its variants (#4045) 1 year ago
Frank Lee c1d5453e9f [shardformer] adapted llama to the new API (#4036) 1 year ago
FoolPlayer 74d176c8d8 [shardformer] fix bert and gpt downstream with new api (#4024) 1 year ago
FoolPlayer 507c0ad368 add vocabembedding layer 1 year ago
Frank Lee 3893fa1a8d [shardformer] refactored embedding and dropout to parallel module (#4013) 1 year ago
FoolPlayer dfca9678fa integrate with dist layer (#4011) 1 year ago
Frank Lee 015af592f8 [shardformer] integrated linear 1D with dtensor (#3996) 1 year ago
Frank Lee 611971248c [device] support init device mesh from process group (#3990) 1 year ago
FoolPlayer f7774ec0f3 [Shardformer] Downstream bert (#3979) 1 year ago
wukong1992 c1c672d0f0 [shardformer] shardformer support t5 model (#3994) 1 year ago
wukong1992 6b30dfb7ce [shardformer] support llama model using shardformer (#3969) 1 year ago
FoolPlayer a73130482d [shardformer] Unit test (#3928) 1 year ago
FoolPlayer f1cb5ac6bf [shardformer] Align bert value (#3907) 1 year ago
Baizhou Zhang 0bb0b481b4 [gemini] fix argument naming during chunk configuration searching 1 year ago
github-actions[bot] a52f62082d [format] applied code formatting on changed files in pull request 4021 (#4022) 1 year ago