955 Commits (39f2582e987871c198f2f2526cd4435cbd569741)

Author SHA1 Message Date
Baizhou Zhang 36e546b2cc [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) 1 year ago
Jianghai d8408d185c [pipeline] OPT model pipeline (#4258) 1 year ago
Hongxin Liu d921ce8391 [shardformer] support inplace sharding (#4251) 1 year ago
Baizhou Zhang 2a2eacfaf1 [pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) 1 year ago
Jianghai d9be0472ef [bugs] hot fix some testing bugs for new models (#4268) 1 year ago
Jianghai 34f0e34a4c [pipeline] finish bloom models pipeline and tests (#4223) 1 year ago
Jianghai e7cc62d735 [pipeline] All bert models (#4233) 1 year ago
Baizhou Zhang a14d352088 [pipeline] add pipeline forward for variants of gpt2 (#4238) 1 year ago
Baizhou Zhang 208ac8f2ba [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) 1 year ago
Jianghai 37d22f6878 [pipeline] add bloom model pipeline (#4210) 1 year ago
Jianghai 31bcf867ae [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) 1 year ago
Jianghai 1622031058 [pipeline] Llama pipeline (#4205) 1 year ago
Jianghai 1094e0f0d3 [pipeline] Bert pipeline for shardformer and its tests (#4197) 1 year ago
Hongxin Liu 890774b2fb [shardformer] support lazy init (#4202) 1 year ago
Jianghai f3bcc292c8 [pipeline] move bert related pipeline components to shardformer (#4187) 1 year ago
Jianghai c5ea728016 [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) 1 year ago
ver217 5fc60a3a04 [test] add shard util tests 1 year ago
ver217 2d6cc07feb [test] update shardformer tests 1 year ago
Jianghai 90a65ea682 [pipeline] build bloom model and policy , revise the base class of policy (#4161) 1 year ago
Jianghai c552cefa93 [pipeline]add pipeline policy and bert forward (#4130) 1 year ago
Hongxin Liu 5c897ddb94 [pipeline] add stage manager (#4093) 1 year ago
Jianghai e8e7e49243 [pipeline]add pipeline policy and bert forward (#4130) 1 year ago
Hongxin Liu f51ce1bc8e [pipeline] refactor 1f1b schedule (#4115) 1 year ago
Hongxin Liu 45fdc9b42c [pipeline] implement p2p communication (#4100) 1 year ago
Hongxin Liu 422544222f [pipeline] add stage manager (#4093) 1 year ago
Hongxin Liu 5e1a9d48dd [cluster] add process group mesh (#4039) 1 year ago
LuGY d86ddd9b29 [hotfix] fix unsafe async comm in zero (#4404) 1 year ago
flybird1111 458ae331ad [kernel] updated unittests for coloattention (#4389) 1 year ago
flybird1111 38b792aab2 [coloattention] fix import error (#4380) 1 year ago
flybird1111 25c57b9fb4 [fix] coloattention support flash attention 2 (#4347) 1 year ago
Hongxin Liu 16bf4c0221 [test] remove useless tests (#4359) 1 year ago
LuGY 1a49a5ea00 [zero] support shard optimizer state dict of zero (#4194) 1 year ago
LuGY dd7cc58299 [zero] add state dict for low level zero (#4179) 1 year ago
LuGY c668801d36 [zero] allow passing process group to zero12 (#4153) 1 year ago
LuGY 79cf1b5f33 [zero]support no_sync method for zero1 plugin (#4138) 1 year ago
LuGY c6ab96983a [zero] refactor low level zero for shard evenly (#4030) 1 year ago
Baizhou Zhang c6f6005990 [checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) 1 year ago
Hongxin Liu fc5cef2c79 [lazy] support init on cuda (#4269) 1 year ago
Cuiqing Li 4b977541a8 [Kernels] added triton-implemented of self attention for colossal-ai (#4241) 1 year ago
Baizhou Zhang 58913441a1 [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) 1 year ago
github-actions[bot] c77b3b19be [format] applied code formatting on changed files in pull request 4152 (#4157) 1 year ago
Frank Lee 1fb0d95df0 [shardformer] made tensor parallelism configurable (#4144) 1 year ago
Frank Lee 74257cb446 [shardformer] refactored some doc and api (#4137) 1 year ago
Frank Lee ae035d305d [shardformer] added embedding gradient check (#4124) 1 year ago
Frank Lee 6a88bae4ec [shardformer] integrate with data parallelism (#4103) 1 year ago
Frank Lee f3b6aaa6b7 [shardformer] supported fused normalization (#4112) 1 year ago
Frank Lee b1c2901530 [shardformer] supported bloom model (#4098) 1 year ago
Kun Lin 8af29ee47a [shardformer] support vision transformer (#4096) 1 year ago
jiangmingyan ac80937138 [shardformer] shardformer support opt models (#4091) 1 year ago
Frank Lee d33a44e8c3 [shardformer] refactored layernorm (#4086) 1 year ago