Commit Graph

2732 Commits (fae6c92eadafc7c700af5dd5bec9553d03c81550)
 

Author SHA1 Message Date
Hongxin Liu fae6c92ead
Merge branch 'main' into feature/shardformer
1 year ago
Hongxin Liu ac178ca5c1 [legacy] move builder and registry to legacy (#4603)
1 year ago
Hongxin Liu 8accecd55b [legacy] move engine to legacy (#4560)
1 year ago
Hongxin Liu 89fe027787 [legacy] move trainer to legacy (#4545)
1 year ago
Hongxin Liu bd18678478
[test] fix gemini checkpoint and gpt test (#4620)
1 year ago
Hongxin Liu 807e01a4ba
[zero] hotfix master param sync (#4618)
1 year ago
Hongxin Liu e71d245293
[test] ignore gpt2 shardformer test (#4619)
1 year ago
flybird11111 ec0866804c
[shardformer] update shardformer readme (#4617)
1 year ago
Bin Jia 86d22581e4
[shardformer] Add overlap optional for HybridParallelPlugin (#4615)
1 year ago
Hongxin Liu a39a5c66fe
Merge branch 'main' into feature/shardformer
1 year ago
Baizhou Zhang e79b1e80e2
[checkpointio] support huggingface from_pretrained for all plugins (#4606)
1 year ago
flybird11111 0a94fcd351
[shardformer] update bert finetune example with HybridParallelPlugin (#4584)
1 year ago
Jianghai 24c0768795
[shardformer] Pytree fix (#4533)
1 year ago
yingliu-hpc aaeb520ce3
Merge pull request #4542 from hpcaitech/chatglm
1 year ago
binmakeswell 8d7b02290f
[doc] add llama2 benchmark (#4604)
1 year ago
binmakeswell 7a978eb3d0
[DOC] hotfix/llama2news (#4595)
1 year ago
Hongxin Liu 63ecafb1fb
[checkpointio] optimize zero optim checkpoint io (#4591)
1 year ago
Hongxin Liu 508ca36fe3
[pipeline] 1f1b schedule receive microbatch size (#4589)
1 year ago
Mashiro cfa607080f
[Fix] Fix compile error (#4357)
1 year ago
栾鹏 eb952ea88d
Update Dockerfile (#4499)
1 year ago
LuGY cbac782254
[zero]fix zero ckptIO with offload (#4529)
1 year ago
Baizhou Zhang 38ccb8b1a3
[shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575)
1 year ago
Baizhou Zhang c9625dbb63
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
1 year ago
Baizhou Zhang 2c787d7f47
[shardformer] fix submodule replacement bug when enabling pp (#4544)
1 year ago
Hongxin Liu c7b60f7547
[devops] cancel previous runs in the PR (#4546)
1 year ago
Tian Siyuan f1ae8c9104
[example] change accelerate version (#4431)
1 year ago
ChengDaqi2023 8e2e1992b8
[example] update streamlit 0.73.1 to 1.11.1 (#4386)
1 year ago
flybird11111 ec18fc7340
[shardformer] support pp+tp+zero1 tests (#4531)
1 year ago
Lufang Chen 12c95a9fed
fix runtime prepare pass (#4502)
1 year ago
Ying Liu 9f852f2489 keep requirements same with main branch
1 year ago
flybird11111 d367b88785
[shardformer] fix opt test hanging (#4521)
1 year ago
Ying Liu c648dc093f fix colossalai version in coati examples
1 year ago
yingliu-hpc 661a1ef712
Merge pull request #4541 from ver217/coati/chatglm
1 year ago
ver217 1c43bfd54e [coati] update ci
1 year ago
Bin Jia e241b74f24
[shardformer] Add overlap support for gpt2 (#4535)
1 year ago
yingliu-hpc 1467e3b41b
[coati] add chatglm model (#4539)
1 year ago
Baizhou Zhang 0387a47e63
[shardformer] fix emerged bugs after updating transformers (#4526)
1 year ago
Hongxin Liu 0b00def881
[example] add llama2 example (#4527)
1 year ago
Bin Jia c554b7f559
[shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
1 year ago
Jianghai 376533a564
[shardformer] zero1+pp and the corresponding tests (#4517)
1 year ago
Baizhou Zhang 44eab2b27f
[shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506)
1 year ago
flybird11111 de8a65babc
[shardformer] opt fix. (#4514)
1 year ago
LuGY 839847b7d7
[zero]support zero2 with gradient accumulation (#4511)
1 year ago
github-actions[bot] c0efc3ebcb
[format] applied code formatting on changed files in pull request 4479 (#4504)
1 year ago
flybird11111 3353e55c80
[shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498)
1 year ago
Hongxin Liu 27061426f7
[gemini] improve compatibility and add static placement policy (#4479)
1 year ago
Jianghai e04436a82a
[shardformer] tests for 3d parallel (#4493)
1 year ago
flybird11111 59e252ecdb
[shardformer] chatglm support sequence parallel (#4482)
1 year ago
Bin Jia 351351a36e
[shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488)
1 year ago
Jianghai 5545114fd8
rename chatglm to chatglm2 (#4484)
1 year ago