2849 Commits (cabc1286ca4a2defffe8e74aaca18023620099f6)

Author SHA1 Message Date
flybird11111 c7d6975d29
[shardformer] fix GPT2DoubleHeadsModel (#4703) 1 year ago
Baizhou Zhang 068372a738
[doc] add potential solution for OOM in llama2 example (#4699) 1 year ago
digger yu 9c2feb2f0b
fix some typo with colossalai/device colossalai/tensor/ etc. (#4171) 1 year ago
Baizhou Zhang d8ceeac14e
[hotfix] fix typo in hybrid parallel io (#4697) 1 year ago
flybird11111 8844691f4b
[shardformer] update shardformer readme (#4689) 1 year ago
Baizhou Zhang 1d454733c4
[doc] Update booster user documents. (#4669) 1 year ago
Cuiqing Li bce0f16702
[Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) 1 year ago
flybird11111 eedaa3e1ef
[shardformer]fix gpt2 double head (#4663) 1 year ago
Hongxin Liu 554aa9592e
[legacy] move communication and nn to legacy and refactor logger (#4671) 1 year ago
Hongxin Liu 536397cc95
[devops] fix concurrency group (#4667) 1 year ago
flybird11111 7486ed7d3a
[shardformer] update llama2/opt finetune example and fix llama2 policy (#4645) 1 year ago
Hongxin Liu a686f9ddc8
[devops] fix concurrency group and compatibility test (#4665) 1 year ago
Baizhou Zhang 295b38fecf
[example] update vit example for hybrid parallel plugin (#4641) 1 year ago
Baizhou Zhang 660eed9124
[pipeline] set optimizer to optional in execute_pipeline (#4630) 1 year ago
eric8607242 c3d5fa3bac
[shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624) 1 year ago
Hongxin Liu 9709b8f502
[release] update version (#4623) 1 year ago
Hongxin Liu efba0f44b9
Merge pull request #4612 from hpcaitech/feature/shardformer 1 year ago
Hongxin Liu fae6c92ead
Merge branch 'main' into feature/shardformer 1 year ago
Hongxin Liu ac178ca5c1
[legacy] move builder and registry to legacy (#4603) 1 year ago
Hongxin Liu 8accecd55b
[legacy] move engine to legacy (#4560) 1 year ago
Hongxin Liu 89fe027787
[legacy] move trainer to legacy (#4545) 1 year ago
Hongxin Liu bd18678478
[test] fix gemini checkpoint and gpt test (#4620) 1 year ago
Hongxin Liu 807e01a4ba
[zero] hotfix master param sync (#4618) 1 year ago
Hongxin Liu e71d245293
[test] ignore gpt2 shardformer test (#4619) 1 year ago
flybird11111 ec0866804c
[shardformer] update shardformer readme (#4617) 1 year ago
Bin Jia 86d22581e4
[shardformer] Add overlap optional for HybridParallelPlugin (#4615) 1 year ago
Hongxin Liu a39a5c66fe
Merge branch 'main' into feature/shardformer 1 year ago
Baizhou Zhang e79b1e80e2
[checkpointio] support huggingface from_pretrained for all plugins (#4606) 1 year ago
flybird11111 0a94fcd351
[shardformer] update bert finetune example with HybridParallelPlugin (#4584) 1 year ago
Jianghai 24c0768795
[shardformer] Pytree fix (#4533) 1 year ago
yingliu-hpc aaeb520ce3
Merge pull request #4542 from hpcaitech/chatglm 1 year ago
binmakeswell 8d7b02290f
[doc] add llama2 benchmark (#4604) 1 year ago
binmakeswell 7a978eb3d0
[DOC] hotfix/llama2news (#4595) 1 year ago
Hongxin Liu 63ecafb1fb
[checkpointio] optimize zero optim checkpoint io (#4591) 1 year ago
Hongxin Liu 508ca36fe3
[pipeline] 1f1b schedule receive microbatch size (#4589) 1 year ago
Mashiro cfa607080f
[Fix] Fix compile error (#4357) 1 year ago
栾鹏 eb952ea88d
Update Dockerfile (#4499) 1 year ago
LuGY cbac782254
[zero]fix zero ckptIO with offload (#4529) 1 year ago
Baizhou Zhang 38ccb8b1a3
[shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) 1 year ago
Baizhou Zhang c9625dbb63
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) 1 year ago
Baizhou Zhang 2c787d7f47
[shardformer] fix submodule replacement bug when enabling pp (#4544) 1 year ago
Hongxin Liu c7b60f7547
[devops] cancel previous runs in the PR (#4546) 1 year ago
Tian Siyuan f1ae8c9104
[example] change accelerate version (#4431) 1 year ago
ChengDaqi2023 8e2e1992b8
[example] update streamlit 0.73.1 to 1.11.1 (#4386) 1 year ago
flybird11111 ec18fc7340
[shardformer] support pp+tp+zero1 tests (#4531) 1 year ago
Lufang Chen 12c95a9fed
fix runtime prepare pass (#4502) 1 year ago
Ying Liu 9f852f2489
keep requirements same with main branch 1 year ago
flybird11111 d367b88785
[shardformer] fix opt test hanging (#4521) 1 year ago
Ying Liu c648dc093f
fix colossalai version in coati examples 1 year ago
yingliu-hpc 661a1ef712
Merge pull request #4541 from ver217/coati/chatglm 1 year ago