flybird11111
0a94fcd351
[shardformer] update bert finetune example with HybridParallelPlugin ( #4584 )
* [shardformer] fix opt test hanging
* fix
* test
* fix test
* remove print
* add fix
* [shardformer] add bert finetune example
* [shardformer] fix epoch change
* [shardformer] broadcast add pp group
* [shardformer] zero1+pp and the corresponding tests (#4517 )
* pause
* finish pp+zero1
* Update test_shard_vit.py
* [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516 )
* fix overlap bug and support bert, add overlap as an option in shardconfig
* support overlap for chatglm and bloom
* [shardformer] fix emerged bugs after updating transformers (#4526 )
* test
* fix test
* remove print
* add fix
* [shardformer] add bert finetune example
* [shardformer] Add overlap support for gpt2 (#4535 )
* add overlap support for gpt2
* remove unused code
* [shardformer] support pp+tp+zero1 tests (#4531 )
* [shardformer] fix opt test hanging
* fix
* test
* fix test
* remove print
* add fix
* [shardformer] pp+tp+zero1
* [shardformer] fix submodule replacement bug when enabling pp (#4544 )
* [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540 )
* implement sharded optimizer saving
* add more param info
* finish implementation of sharded optimizer saving
* fix bugs in optimizer sharded saving
* add pp+zero test
* param group loading
* greedy loading of optimizer
* fix bug when loading
* implement optimizer sharded saving
* add optimizer test & arrange checkpointIO utils
* fix gemini sharding state_dict
* add verbose option
* add loading of master params
* fix typehint
* fix master/working mapping in fp16 amp
* [shardformer] add bert finetune example
* [shardformer] fix epoch change
* [shardformer] broadcast add pp group
* rebase feature/shardformer
* update pipeline
* [shardformer] fix
* [shardformer] bert finetune fix
* [shardformer] add all_reduce operation to loss
* [shardformer] make compatible with pytree
* [shardformer] disable tp
* [shardformer] add 3d plugin to ci test
* [shardformer] update num_microbatches to None
* [shardformer] update microbatchsize
* [shardformer] update assert
* update scheduler
---------
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
1 year ago
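The pipeline-related bullets above adjust `num_microbatches` and the microbatch size for the HybridParallelPlugin bert finetune example. As background, here is a hedged pure-Python sketch of how a pipeline schedule might derive microbatches from a global batch when exactly one of the two knobs is set; the helper name is illustrative and is not Colossal-AI's actual API:

```python
def split_microbatches(batch, num_microbatches=None, microbatch_size=None):
    """Split a global batch (a list of samples) into microbatches.

    Exactly one of num_microbatches / microbatch_size should be given,
    mirroring the plugin options the commits above adjust. Sketch only.
    """
    if (num_microbatches is None) == (microbatch_size is None):
        raise ValueError("specify exactly one of num_microbatches or microbatch_size")
    n = len(batch)
    if num_microbatches is not None:
        if n % num_microbatches != 0:
            raise ValueError("batch size must divide evenly into microbatches")
        microbatch_size = n // num_microbatches
    # contiguous, equally sized slices that a pipeline schedule can feed stage 0
    return [batch[i:i + microbatch_size] for i in range(0, n, microbatch_size)]
```

With a global batch of 8 samples, `num_microbatches=4` and `microbatch_size=2` describe the same split; asserting on exactly one of them avoids the ambiguous-configuration case the commits guard against.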
binmakeswell
ef4b99ebcd
add llama example CI
1 year ago
binmakeswell
7ff11b5537
[example] add llama pretraining ( #4257 )
1 year ago
digger yu
2d40759a53
fix #3852 path error ( #4058 )
1 year ago
Baizhou Zhang
4da324cd60
[hotfix]fix argument naming in docs and examples ( #4083 )
1 year ago
LuGY
160c64c645
[example] fix bucket size in example of gpt gemini ( #4028 )
1 year ago
Baizhou Zhang
b3ab7fbabf
[example] update ViT example using booster api ( #3940 )
1 year ago
digger yu
33eef714db
fix typo examples and docs ( #3932 )
1 year ago
Baizhou Zhang
e417dd004e
[example] update opt example using booster api ( #3918 )
1 year ago
Liu Ziming
b306cecf28
[example] Modify palm example with the new booster API ( #3913 )
* Modify torch version requirement to adapt to torch 2.0
* modify palm example using new booster API
* roll back
* fix port
* polish
1 year ago
wukong1992
a55fb00c18
[booster] update bert example, using booster api ( #3885 )
1 year ago
jiangmingyan
5f79008c4a
[example] update gemini examples ( #3868 )
* [example] update gemini examples
2 years ago
digger yu
518b31c059
[docs] change placememt_policy to placement_policy ( #3829 )
* fix typo colossalai/autochunk auto_parallel amp
* fix typo colossalai/auto_parallel nn utils etc.
* fix typo colossalai/auto_parallel autochunk fx/passes etc.
* fix typo docs/
* change placememt_policy to placement_policy in docs/ and examples/
2 years ago
binmakeswell
15024e40d9
[auto] fix install cmd ( #3772 )
2 years ago
digger-yu
b9a8dff7e5
[doc] Fix typo under colossalai and doc ( #3618 )
* Fixed several spelling errors under colossalai
* Fix the spelling error in colossalai and docs directory
* Cautiously changed spelling errors under the example folder
* Update runtime_preparation_pass.py
revert autograft to autograd
* Update search_chunk.py
utile to until
* Update check_installation.py
change misteach to mismatch in line 91
* Update 1D_tensor_parallel.md
revert to perceptron
* Update 2D_tensor_parallel.md
revert to perceptron in line 73
* Update 2p5D_tensor_parallel.md
revert to perceptron in line 71
* Update 3D_tensor_parallel.md
revert to perceptron in line 80
* Update README.md
revert to resnet in line 42
* Update reorder_graph.py
revert to indice in line 7
* Update p2p.py
revert to megatron in line 94
* Update initialize.py
revert to torchrun in line 198
* Update routers.py
change to detailed in line 63
* Update routers.py
change to detailed in line 146
* Update README.md
revert random number in line 402
2 years ago
binmakeswell
f1b3d60cae
[example] reorganize for community examples ( #3557 )
2 years ago
mandoxzhang
8f2c55f9c9
[example] remove redundant texts & update roberta ( #3493 )
* update roberta example
* modify conflict & update roberta
2 years ago
mandoxzhang
ab5fd127e3
[example] update roberta with newer ColossalAI ( #3472 )
* update roberta example
2 years ago
Frank Lee
80eba05b0a
[test] refactor tests with spawn ( #3452 )
* [test] added spawn decorator
* polish code
2 years ago
ver217
573af84184
[example] update examples related to zero/gemini ( #3431 )
* [zero] update legacy import
* [zero] update examples
* [example] fix opt tutorial
* [example] fix import
2 years ago
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure ( #3424 )
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
2 years ago
Yan Fang
189347963a
[auto] fix requirements typo for issue #3125 ( #3209 )
2 years ago
Zihao
18dbe76cae
[auto-parallel] add auto-offload feature ( #3154 )
* add auto-offload feature
* polish code
* fix syn offload runtime pass bug
* add offload example
* fix offload testing bug
* fix example testing bug
2 years ago
binmakeswell
360674283d
[example] fix redundant note ( #3065 )
2 years ago
Tomek
af3888481d
[example] fixed opt model downloading from huggingface
2 years ago
ramos
2ef855c798
support shardinit option to avoid OOM when initializing OPT ( #3037 )
Co-authored-by: poe <poe@nemoramo>
2 years ago
Ziyue Jiang
400f63012e
[pipeline] Add Simplified Alpa DP Partition ( #2507 )
* add alpa dp split
* use fwd+bwd instead of fwd only
---------
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
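The "Simplified Alpa DP Partition" commit above adds a dynamic-programming split of pipeline stages. A hedged sketch of the classic DP it is in the spirit of: partition a sequence of per-layer costs into contiguous stages so that the most expensive stage is as cheap as possible. This is illustrative and not the repository's actual algorithm; names are hypothetical.

```python
def partition_stages(costs, num_stages):
    """Split layer costs into contiguous pipeline stages, minimizing the
    maximum per-stage cost. Returns (best bottleneck, stage bounds)."""
    n = len(costs)
    INF = float("inf")
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    # dp[k][i]: best bottleneck using k stages for the first i layers
    dp = [[INF] * (n + 1) for _ in range(num_stages + 1)]
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    dp[0][0] = 0.0
    for k in range(1, num_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                # last stage covers layers j..i-1; bottleneck is the worse of
                # the earlier stages and this stage's total cost
                bottleneck = max(dp[k - 1][j], prefix[i] - prefix[j])
                if bottleneck < dp[k][i]:
                    dp[k][i] = bottleneck
                    cut[k][i] = j
    # walk the cut table backwards to recover (start, end) per stage
    bounds, i = [], n
    for k in range(num_stages, 0, -1):
        j = cut[k][i]
        bounds.append((j, i))
        i = j
    return dp[num_stages][n], list(reversed(bounds))
```

For costs `[1, 2, 3, 4]` and two stages, the best split puts layers 0..2 on stage 0 and layer 3 on stage 1, giving a bottleneck of 6; real systems replace the scalar costs with profiled compute and communication times.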
github-actions[bot]
da056285f2
[format] applied code formatting on changed files in pull request 2922 ( #2923 )
Co-authored-by: github-actions <github-actions@github.com>
2 years ago
binmakeswell
12bafe057f
[doc] update installation for GPT ( #2922 )
2 years ago
Alex_996
a4fc125c34
Fix typos ( #2863 )
Fix typos, `6.7 -> 6.7b`
2 years ago
dawei-wang
55424a16a5
[doc] fix GPT tutorial ( #2860 )
Fix hpcaitech/ColossalAI#2851
2 years ago
Jiarui Fang
bf0204604f
[example] add bert and albert ( #2824 )
2 years ago
cloudhuang
43dffdaba5
[doc] fixed a typo in GPT readme ( #2736 )
2 years ago
Jiatong (Julius) Han
a255a38f7f
[example] Polish README.md ( #2658 )
* [tutorial] polish readme.md
* [example] Update README.md
2 years ago
HELSON
6e0faa70e0
[gemini] add profiler in the demo ( #2534 )
2 years ago
HELSON
66dfcf5281
[gemini] update the gpt example ( #2527 )
2 years ago
HELSON
707b11d4a0
[gemini] update ddp strict mode ( #2518 )
* [zero] add strict ddp mode for chunk init
* [gemini] update gpt example
2 years ago
HELSON
2d1a7dfe5f
[zero] add strict ddp mode ( #2508 )
* [zero] add strict ddp mode
* [polish] add comments for strict ddp mode
* [zero] fix test error
2 years ago
Jiarui Fang
e327e95144
[hotfix] gpt example titans bug #2493 ( #2494 )
2 years ago
binmakeswell
fcc6d61d92
[example] fix requirements ( #2488 )
2 years ago
Jiarui Fang
3a21485ead
[example] titans for gpt ( #2484 )
2 years ago
Jiarui Fang
7c31706227
[CI] add test_ci.sh for palm, opt and gpt ( #2475 )
2 years ago
ver217
f525d1f528
[example] update gpt gemini example ci test ( #2477 )
2 years ago
Ziyue Jiang
fef5c949c3
polish pp middleware ( #2476 )
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2 years ago
Jiarui Fang
867c8c2d3a
[zero] low level optim supports ProcessGroup ( #2464 )
2 years ago
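The commit above lets the low-level ZeRO optimizer work with an arbitrary ProcessGroup, i.e. optimizer states are partitioned across the ranks of that group. A hedged sketch of the usual greedy balancing idea (assign each parameter to the currently lightest rank); this is illustrative only and not the repository's partitioning code:

```python
import heapq

def assign_params_to_ranks(param_sizes, world_size):
    """Greedily assign parameters (by index) to the rank with the smallest
    accumulated size, the common way ZeRO-style partitioning balances
    optimizer-state memory across a process group. Sketch only."""
    # min-heap of (current load, rank)
    heap = [(0, rank) for rank in range(world_size)]
    heapq.heapify(heap)
    assignment = {}
    # placing larger parameters first tends to give a tighter balance
    for idx, size in sorted(enumerate(param_sizes), key=lambda t: -t[1]):
        load, rank = heapq.heappop(heap)
        assignment[idx] = rank
        heapq.heappush(heap, (load + size, rank))
    return assignment
```

For sizes `[4, 3, 2, 1]` on two ranks this yields loads of 5 and 5; the same largest-first greedy heuristic underlies many bucket- and shard-assignment schemes.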
YuliangLiu0306
2731531bc2
[autoparallel] integrate device mesh initialization into autoparallelize ( #2393 )
* [autoparallel] integrate device mesh initialization into autoparallelize
* add megatron solution
* update gpt autoparallel examples with latest api
* adapt beta value to fit the current computation cost
2 years ago
ZijianYY
fe0f7970a2
[examples] adding tflops to PaLM ( #2365 )
2 years ago
HELSON
d84e747975
[hotfix] add DISTPAN argument for benchmark ( #2412 )
* change the benchmark config file
* change config
* revert config file
* rename distpan to distplan
2 years ago
HELSON
498b5ca993
[hotfix] fix gpt gemini example ( #2404 )
* [hotfix] fix gpt gemini example
* [example] add new assertions
2 years ago
Jiarui Fang
12c8bf38d7
[Pipeline] Refine GPT PP Example
2 years ago