ColossalAI

Author	SHA1	Message	Date
wukong1992	b37797ed3d	[booster] support torch fsdp plugin in booster (#3697 ) Co-authored-by: 纪少敏 <jishaomin@jishaomindeMBP.lan>	2023-05-15 12:14:38 +08:00
digger-yu	ad6460cf2c	[NFC] fix typo applications/ and colossalai/ (#3735 )	2023-05-15 11:46:25 +08:00
digger-yu	1f73609adb	[CI] fix typo with tests/ etc. (#3727 ) * fix spelling error with examples/comminity/ * fix spelling error with tests/ * fix some spelling error with tests/ colossalai/ etc. * fix spelling error with tests/ etc. date:2023.5.10	2023-05-11 16:30:58 +08:00
digger-yu	899aa86368	[CI] fix typo with tests components (#3695 ) * fix spelling error with examples/comminity/ * fix spelling error with tests/	2023-05-11 11:10:28 +08:00
digger-yu	b7141c36dd	[CI] fix some spelling errors (#3707 ) * fix spelling error with examples/comminity/ * fix spelling error with tests/ * fix some spelling error with tests/ colossalai/ etc.	2023-05-10 17:12:03 +08:00
MisterLin1995	f7361ee1bd	[chat] fix community example ray (#3719 ) Co-authored-by: jiangwen <zxl265370@antgroup.com>	2023-05-10 13:36:09 +08:00
jiangmingyan	20068ba188	[booster] add tests for ddp and low level zero's checkpointio (#3715 ) * [booster] update tests for booster * [booster] update tests for booster * [booster] update tests for booster * [booster] update tests for booster * [booster] update tests for booster * [booster] update booster tutorials#3717, fix recursive check	2023-05-10 12:17:02 +08:00
Hongxin Liu	6552cbf8e1	[booster] fix no_sync method (#3709 ) * [booster] fix no_sync method * [booster] add test for ddp no_sync * [booster] fix merge * [booster] update unit test * [booster] update unit test * [booster] update unit test	2023-05-09 11:10:02 +08:00
Hongxin Liu	3bf09efe74	[booster] update prepare dataloader method for plugin (#3706 ) * [booster] add prepare dataloader method for plug * [booster] update examples and docstr	2023-05-08 15:44:03 +08:00
Hongxin Liu	f83ea813f5	[example] add train resnet/vit with booster example (#3694 ) * [example] add train vit with booster example * [example] update readme * [example] add train resnet with booster example * [example] enable ci * [example] enable ci * [example] add requirements * [hotfix] fix analyzer init * [example] update requirements	2023-05-08 10:42:30 +08:00
YH	2629f9717d	[tensor] Refactor handle_trans_spec in DistSpecManager	2023-05-06 17:55:37 +08:00
zhang-yi-chi	2da5d81dec	[chat] fix train_prompts.py gemini strategy bug (#3666 ) * fix gemini strategy bug * add comment * add comment * better solution	2023-05-06 16:46:38 +08:00
Hongxin Liu	d556648885	[example] add finetune bert with booster example (#3693 )	2023-05-06 11:53:13 +08:00
digger-yu	65bdc3159f	fix some spelling error with applications/Chat/examples/ (#3692 ) * fix spelling error with examples/comminity/ * fix spelling error with example/	2023-05-06 11:27:23 +08:00
Hongxin Liu	d0915f54f4	[booster] refactor all dp fashion plugins (#3684 ) * [booster] add dp plugin base * [booster] inherit dp plugin base * [booster] refactor unit tests	2023-05-05 19:36:10 +08:00
digger-yu	b49020c1b1	[CI] Update test_sharded_optim_with_sync_bn.py (#3688 ) fix spelling error in line23 change "cudnn_determinstic"=True to "cudnn_deterministic=True"	2023-05-05 18:57:27 +08:00
Tong Li	b36e67cb2b	Merge pull request #3680 from digger-yu/digger-yu-patch-2 fix spelling error with applications/Chat/evaluate/	2023-05-05 16:26:04 +08:00
jiangmingyan	307894f74d	[booster] gemini plugin support shard checkpoint (#3610 ) * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin add shard checkpoint save/load * gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint * [API Refactoring]gemini plugin support shard checkpoint --------- Co-authored-by: luchen <luchen@luchendeMBP.lan> Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local>	2023-05-05 14:37:21 +08:00
Camille Zhong	0f785cb1f3	[chat] PPO stage3 doc enhancement (#3679 ) * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci Update test_ci.sh Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci Update test_ci.sh Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. update roberta with coati chat ci update Revert "chat ci update" This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846. * Update README.md Update README.md * update readme * Update test_ci.sh * update readme and add a script update readme and add a script modify readme Update README.md	2023-05-05 13:36:56 +08:00
digger-yu	6650daeb0a	[doc] fix chat spelling error (#3671 ) * Update README.md change "huggingaface" to "huggingface" * Update README.md change "Colossa-AI" to "Colossal-AI"	2023-05-05 11:37:35 +08:00
Hongxin Liu	7bd0bee8ea	[chat] add opt attn kernel (#3655 ) * [chat] add opt attn kernel * [chat] disable xformer during fwd	2023-05-04 16:03:33 +08:00
digger-yu	8ba7858753	Update generate_gpt35_answers.py fix spelling error with generate_gpt35_answers.py	2023-05-04 15:34:16 +08:00
digger-yu	bfbf650588	fix spelling error fix spelling error with evaluate.py	2023-05-04 15:31:09 +08:00
tanitna	1a60dc07a8	[chat] typo accimulation_steps -> accumulation_steps (#3662 )	2023-04-28 15:42:57 +08:00
Tong Li	816add7e7f	Merge pull request #3656 from TongLi3701/chat/update_eval [Chat]: Remove unnecessary step and update documentation	2023-04-28 14:07:44 +08:00
binmakeswell	268b3cd80d	[chat] set default zero2 strategy (#3667 ) * [chat] set default gemini strategy * [chat] set default zero2 strategy * [chat] set default zero2 strategy	2023-04-28 13:56:50 +08:00
Tong Li	c1a355940e	update readme	2023-04-28 11:56:35 +08:00
Tong Li	ed3eaa6922	update documentation	2023-04-28 11:49:21 +08:00
Tong Li	c419117329	update questions and readme	2023-04-27 19:04:26 +08:00
Tong Li	aa77ddae33	remove unnecessary step and update readme	2023-04-27 18:51:58 +08:00
YH	a22407cc02	[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. (#3173 ) * Fix confusing variable name in zero opt * Apply lint * Fix util func * Fix minor util func * Fix zero param optimizer name	2023-04-27 18:43:14 +08:00
Hongxin Liu	842768a174	[chat] refactor model save/load logic (#3654 ) * [chat] strategy refactor unwrap model * [chat] strategy refactor save model * [chat] add docstr * [chat] refactor trainer save model * [chat] fix strategy typing * [chat] refactor trainer save model * [chat] update readme * [chat] fix unit test	2023-04-27 18:41:49 +08:00
Hongxin Liu	6ef7011462	[chat] remove lm model class (#3653 ) * [chat] refactor lora * [chat] remove lm class * [chat] refactor save model * [chat] refactor train sft * [chat] fix ci * [chat] fix ci	2023-04-27 15:37:38 +08:00
Camille Zhong	8bccb72c8d	[Doc] enhancement on README.md for chat examples (#3646 ) * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci Update test_ci.sh Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci Update test_ci.sh Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. update roberta with coati chat ci update Revert "chat ci update" This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846. * Update README.md Update README.md * update readme * Update test_ci.sh	2023-04-27 14:26:19 +08:00
Hongxin Liu	2a951955ad	[chat] refactor trainer (#3648 ) * [chat] ppo trainer remove useless args * [chat] update examples * [chat] update benchmark * [chat] update examples * [chat] fix sft training with wandb * [chat] polish docstr	2023-04-26 18:11:49 +08:00
Hongxin Liu	f8288315d9	[chat] polish performance evaluator (#3647 )	2023-04-26 17:34:59 +08:00
Hongxin Liu	50793b35f4	[gemini] accelerate inference (#3641 ) * [gemini] support don't scatter after inference * [chat] update colossalai strategy * [chat] fix opt benchmark * [chat] update opt benchmark * [gemini] optimize inference * [test] add gemini inference test * [chat] fix unit test ci * [chat] fix ci * [chat] fix ci * [chat] skip checkpoint test	2023-04-26 16:32:40 +08:00
Hongxin Liu	4b3240cb59	[booster] add low level zero plugin (#3594 ) * [booster] add low level zero plugin * [booster] fix gemini plugin test * [booster] fix precision * [booster] add low level zero plugin test * [test] fix booster plugin test oom * [test] fix booster plugin test oom * [test] fix googlenet and inception output trans * [test] fix diffuser clip vision model * [test] fix torchaudio_wav2vec2_base * [test] fix low level zero plugin test	2023-04-26 14:37:25 +08:00
digger-yu	b9a8dff7e5	[doc] Fix typo under colossalai and doc(#3618 ) * Fixed several spelling errors under colossalai * Fix the spelling error in colossalai and docs directory * Cautious Changed the spelling error under the example folder * Update runtime_preparation_pass.py revert autograft to autograd * Update search_chunk.py utile to until * Update check_installation.py change misteach to mismatch in line 91 * Update 1D_tensor_parallel.md revert to perceptron * Update 2D_tensor_parallel.md revert to perceptron in line 73 * Update 2p5D_tensor_parallel.md revert to perceptron in line 71 * Update 3D_tensor_parallel.md revert to perceptron in line 80 * Update README.md revert to resnet in line 42 * Update reorder_graph.py revert to indice in line 7 * Update p2p.py revert to megatron in line 94 * Update initialize.py revert to torchrun in line 198 * Update routers.py change to detailed in line 63 * Update routers.py change to detailed in line 146 * Update README.md revert random number in line 402	2023-04-26 11:38:43 +08:00
Tong Li	e1b0a78afa	Merge pull request #3621 from zhang-yi-chi/fix/chat-train-prompts-single-gpu [chat] fix single gpu training bug in examples/train_prompts.py	2023-04-24 22:13:54 +08:00
ddobokki	df309fc6ab	[Chat] Remove duplicate functions (#3625 )	2023-04-24 12:23:15 +08:00
Hongxin Liu	179558a87a	[devops] fix chat ci (#3628 )	2023-04-24 10:55:14 +08:00
zhang-yi-chi	739cfe3360	[chat] fix enable single gpu training bug	2023-04-22 14:16:08 +08:00
digger-yu	d7bf284706	[chat] polish code note typo (#3612 )	2023-04-20 17:22:15 +08:00
Yuanchen	c4709d34cf	Chat evaluate (#3608 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2023-04-20 11:12:24 +08:00
digger-yu	633bac2f58	[doc] .github/workflows/README.md (#3605 ) Fixed several word spelling errors change "compatiblity" to "compatibility" etc.	2023-04-20 10:36:28 +08:00
digger-yu	becd3b0f54	[doc] fix setup.py typo (#3603 ) Optimization Code change "vairable" to "variable"	2023-04-19 17:28:15 +08:00
digger-yu	7570d9ae3d	[doc] fix op_builder/README.md (#3597 ) Optimization Code change "requries" to "requires"	2023-04-19 15:56:01 +08:00
Hongxin Liu	12eff9eb4c	[gemini] state dict supports fp16 (#3590 ) * [gemini] save state dict support fp16 * [gemini] save state dict shard support fp16 * [gemini] fix state dict * [gemini] fix state dict	2023-04-19 11:01:48 +08:00
github-actions[bot]	d544ed4345	[bot] Automated submodule synchronization (#3596 ) Co-authored-by: github-actions <github-actions@github.com>	2023-04-19 10:38:12 +08:00

2324 Commits (b37797ed3d3d6af294a095397b4bc135264b8c6a)