ColossalAI

Commit Graph

Author	SHA1	Message	Date
digger yu	8abc87798f	fix Tensor is not defined (#4129 )	2023-07-03 17:10:18 +08:00
digger yu	7e46bc87b6	fix CheckpointIndexFile is not defined (#4109 )	2023-07-03 17:09:06 +08:00
digger yu	09fe9dc704	[nfc]fix ColossalaiOptimizer is not defined (#4122 )	2023-06-30 17:23:22 +08:00
Wenhao Chen	edd75a59ea	[chat] remove naive strategy and split colossalai strategy (#4094 ) * feat: remove on_learn_epoch fn as not used * revert: add _on_learn_epoch fn * to: remove the use of NaiveStrategy * test: remove NaiveStrategy tests * feat: remove NaiveStrategy * style: modify comments and params * feat: split ColossalAIStrategy into LowLevelZeroStrategy and GeminiStrategy * fix: remove naive * fix: align with modified colossal strategy * fix: fix ddp _try_init_dist arg	2023-06-29 18:11:00 +08:00
Wenhao Chen	b03d64d010	[chat] refactor trainer class (#4080 ) * to: add SLTrainer * refactor: refactor RMTrainer and SFTTrainer * fix: fix init file * feat: remove on_learn_epoch fn as not used * fix: align with modified gemini arguments * to: add OnPolicyTrainer * revert: add _on_learn_epoch fn * refactor: refactor PPOTrainer * style: rename PPOTrainer argument * fix: align with modified PPO arguments * test: align with modified train_prompts arguments * chore: modify train_prompts * docs: align with modified arguments * fix: remove unnecessary output * fix: move dataloader to fit fn of SLTrainer * fix: move dataloader to fit fn of OnPolicyTrainer * fix: modify usage of prompt and pretrain dataloader	2023-06-29 10:48:09 +08:00
Jianghai	711e2b4c00	[doc] update and revise some typos and errs in docs (#4107 ) * fix some typos and problems in doc * fix some typos and problems in doc * add doc test	2023-06-28 19:30:37 +08:00
digger yu	769cddcb2c	fix typo docs/ (#4033 )	2023-06-28 15:30:30 +08:00
digger yu	2d40759a53	fix #3852 path error (#4058 )	2023-06-28 15:29:44 +08:00
Frank Lee	1ee947f617	[workflow] added status check for test coverage workflow (#4106 )	2023-06-28 14:33:43 +08:00
Jianghai	31dc302017	[examples] copy resnet example to image (#4090 ) * copy resnet example * add pytest package * skip test_ci * skip test_ci * skip test_ci	2023-06-27 16:40:46 +08:00
Frank Lee	95e95b6d58	[testing] move pytest to be inside the function (#4087 )	2023-06-27 11:02:25 +08:00
Baizhou Zhang	4da324cd60	[hotfix]fix argument naming in docs and examples (#4083 )	2023-06-26 23:50:04 +08:00
Michelle	e89b127d8e	[chat]: fix chat evaluation possible bug (#4064 ) * fix chat eval * fix utils * fix utils * add comment --------- Co-authored-by: Qianran Ma <qianranm@luchentech.com>	2023-06-26 15:26:07 +08:00
Baizhou Zhang	2c8ae37f61	Merge pull request #4056 from Fridge003/hotfix/fix_gemini_chunk_config_searching [gemini] Rename arguments in chunk configuration searching	2023-06-25 17:37:37 +08:00
Wenhao Chen	153b957a1b	[chat] refactor strategy class with booster api (#3987 ) * refactor: adapt boost API in base and naive strategies * fix: initialize plugin after setup_distributed * fix: fix save_pretrained fn * refactor: adapt boost API in DDPStrategy * to: add _post_init check * to: fix ddp backward, modify ddp dataloader and unwrap * feat: adapt boost API in ColossalAIStrategy * fix: call setup_distributed before use get_current_device * fix: fix save_model and save_optimizer * test: remove save_sharded_optimizer test * style: apply formatter * fix: fix stage check and add comments * feat: allow dict type arg in strategy.prepare * to: temporarily remove lr_scheduler for testing * style: simplify init of ColossalAIStrategy * fix: fix lr_scheduler in sft and rm * style: modify comments * test: add train_prompts tests * fix: fix inference only case and use in train_prompts * test: skip failed tests in ci * style: fix CodeFactor check * fix: do not use model.to('cpu') with GeminiPlugin * test: enable colossalai_gemini tests * test: set CUDA_VISIBLE_DEVICES in ci * docs: add note	2023-06-25 17:36:21 +08:00
Baizhou Zhang	0bb0b481b4	[gemini] fix argument naming during chunk configuration searching	2023-06-25 13:34:15 +08:00
Frank Lee	b463651f3e	[workflow] cover all public repositories in weekly report (#4069 )	2023-06-22 14:41:25 +08:00
Hongxin Liu	4a81faa5f3	[devops] fix build on pr ci (#4043 ) * [devops] fix build on pr ci * [devops] fix build on pr ci	2023-06-19 17:12:56 +08:00
github-actions[bot]	a52f62082d	[format] applied code formatting on changed files in pull request 4021 (#4022 ) Co-authored-by: github-actions <github-actions@github.com>	2023-06-19 11:23:24 +08:00
LuGY	160c64c645	[example] fix bucket size in example of gpt gemini (#4028 )	2023-06-19 11:22:42 +08:00
digger yu	727c4598a9	[nfc] fix dim not defined and fix typo (#3991 )	2023-06-19 11:21:55 +08:00
Frank Lee	ca768eb62d	Merge pull request #4025 from hpcaitech/develop [sync] sync develop to main	2023-06-19 10:31:34 +08:00
Frank Lee	a5883aa790	[test] fixed codefactor format report (#4026 )	2023-06-16 18:23:02 +08:00
Baizhou Zhang	822c3d4d66	[checkpointio] sharded optimizer checkpoint for DDP plugin (#4002 )	2023-06-16 14:14:05 +08:00
Wenhao Chen	725af3eeeb	[booster] make optimizer argument optional for boost (#3993 ) * feat: make optimizer optional in Booster.boost * test: skip unet test if diffusers version > 0.10.2	2023-06-15 17:38:42 +08:00
Baizhou Zhang	c9cff7e7fa	[checkpointio] General Checkpointing of Sharded Optimizers (#3984 )	2023-06-15 15:21:26 +08:00
digger yu	d4fb7bfda7	fix typo applications/Chat/coati/ (#3947 )	2023-06-15 10:43:11 +08:00
Baizhou Zhang	e8ad3c88f5	[doc] add a note about unit-testing to CONTRIBUTING.md (#3970 )	2023-06-14 16:32:39 +08:00
Yuanchen	2925f47399	[evaluate] support gpt evaluation with reference (#3972 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2023-06-13 15:12:29 +08:00
Frank Lee	8bcad73677	[workflow] fixed the directory check in build (#3980 )	2023-06-13 14:42:35 +08:00
Wenhao Chen	9d02590c9a	[chat] refactor actor class (#3968 ) * refactor: separate log_probs fn from Actor forward fn * refactor: separate generate fn from Actor class * feat: update unwrap_model and get_base_model * unwrap_model returns model not wrapped by Strategy * get_base_model returns HF model for Actor, Critic and RewardModel * feat: simplify Strategy.prepare * style: remove get_base_model method of Actor * perf: tokenize text in batches * refactor: move calc_action_log_probs to utils of model * test: update test with new forward fn * style: rename forward fn args * fix: do not unwrap model in save_model fn of naive strategy * test: add gemini test for train_prompts * fix: fix _set_default_generate_kwargs	2023-06-13 13:31:56 +08:00
Frank Lee	2bf6547ad7	Merge pull request #3967 from ver217/update-develop [sync] update develop branch with main	2023-06-12 16:39:43 +08:00
Frank Lee	6718a2f285	[workflow] cancel duplicated workflow jobs (#3960 )	2023-06-12 15:11:27 +08:00
Frank Lee	71fe52769c	[gemini] fixed the gemini checkpoint io (#3934 )	2023-06-12 15:11:27 +08:00
Baizhou Zhang	b3ab7fbabf	[example] update ViT example using booster api (#3940 )	2023-06-12 15:02:27 +08:00
Frank Lee	4110d1f0d4	[workflow] cancel duplicated workflow jobs (#3960 )	2023-06-12 09:50:57 +08:00
digger yu	1aadeedeea	fix typo .github/workflows/scripts/ (#3946 )	2023-06-09 10:30:50 +08:00
digger yu	e61ffc77c6	fix typo tests/ (#3936 )	2023-06-09 09:49:41 +08:00
Frank Lee	bd1ab98158	[gemini] fixed the gemini checkpoint io (#3934 )	2023-06-09 09:48:49 +08:00
FoolPlayer	bd2c7c3297	Merge pull request #3942 from hpcaitech/revert-3931-sync/develop-to-shardformer Revert "[sync] sync feature/shardformer with develop"	2023-06-09 09:42:28 +08:00
Frank Lee	ddcf58cacf	Revert "[sync] sync feature/shardformer with develop"	2023-06-09 09:41:27 +08:00
FoolPlayer	24651fdd4f	Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer [sync] sync feature/shardformer with develop	2023-06-09 09:34:00 +08:00
Liu Ziming	e277534a18	Merge pull request #3905 from MaruyamaAya/dreambooth [example] Adding an example of training dreambooth with the new booster API	2023-06-09 08:44:18 +08:00
Yuanchen	21c4c0b1a0	support UniEval and add CHRF metric (#3924 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2023-06-08 17:38:47 +08:00
digger yu	33eef714db	fix typo examples and docs (#3932 )	2023-06-08 16:09:32 +08:00
FoolPlayer	ef1537759c	[shardformer] add gpt2 policy and modify shard and slicer to support (#3883 ) * add gpt2 policy and modify shard and slicer to support * remove unused code * polish code	2023-06-08 15:01:34 +08:00
FoolPlayer	6370a935f6	update README (#3909 )	2023-06-08 15:01:34 +08:00
FoolPlayer	21a3915c98	[shardformer] add Dropout layer support different dropout pattern (#3856 ) * add dropout layer, add dropout test * modify seed manager as context manager * add a copy of col_nn.layer * add dist_crossentropy loss; separate module test * polish the code * fix dist crossentropy loss	2023-06-08 15:01:34 +08:00
FoolPlayer	997544c1f9	[shardformer] update readme with modules implement doc (#3834 ) * update readme with modules content * remove img	2023-06-08 15:01:34 +08:00
Frank Lee	537a52b7a2	[shardformer] refactored the user api (#3828 ) * [shardformer] refactored the user api * polish code	2023-06-08 15:01:34 +08:00

2582 Commits (da4f7b855f0074b374bbd26837c036f2cdfa9564)