ColossalAI

Author	SHA1	Message	Date
Hongxin Liu	4341f5e8e6	[lazyinit] fix clone and deepcopy (#3553 )	2023-04-17 11:25:13 +08:00
digger-yu	1c7734bc94	[doc] Update 1D_tensor_parallel.md (#3563 ) Display format optimization, fix bug#3562 Specific changes 1. "This is called a column-parallel fashion" Translate to Chinese 2. use the ```math code block syntax to display a math expression as a block, No modification of formula content Please check that the math formula is displayed correctly If OK, I will change the format of the English version of the formula in parallel	2023-04-14 22:12:32 +08:00
binmakeswell	f1b3d60cae	[example] reorganize for community examples (#3557 )	2023-04-14 16:27:48 +08:00
MisterLin1995	1a809eddaa	[chat] ChatGPT train prompts on ray example (#3309 ) * [feat][chatgpt]train prompts on ray example * [fix]simplify code * [fix]remove depreciated parameter * [fix]add dependencies * [fix]method calling * [fix]experience maker * [fix]missing loss function * [fix]init optimizer * [feat]add usage comment * [fix]rename files * [fix]add readme * [fix]file path * [fix]move directory --------- Co-authored-by: jiangwen <zxl265370@antgroup.com>	2023-04-13 18:18:36 +08:00
binmakeswell	535b896435	[chat] polish tutorial doc (#3551 ) * [chat] clean up duplicate tutorial * [chat] clean up duplicate tutorial * [chat] clean up duplicate tutorial * [chat] clean up duplicate tutorial	2023-04-13 18:11:48 +08:00
digger-yu	77efdfe1dd	[doc] Update README.md (#3549 ) Format Optimization ,Add [] outside of DeepSpeed	2023-04-13 17:11:55 +08:00
digger-yu	3f760da9f0	Update README.md (#3548 ) Delete more ")"	2023-04-13 16:49:57 +08:00
digger-yu	a3ac48ef3d	[doc] Update README-zh-Hans.md (#3541 ) Fixing document link errors using absolute paths	2023-04-12 23:09:30 +08:00
natalie_cao	de84c0311a	Polish Code	2023-04-12 18:19:46 +08:00
Hongxin Liu	152239bbfa	[gemini] gemini supports lazy init (#3379 ) * [gemini] fix nvme optimizer init * [gemini] gemini supports lazy init * [gemini] add init example * [gemini] add fool model * [zero] update gemini ddp * [zero] update init example * add chunk method * add chunk method * [lazyinit] fix lazy tensor tolist * [gemini] fix buffer materialization * [misc] remove useless file * [booster] update gemini plugin * [test] update gemini plugin test * [test] fix gemini plugin test * [gemini] fix import * [gemini] fix import * [lazyinit] use new metatensor * [lazyinit] use new metatensor * [lazyinit] fix __set__ method	2023-04-12 16:03:25 +08:00
jiangmingyan	366a035552	[checkpoint] Shard saved checkpoint need to be compatible with the naming format of hf checkpoint files (#3479 ) * [checkpoint] support huggingface style sharded checkpoint, to be compatible with hf file naming format * [checkpoint] support huggingface style sharded checkpoint, to be compatible with hf file naming format * [checkpoint] Shard saved checkpoint add 'variant' field to customize filename * [checkpoint] Shard saved checkpoint add 'variant' field to customize filename * [checkpoint] Shard saved checkpoint add 'variant' field to customize filename * [checkpoint] Shard saved checkpoint add 'variant' field to customize filename --------- Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local> Co-authored-by: luchen <luchen@luchendeMBP.lan>	2023-04-12 16:02:17 +08:00
Yuanchen	7182ac2a04	[chat]add examples of training with limited resources in chat readme (#3536 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2023-04-12 15:47:09 +08:00
zhang-yi-chi	e6a132a449	[chat]: add vf_coef argument for PPOTrainer (#3318 )	2023-04-11 09:54:59 +08:00
ver217	89fd10a1c9	[chat] add zero2 cpu strategy for sft training (#3520 )	2023-04-10 19:00:13 +08:00
binmakeswell	990d4c3e4e	[doc] hide diffusion in application path (#3519 ) - [ ] Stable Diffusion - [ ] Dreambooth It's easy for users to think that we don't support them yet. Add them after migrating them from example to application https://github.com/hpcaitech/ColossalAI/tree/main/examples/images	2023-04-10 17:52:24 +08:00
binmakeswell	0c0455700f	[doc] add requirement and highlight application (#3516 ) * [doc] add requirement and highlight application * [doc] link example and application	2023-04-10 17:37:16 +08:00
NatalieC323	635d0a1baf	[Chat Community] Update README.md (fixed#3487) (#3506 ) * Update README.md * Update README.md * Update README.md * Update README.md --------- Co-authored-by: Fazzie-Maqianli <55798671+Fazziekey@users.noreply.github.com>	2023-04-10 14:36:39 +08:00
YH	bcf0cbcbe7	[doc] Add docs for clip args in zero optim (#3504 )	2023-04-10 11:11:28 +08:00
gongenlei	a7ca297281	[coati] Fix LlamaCritic (#3475 ) * mv LlamaForCausalLM to LlamaModel * rm unused imports --------- Co-authored-by: gongenlei <gongenlei@baidu.com>	2023-04-07 11:39:09 +08:00
mandoxzhang	8f2c55f9c9	[example] remove redundant texts & update roberta (#3493 ) * update roberta example * update roberta example * modify conflict & update roberta	2023-04-07 11:33:32 +08:00
mandoxzhang	ab5fd127e3	[example] update roberta with newer ColossalAI (#3472 ) * update roberta example * update roberta example	2023-04-07 10:34:51 +08:00
NatalieC323	fb8fae6f29	Revert "[dreambooth] fixing the incompatibity in requirements.txt (#3190 ) (#3378 )" (#3481 )	2023-04-06 20:22:52 +08:00
binmakeswell	891b8e7fac	[chat] fix stage3 PPO sample sh command (#3477 )	2023-04-06 18:08:16 +08:00
NatalieC323	c701b77b11	[dreambooth] fixing the incompatibity in requirements.txt (#3190 ) (#3378 ) * Update requirements.txt * Update environment.yaml * Update README.md * Update environment.yaml * Update README.md * Update README.md * Delete requirements_colossalai.txt * Update requirements.txt * Update README.md	2023-04-06 17:50:52 +08:00
Frank Lee	4e9989344d	[doc] updated contributor list (#3474 )	2023-04-06 17:47:59 +08:00
jiangmingyan	52a933e175	[checkpoint] support huggingface style sharded checkpoint (#3461 ) * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint --------- Co-authored-by: luchen <luchen@luchendeMBP.lan>	2023-04-06 16:23:39 +08:00
Fazzie-Maqianli	6afeb1202a	add community example dictionary (#3465 )	2023-04-06 15:04:48 +08:00
Frank Lee	80eba05b0a	[test] refactor tests with spawn (#3452 ) * [test] added spawn decorator * polish code * polish code * polish code * polish code * polish code * polish code	2023-04-06 14:51:35 +08:00
YY Lin	62f4e2eb07	[Chat]Add Peft support & fix the ptx bug (#3433 ) * Update ppo.py Fix the bug of fetching wrong batch data * Add peft model support in SFT and Prompts training In stage-1 and stage-3, the peft model supports are added. So the trained artifacts will be only a small lora additions instead of the whole bunch of files. * Delete test_prompts.txt * Delete test_pretrained.txt * Move the peft stuffs to a community folder. * Move the demo sft to community * delete dirty files * Add instructions to install peft using source * Remove Chinese comments * remove the Chinese comments	2023-04-06 11:54:52 +08:00
Dr-Corgi	73afb63594	[chat]fix save_model(#3377 ) The function save_model should be a part of PPOTrainer.	2023-04-06 11:19:14 +08:00
kingkingofall	57a3c4db6d	[chat]fix readme (#3429 ) * fix stage 2 fix stage 2 * add torch	2023-04-06 10:58:53 +08:00
Frank Lee	7d8d825681	[booster] fixed the torch ddp plugin with the new checkpoint api (#3442 )	2023-04-06 09:43:51 +08:00
YH	8f740deb53	Fix typo (#3448 )	2023-04-06 09:43:31 +08:00
ver217	933048ad3e	[test] reorganize zero/gemini tests (#3445 )	2023-04-06 09:38:25 +08:00
Camille Zhong	72cb4dd433	[Chat] fix the tokenizer "int too big to convert" error in SFT training (#3453 ) * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. * Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci * Update test_ci.sh * Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. * Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci * Update test_ci.sh * Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. * update roberta with coati * chat ci update * Revert "chat ci update" This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846. * [Chat] fix the tokenizer "int too big to convert" error in SFT training fix the tokenizer error during SFT training using Bloom and OPT	2023-04-06 09:30:28 +08:00
Hakjin Lee	46c009dba4	[format] Run lint on colossalai.engine (#3367 )	2023-04-05 23:24:43 +08:00
Yuanchen	b92313903f	fix save_model indent error in ppo trainer (#3450 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2023-04-05 09:45:42 +08:00
YuliangLiu0306	ffcdbf0f65	[autoparallel]integrate auto parallel feature with new tracer (#3408 ) * [autoparallel] integrate new analyzer in module level * unify the profiling method * polish * fix no codegen bug * fix pass bug * fix liveness test * polish	2023-04-04 17:40:45 +08:00
ver217	573af84184	[example] update examples related to zero/gemini (#3431 ) * [zero] update legacy import * [zero] update examples * [example] fix opt tutorial * [example] fix opt tutorial * [example] fix opt tutorial * [example] fix opt tutorial * [example] fix import	2023-04-04 17:32:51 +08:00
Yuanchen	773955abfa	fix save_model inin naive and ddp strategy (#3436 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2023-04-04 15:30:01 +08:00
Frank Lee	1beb85cc25	[checkpoint] refactored the API and added safetensors support (#3427 ) * [checkpoint] refactored the API and added safetensors support * polish code	2023-04-04 15:23:01 +08:00
ver217	26b7aac0be	[zero] reorganize zero/gemini folder structure (#3424 ) * [zero] refactor low-level zero folder structure * [zero] fix legacy zero import path * [zero] fix legacy zero import path * [zero] remove useless import * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] fix test import path * [zero] fix test * [zero] fix circular import * [zero] update import	2023-04-04 13:48:16 +08:00
Yuanchen	b09adff724	[chat]fix sft training for bloom, gpt and opt (#3418 ) fix sft training for bloom, gpt and opt	2023-04-04 09:46:23 +08:00
Frank Lee	638a07a7f9	[test] fixed gemini plugin test (#3411 ) * [test] fixed gemini plugin test * polish code * polish code	2023-04-03 17:12:22 +08:00
Camille Zhong	30412866e0	[chatgpt] add pre-trained model RoBERTa for RLHF stage 2 & 3 (#3223 ) * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. * Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci * add test for reward model training * Update test_ci.sh * Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. * Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci * Update test_ci.sh * Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. * update roberta with coati	2023-04-03 10:11:03 +08:00
Chris Sundström	94c24d9444	Improve grammar and punctuation (#3398 ) Minor changes to improve grammar and punctuation.	2023-04-02 22:00:57 +08:00
Jan Roudaut	dd367ce795	[doc] polish diffusion example (#3386 ) * [examples/images/diffusion]: README.md: typo fixes * Update README.md * Grammar fixes * Reformulated "Step 3" (xformers) introduction to the cost => at the cost + reworded pip availability.	2023-04-01 23:09:40 +08:00
Jan Roudaut	51cd2fec57	Typofix: malformed `xformers` version (#3384 ) s/0.12.0/0.0.12/	2023-03-31 23:32:44 +08:00
ver217	5f2e34e6c9	[booster] implement Gemini plugin (#3352 ) * [booster] add gemini plugin * [booster] update docstr * [booster] gemini plugin add coloparam convertor * [booster] fix coloparam convertor * [booster] fix gemini plugin device * [booster] add gemini plugin test * [booster] gemini plugin ignore sync bn * [booster] skip some model * [booster] skip some model * [booster] modify test world size * [booster] modify test world size * [booster] skip test	2023-03-31 16:06:13 +08:00
HELSON	1a1d68b053	[moe] add checkpoint for moe models (#3354 ) * [moe] add checkpoint for moe models * [hotfix] fix bugs in unit test	2023-03-31 09:20:33 +08:00

2509 Commits (74d176c8d84235e1b68f537eb9022c2d0a4e09ca)