ColossalAI

Commit Graph

Author	SHA1	Message	Date
jiangmingyan	366a035552	[checkpoint] Shard saved checkpoint need to be compatible with the naming format of hf checkpoint files (#3479 ) * [checkpoint] support huggingface style sharded checkpoint, to be compatible with hf file naming format * [checkpoint] support huggingface style sharded checkpoint, to be compatible with hf file naming format * [checkpoint] Shard saved checkpoint add 'variant' field to customize filename * [checkpoint] Shard saved checkpoint add 'variant' field to customize filename * [checkpoint] Shard saved checkpoint add 'variant' field to customize filename * [checkpoint] Shard saved checkpoint add 'variant' field to customize filename --------- Co-authored-by: luchen <luchen@luchendeMacBook-Pro.local> Co-authored-by: luchen <luchen@luchendeMBP.lan>	2 years ago
Yuanchen	7182ac2a04	[chat]add examples of training with limited resources in chat readme (#3536 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2 years ago
zhang-yi-chi	e6a132a449	[chat]: add vf_coef argument for PPOTrainer (#3318 )	2 years ago
ver217	89fd10a1c9	[chat] add zero2 cpu strategy for sft training (#3520 )	2 years ago
binmakeswell	990d4c3e4e	[doc] hide diffusion in application path (#3519 ) - [ ] Stable Diffusion - [ ] Dreambooth It's easy for users to think that we don't support them yet. Add them after migrating them from example to application https://github.com/hpcaitech/ColossalAI/tree/main/examples/images	2 years ago
binmakeswell	0c0455700f	[doc] add requirement and highlight application (#3516 ) * [doc] add requirement and highlight application * [doc] link example and application	2 years ago
NatalieC323	635d0a1baf	[Chat Community] Update README.md (fixed#3487) (#3506 ) * Update README.md * Update README.md * Update README.md * Update README.md --------- Co-authored-by: Fazzie-Maqianli <55798671+Fazziekey@users.noreply.github.com>	2 years ago
YH	bcf0cbcbe7	[doc] Add docs for clip args in zero optim (#3504 )	2 years ago
gongenlei	a7ca297281	[coati] Fix LlamaCritic (#3475 ) * mv LlamaForCausalLM to LlamaModel * rm unused imports --------- Co-authored-by: gongenlei <gongenlei@baidu.com>	2 years ago
mandoxzhang	8f2c55f9c9	[example] remove redundant texts & update roberta (#3493 ) * update roberta example * update roberta example * modify conflict & update roberta	2 years ago
mandoxzhang	ab5fd127e3	[example] update roberta with newer ColossalAI (#3472 ) * update roberta example * update roberta example	2 years ago
NatalieC323	fb8fae6f29	Revert "[dreambooth] fixing the incompatibity in requirements.txt (#3190 ) (#3378 )" (#3481 )	2 years ago
binmakeswell	891b8e7fac	[chat] fix stage3 PPO sample sh command (#3477 )	2 years ago
NatalieC323	c701b77b11	[dreambooth] fixing the incompatibity in requirements.txt (#3190 ) (#3378 ) * Update requirements.txt * Update environment.yaml * Update README.md * Update environment.yaml * Update README.md * Update README.md * Delete requirements_colossalai.txt * Update requirements.txt * Update README.md	2 years ago
Frank Lee	4e9989344d	[doc] updated contributor list (#3474 )	2 years ago
jiangmingyan	52a933e175	[checkpoint] support huggingface style sharded checkpoint (#3461 ) * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint * [checkpoint] support huggingface style sharded checkpoint --------- Co-authored-by: luchen <luchen@luchendeMBP.lan>	2 years ago
Fazzie-Maqianli	6afeb1202a	add community example dictionary (#3465 )	2 years ago
Frank Lee	80eba05b0a	[test] refactor tests with spawn (#3452 ) * [test] added spawn decorator * polish code * polish code * polish code * polish code * polish code * polish code	2 years ago
YY Lin	62f4e2eb07	[Chat]Add Peft support & fix the ptx bug (#3433 ) * Update ppo.py Fix the bug of fetching wrong batch data * Add peft model support in SFT and Prompts training In stage-1 and stage-3, the peft model supports are added. So the trained artifacts will be only a small lora additions instead of the whole bunch of files. * Delete test_prompts.txt * Delete test_pretrained.txt * Move the peft stuffs to a community folder. * Move the demo sft to community * delete dirty files * Add instructions to install peft using source * Remove Chinese comments * remove the Chinese comments	2 years ago
Dr-Corgi	73afb63594	[chat]fix save_model(#3377 ) The function save_model should be a part of PPOTrainer.	2 years ago
kingkingofall	57a3c4db6d	[chat]fix readme (#3429 ) * fix stage 2 fix stage 2 * add torch	2 years ago
Frank Lee	7d8d825681	[booster] fixed the torch ddp plugin with the new checkpoint api (#3442 )	2 years ago
YH	8f740deb53	Fix typo (#3448 )	2 years ago
ver217	933048ad3e	[test] reorganize zero/gemini tests (#3445 )	2 years ago
Camille Zhong	72cb4dd433	[Chat] fix the tokenizer "int too big to convert" error in SFT training (#3453 ) * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. * Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci * Update test_ci.sh * Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. * Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci * Update test_ci.sh * Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. * update roberta with coati * chat ci update * Revert "chat ci update" This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846. * [Chat] fix the tokenizer "int too big to convert" error in SFT training fix the tokenizer error during SFT training using Bloom and OPT	2 years ago
Hakjin Lee	46c009dba4	[format] Run lint on colossalai.engine (#3367 )	2 years ago
Yuanchen	b92313903f	fix save_model indent error in ppo trainer (#3450 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2 years ago
YuliangLiu0306	ffcdbf0f65	[autoparallel]integrate auto parallel feature with new tracer (#3408 ) * [autoparallel] integrate new analyzer in module level * unify the profiling method * polish * fix no codegen bug * fix pass bug * fix liveness test * polish	2 years ago
ver217	573af84184	[example] update examples related to zero/gemini (#3431 ) * [zero] update legacy import * [zero] update examples * [example] fix opt tutorial * [example] fix opt tutorial * [example] fix opt tutorial * [example] fix opt tutorial * [example] fix import	2 years ago
Yuanchen	773955abfa	fix save_model inin naive and ddp strategy (#3436 ) Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>	2 years ago
Frank Lee	1beb85cc25	[checkpoint] refactored the API and added safetensors support (#3427 ) * [checkpoint] refactored the API and added safetensors support * polish code	2 years ago
ver217	26b7aac0be	[zero] reorganize zero/gemini folder structure (#3424 ) * [zero] refactor low-level zero folder structure * [zero] fix legacy zero import path * [zero] fix legacy zero import path * [zero] remove useless import * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor gemini folder structure * [zero] refactor legacy zero import path * [zero] fix test import path * [zero] fix test * [zero] fix circular import * [zero] update import	2 years ago
Yuanchen	b09adff724	[chat]fix sft training for bloom, gpt and opt (#3418 ) fix sft training for bloom, gpt and opt	2 years ago
Frank Lee	638a07a7f9	[test] fixed gemini plugin test (#3411 ) * [test] fixed gemini plugin test * polish code * polish code	2 years ago
Camille Zhong	30412866e0	[chatgpt] add pre-trained model RoBERTa for RLHF stage 2 & 3 (#3223 ) * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. * Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci * add test for reward model training * Update test_ci.sh * Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. * Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) * Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. * Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci * Update test_ci.sh * Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. * update roberta with coati	2 years ago
Chris Sundström	94c24d9444	Improve grammar and punctuation (#3398 ) Minor changes to improve grammar and punctuation.	2 years ago
Jan Roudaut	dd367ce795	[doc] polish diffusion example (#3386 ) * [examples/images/diffusion]: README.md: typo fixes * Update README.md * Grammar fixes * Reformulated "Step 3" (xformers) introduction to the cost => at the cost + reworded pip availability.	2 years ago
Jan Roudaut	51cd2fec57	Typofix: malformed `xformers` version (#3384 ) s/0.12.0/0.0.12/	2 years ago
ver217	5f2e34e6c9	[booster] implement Gemini plugin (#3352 ) * [booster] add gemini plugin * [booster] update docstr * [booster] gemini plugin add coloparam convertor * [booster] fix coloparam convertor * [booster] fix gemini plugin device * [booster] add gemini plugin test * [booster] gemini plugin ignore sync bn * [booster] skip some model * [booster] skip some model * [booster] modify test world size * [booster] modify test world size * [booster] skip test	2 years ago
HELSON	1a1d68b053	[moe] add checkpoint for moe models (#3354 ) * [moe] add checkpoint for moe models * [hotfix] fix bugs in unit test	2 years ago
YuliangLiu0306	fee2af8610	[autoparallel] adapt autoparallel with new analyzer (#3261 ) * [autoparallel] adapt autoparallel with new analyzer * fix all node handler tests * polish * polish	2 years ago
アマデウス	e78a1e949a	fix torch 2.0 compatibility (#3346 )	2 years ago
Ofey Chan	8706a8c66c	[NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329 )	2 years ago
yuxuan-lou	198a74b9fd	[NFC] polish colossalai/context/random/__init__.py code style (#3327 )	2 years ago
Andrew	82132f4e3d	[chat] correcting a few obvious typos and grammars errors (#3338 )	2 years ago
YuliangLiu0306	fbd2a9e05b	[hotfix] meta_tensor_compatibility_with_torch2	2 years ago
binmakeswell	15a74da79c	[doc] add Intel cooperation news (#3333 ) * [doc] add Intel cooperation news * [doc] add Intel cooperation news	2 years ago
Michelle	ad285e1656	[NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323 ) * [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style * [NFC] polish colossalai/fx/tracer/_tracer_utils.py code style --------- Co-authored-by: Qianran Ma <qianranm@luchentech.com>	2 years ago
Xu Kai	64350029fe	[NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style	2 years ago
RichardoLuo	1ce9d0c531	[NFC] polish initializer_data.py code style (#3287 )	2 years ago

2249 Commits (366a035552ff62d5f3dd9750bc9d263c2aa60dbc)