Commit Graph

272 Commits (7f9ec599be461cef555f4da2f796b46a3631d18f)

Author SHA1 Message Date
YeAnbang 84eab13078 update sft training script (6 months ago)
YeAnbang 2abdede1d7 fix readme (6 months ago)
YeAnbang 77db21610a replace the customized dataloader setup with the built-in one (6 months ago)
YeAnbang 0d7ff10ea5 replace the customized dataloader setup with the built-in one (6 months ago)
YeAnbang 790e1362a6 merge (6 months ago)
YeAnbang ac1520cb8f remove baichuan from template test due to transformer version conflict (6 months ago)
YeAnbang e16ccc272a update ci (6 months ago)
YeAnbang 45195ac53d remove local data path (6 months ago)
YeAnbang bf57b13dda remove models that require huggingface auth from ci (6 months ago)
YeAnbang 0bbac158ed fix datasets version (6 months ago)
YeAnbang 62eb28b929 remove duplicated test (6 months ago)
YeAnbang b8b5cacf38 fix transformers version (6 months ago)
pre-commit-ci[bot] 1b880ce095 [pre-commit.ci] auto fixes from pre-commit.com hooks (6 months ago)
YeAnbang 7ae87b3159 fix training script (6 months ago)
YeAnbang 0b4a33548c update ci tests; sft ci test cases passed, tp failed in generation for ppo, sp is buggy (6 months ago)
YeAnbang 7e65b71815 run pre-commit (6 months ago)
YeAnbang 929e1e3da4 upgrade ppo dpo rm script (6 months ago)
YeAnbang 7a7e86987d upgrade colossal-chat support tp_group>1, add sp for sft (6 months ago)
Tong Li 913c920ecc [Colossal-LLaMA] Fix sft issue for llama2 (#5719) (6 months ago)
Hongxin Liu 7f8b16635b [misc] refactor launch API and tensor constructor (#5666) (7 months ago)
linsj20 91fa553775 [Feature] qlora support (#5586) (7 months ago)
Tong Li 862fbaaa62 [Feature] Support LLaMA-3 CPT and ST (#5619) (7 months ago)
Camille Zhong 89049b0d89 [doc] fix ColossalMoE readme (#5599) (7 months ago)
Hongxin Liu 641b1ee71a [devops] remove post commit ci (#5566) (8 months ago)
digger yu a799ca343b [fix] fix typo s/muiti-node /multi-node etc. (#5448) (8 months ago)
Wenhao Chen e614aa34f3 [shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogeneous shard policy for llama (#5508) (8 months ago)
YeAnbang df5e9c53cf [ColossalChat] Update RLHF V2 (#5286) (8 months ago)
Insu Jang 00525f7772 [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) (8 months ago)
Wenhao Chen bb0a668fee [hotfix] set return_outputs=False in examples and polish code (#5404) (8 months ago)
binmakeswell d158fc0e64 [doc] update open-sora demo (#5479) (8 months ago)
digger yu 385e85afd4 [hotfix] fix typo s/keywrods/keywords etc. (#5429) (9 months ago)
Camille Zhong da885ed540 fix tensor data update for gemini loss calculation (#5442) (9 months ago)
Camille Zhong 743e7fad2f [colossal-llama2] add stream chat example for chat version model (#5428) (9 months ago)
hugo-syn c8003d463b [doc] Fix typo s/infered/inferred/ (#5288) (9 months ago)
Dongruixuan Li a7ae2b5b4c [eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) (9 months ago)
binmakeswell 822241a99c [doc] sora release (#5425) (9 months ago)
Camille Zhong 4b8312c08e fix sft single turn inference example (#5416) (9 months ago)
Tong Li a28c971516 update requirements (#5407) (9 months ago)
CZYCW b833153fd5 [hotfix] fix variable type for top_p (#5313) (9 months ago)
Hongxin Liu 7303801854 [llama] fix training and inference scripts (#5384) (9 months ago)
Frank Lee efef43b53c Merge pull request #5372 from hpcaitech/exp/mixtral (10 months ago)
Hongxin Liu 65e5d6baa5 [moe] fix mixtral optim checkpoint (#5344) (10 months ago)
Hongxin Liu 956b561b54 [moe] fix mixtral forward default value (#5329) (10 months ago)
Hongxin Liu b60be18dcc [moe] fix mixtral checkpoint io (#5314) (10 months ago)
Hongxin Liu da39d21b71 [moe] support mixtral (#5309) (10 months ago)
Hongxin Liu c904d2ae99 [moe] update capacity computing (#5253) (10 months ago)
Xuanlei Zhao 7d8e0338a4 [moe] init mixtral impl (10 months ago)
Hongxin Liu 084c91246c [llama] fix memory issue (#5371) (10 months ago)
Hongxin Liu eb4f2d90f9 [llama] polish training script and fix optim ckpt (#5368) (10 months ago)
Camille Zhong a5756a8720 [eval] update llama npu eval (#5366) (10 months ago)