Wenhao Chen
e614aa34f3
[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogeneous shard policy for llama ( #5508 )
* feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig`
* feat: apply `GradientCheckpointConfig` to policy and llama_forward
* feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager
* fix: add optional args for `distribute_layer` and `get_stage_index`
* fix: fix changed API calls
* test: update llama tests
* style: polish `GradientCheckpointConfig`
* fix: fix pipeline utils tests
8 months ago
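The `gradient_checkpointing_ratio` added above controls what fraction of a model's layers recompute activations in the backward pass instead of caching them. Below is a minimal PyTorch sketch of that idea; the class and the way the ratio is applied are illustrative assumptions, not ColossalAI's actual `GradientCheckpointConfig` API.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedStack(nn.Module):
    """Illustrative only: checkpoints a fraction of layers, as a
    gradient_checkpointing_ratio would."""

    def __init__(self, layers: nn.ModuleList, ratio: float = 0.5):
        super().__init__()
        self.layers = layers
        # Recompute activations for the first `ratio` share of layers.
        self.num_ckpt = int(len(layers) * ratio)

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if self.training and i < self.num_ckpt:
                # Activations are freed here and recomputed during backward,
                # trading extra compute for lower peak memory.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x

stack = CheckpointedStack(nn.ModuleList([nn.Linear(64, 64) for _ in range(8)]))
loss = stack(torch.randn(2, 64)).sum()
loss.backward()
```

A ratio of 0.0 stores everything (fastest, most memory) and 1.0 recomputes everything; heterogeneous pipeline stages can pick different ratios per stage.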
YeAnbang
df5e9c53cf
[ColossalChat] Update RLHF V2 ( #5286 )
* Add dpo. Fix sft, ppo, lora. Refactor all
* fix and tested ppo
* 2nd round refactor
* add ci tests
* fix ci
* fix ci
* fix readme, style
* fix readme style
* fix style, fix benchmark
* reproduce benchmark result, remove useless files
* rename to ColossalChat
* use new image
* fix ci workflow
* fix ci
* use local model/tokenizer for ci tests
* fix ci
* fix ci
* fix ci
* fix ci timeout
* fix rm (reward model) progress bar. fix ci timeout
* fix ci
* fix ci typo
* remove 3d plugin from ci temporarily
* test environment
* cannot save optimizer
* support chat template
* fix readme
* fix path
* test ci locally
* restore build_or_pr
* fix ci data path
* fix benchmark
* fix ci, move ci tests to 3080, disable fast tokenizer
* move ci to 85
* support flash attention 2
* add all-in-one data preparation script. Fix colossal-llama2-chat chat template
* add hardware requirements
* move ci test data
* fix save_model, add unwrap
* fix missing bos
* fix missing bos; support grad accumulation with gemini
* fix ci
* fix ci
* fix ci
* fix llama2 chat template config
* debug sft
* debug sft
* fix colossalai version requirement
* fix ci
* add sanity check to prevent NaN loss
* fix requirements
* add dummy data generation script
* add dummy data generation script
* add dummy data generation script
* add dummy data generation script
* update readme
* update readme
* update readme and ignore
* fix logger bug
* support parallel_output
* modify data preparation logic
* fix tokenization
* update lr
* fix inference
* run pre-commit
---------
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
8 months ago
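Among the bullets above, "support chat template" and the "fix missing bos" fixes point at the same pitfall: hand-rolled prompt strings drop model-specific special tokens. A hedged sketch using the Hugging Face tokenizers API; the checkpoint name is a placeholder, not a claim about what ColossalChat loads.

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; any chat-tuned model that ships a chat template works.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

messages = [
    {"role": "user", "content": "What is pipeline parallelism?"},
]
# Renders the conversation with the model's own special tokens (BOS,
# role markers), avoiding hand-written formats that silently omit them.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```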
Insu Jang
00525f7772
[shardformer] fix pipeline forward error if custom layer distribution is used ( #5189 )
* Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution
* Change static methods for t5 layer distribution to member functions
* Change static methods for whisper layer distribution to member functions
* Replace whisper policy usage with self one
* Fix test case to use non-static layer distribution methods
* fix: fix typo
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
8 months ago
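The fix above concerns how a pipeline policy splits a model's layers across stages via `distribute_layers` and `get_stage_index`. A self-contained sketch of what such helpers typically compute; the even-split-with-remainder scheme and the exact signatures are assumptions, not ColossalAI's real ones.

```python
def distribute_layers(num_layers: int, num_stages: int) -> list[int]:
    """Spread layers as evenly as possible; early stages absorb the remainder."""
    base, rem = divmod(num_layers, num_stages)
    return [base + (1 if s < rem else 0) for s in range(num_stages)]

def get_stage_index(layers_per_stage: list[int], stage: int) -> tuple[int, int]:
    """Half-open [start, end) range of layer indices owned by `stage`."""
    start = sum(layers_per_stage[:stage])
    return start, start + layers_per_stage[stage]

# e.g. 30 layers over 4 stages -> [8, 8, 7, 7]; stage 2 owns layers [16, 23)
assert distribute_layers(30, 4) == [8, 8, 7, 7]
assert get_stage_index(distribute_layers(30, 4), 2) == (16, 23)
```

Making these member functions (rather than static methods) is what lets a custom policy override the distribution without patching the forward pass.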
Wenhao Chen
bb0a668fee
[hotfix] set return_outputs=False in examples and polish code ( #5404 )
* fix: simplify merge_batch
* fix: use return_outputs=False to eliminate extra memory consumption
* feat: add return_outputs warning
* style: remove `return_outputs=False` as it is the default value
8 months ago
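The `return_outputs=False` change above avoids holding every microbatch's model outputs alive until the end of a pipeline step. A schematic loop shows where the extra memory comes from; this is a hypothetical scheduler for illustration, not ColossalAI's actual one.

```python
def pipeline_step(microbatches, model, criterion, return_outputs=False):
    """Hypothetical sketch: with return_outputs=True, `outputs` keeps one
    activation tensor per microbatch alive until the final merge."""
    total_loss, outputs = 0.0, []
    for mb_input, mb_target in microbatches:
        out = model(mb_input)
        loss = criterion(out, mb_target)
        loss.backward()  # gradients accumulate across microbatches
        total_loss += loss.item()
        if return_outputs:
            outputs.append(out.detach())  # memory held for no training need
    return total_loss, (outputs if return_outputs else None)
```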
binmakeswell
d158fc0e64
[doc] update open-sora demo ( #5479 )
* [doc] update open-sora demo
* [doc] update open-sora demo
* [doc] update open-sora demo
8 months ago
digger yu
385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. ( #5429 )
9 months ago
Camille Zhong
da885ed540
fix tensor data update for gemini loss calculation ( #5442 )
9 months ago
Camille Zhong
743e7fad2f
[colossal-llama2] add stream chat example for chat version model ( #5428 )
* add stream chat for chat version
* remove os.system clear
* modify function name
9 months ago
hugo-syn
c8003d463b
[doc] Fix typo s/infered/inferred/ ( #5288 )
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
9 months ago
Dongruixuan Li
a7ae2b5b4c
[eval-hotfix] set few_shot_data to None when few shot is disabled ( #5422 )
9 months ago
binmakeswell
822241a99c
[doc] sora release ( #5425 )
* [doc] sora release
* [doc] sora release
* [doc] sora release
* [doc] sora release
9 months ago
Camille Zhong
4b8312c08e
fix sft single turn inference example ( #5416 )
9 months ago
Tong Li
a28c971516
update requirements ( #5407 )
9 months ago
CZYCW
b833153fd5
[hotfix] fix variable type for top_p ( #5313 )
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
9 months ago
Hongxin Liu
7303801854
[llama] fix training and inference scripts ( #5384 )
* [llama] refactor inference example to fit sft
* [llama] fix training script to fit gemini
* [llama] fix inference script
9 months ago
Frank Lee
efef43b53c
Merge pull request #5372 from hpcaitech/exp/mixtral
10 months ago
Hongxin Liu
65e5d6baa5
[moe] fix mixtral optim checkpoint ( #5344 )
10 months ago
Hongxin Liu
956b561b54
[moe] fix mixtral forward default value ( #5329 )
10 months ago
Hongxin Liu
b60be18dcc
[moe] fix mixtral checkpoint io ( #5314 )
10 months ago
Hongxin Liu
da39d21b71
[moe] support mixtral ( #5309 )
* [moe] add mixtral block for single expert
* [moe] mixtral block fwd support uneven ep
* [moe] mixtral block bwd support uneven ep
* [moe] add mixtral moe layer
* [moe] simplify replace
* [moe] support save sharded mixtral
* [moe] support load sharded mixtral
* [moe] support save sharded optim
* [moe] integrate moe manager into plugin
* [moe] fix optimizer load
* [moe] fix mixtral layer
10 months ago
Hongxin Liu
c904d2ae99
[moe] update capacity computing ( #5253 )
* [moe] top2 allow uneven input
* [moe] update capacity computing
* [moe] remove debug info
* [moe] update capacity computing
* [moe] update capacity computing
10 months ago
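"Capacity computing" above refers to the per-expert token budget in MoE routing: tokens beyond an expert's capacity are dropped or rerouted. The formula below is the standard one from the MoE literature, shown as a hedged sketch; the helper name and defaults are illustrative, not the commit's exact code.

```python
import math

def expert_capacity(num_tokens: int, num_experts: int,
                    capacity_factor: float = 1.25, top_k: int = 2) -> int:
    """Max tokens each expert may receive per batch. With top-k routing,
    every token produces k expert assignments; the capacity factor adds
    headroom for uneven (non-uniform) routing."""
    return math.ceil(num_tokens * top_k / num_experts * capacity_factor)

# e.g. 4096 tokens, 8 experts, top-2 routing, factor 1.25 -> capacity 1280
assert expert_capacity(4096, 8) == 1280
```

Allowing uneven input, as the top2 bullet notes, means `num_tokens` can differ per rank rather than being assumed identical everywhere.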
Xuanlei Zhao
7d8e0338a4
[moe] init mixtral impl
10 months ago
Hongxin Liu
084c91246c
[llama] fix memory issue ( #5371 )
* [llama] fix memory issue
* [llama] add comment
10 months ago
Hongxin Liu
eb4f2d90f9
[llama] polish training script and fix optim ckpt ( #5368 )
10 months ago
Camille Zhong
a5756a8720
[eval] update llama npu eval ( #5366 )
10 months ago
Camille Zhong
44ca61a22b
[llama] fix neftune & pbar with start_step ( #5364 )
10 months ago
Hongxin Liu
a4cec1715b
[llama] add flash attn patch for npu ( #5362 )
10 months ago
Hongxin Liu
73f9f23fc6
[llama] update training script ( #5360 )
* [llama] update training script
* [doc] polish docstr
10 months ago
Hongxin Liu
6c0fa7b9a8
[llama] fix dataloader for hybrid parallel ( #5358 )
* [plugin] refactor prepare dataloader
* [plugin] update train script
10 months ago
YeAnbang
c5239840e6
[Chat] fix sft loss nan ( #5345 )
* fix script
* fix script
* fix chat nan
* fix chat nan
10 months ago
Frank Lee
8823cc4831
Merge pull request #5310 from hpcaitech/feature/npu
Feature/npu
10 months ago
李文军
ec912b1ba9
[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style ( #5228 )
10 months ago
Desperado-Jia
ddf879e2db
fix bug for mixture ( #5299 )
10 months ago
Michelle
32cb74493a
fix auto loading gpt2 tokenizer ( #5279 )
10 months ago
ver217
148469348a
Merge branch 'main' into sync/npu
10 months ago
digger yu
756c400ad2
fix typo in applications/ColossalEval/README.md ( #5250 )
11 months ago
digger yu
41e52c1c6e
[doc] fix typo in Colossal-LLaMA-2/README.md ( #5247 )
11 months ago
Hongxin Liu
d202cc28c0
[npu] change device to accelerator api ( #5239 )
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* update
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
11 months ago
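The device-to-accelerator commit above replaces hard-coded `torch.cuda` calls with an abstraction so the same training code can drive NPUs. A minimal sketch of that pattern follows; the `Accelerator` class and its backend probing are assumptions for illustration, not the library's actual interface.

```python
import torch

class Accelerator:
    """Thin device wrapper so training code never names a backend directly."""

    def __init__(self):
        # Assumption: prefer CUDA, fall back to CPU. A real implementation
        # would also probe NPU backends (e.g. the torch_npu extension).
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    def current_device(self) -> torch.device:
        return self.device

    def synchronize(self) -> None:
        if self.device.type == "cuda":
            torch.cuda.synchronize()

accelerator = Accelerator()
model_device = accelerator.current_device()  # instead of torch.device("cuda")
```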
binmakeswell
7bc6969ce6
[doc] SwiftInfer release ( #5236 )
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
* [doc] SwiftInfer release
11 months ago
github-actions[bot]
4fb4a22a72
[format] applied code formatting on changed files in pull request 5234 ( #5235 )
Co-authored-by: github-actions <github-actions@github.com>
11 months ago
binmakeswell
b9b32b15e6
[doc] add Colossal-LLaMA-2-13B ( #5234 )
* [doc] add Colossal-LLaMA-2-13B
* [doc] add Colossal-LLaMA-2-13B
* [doc] add Colossal-LLaMA-2-13B
11 months ago
Camille Zhong
915b4652f3
[doc] Update README.md of Colossal-LLAMA2 ( #5233 )
* Update README.md
* Update README.md
11 months ago
Tong Li
d992b55968
[Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model ( #5224 )
* update readme
* update readme
* update link
* update
* update readme
* update
* update
* update
* update title
* update example
* update example
* fix content
* add conclusion
* add license
* update
* update
* update version
* fix minor
11 months ago
Yuanchen
eae01b6740
Improve logic for selecting metrics ( #5196 )
Co-authored-by: Xu <yuanchen.xu00@gmail.com>
11 months ago
BlueRum
af952673f7
polish readme in application/chat ( #5194 )
11 months ago
Yuanchen
3ff60d13b0
Fix ColossalEval ( #5186 )
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
12 months ago
Yuanchen
cefdc32615
[ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel ( #5169 )
* Support GSM, Data Leakage Evaluation and Tensor Parallel
* remove redundant code and update inference.py in examples/gpt_evaluation
---------
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
12 months ago
Michelle
b07a6f4e27
[colossalqa] fix pangu api ( #5170 )
* fix pangu api
* add comment
12 months ago
Yuanchen
b397104438
[Colossal-Llama-2] Add finetuning Colossal-Llama-2 example ( #4878 )
* Add finetuning Colossal-Llama-2 example
* Add finetuning Colossal-Llama-2 example 2
* Add finetuning Colossal-Llama-2 example and support NEFTuning
* Add inference example and refine neftune
* Modify readme file
* update the imports
---------
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com>
12 months ago
Michelle
368b5e3d64
[doc] fix colossalqa document ( #5146 )
* fix doc
* modify doc
12 months ago