ColossalAI

Commit Graph

Author	SHA1	Message	Date
YeAnbang	84eab13078	update sft trainning script	2024-06-11 02:44:20 +00:00
YeAnbang	2abdede1d7	fix readme	2024-06-10 01:08:42 +00:00
YeAnbang	77db21610a	replace the customized dataloader setup with the build-in one	2024-06-07 09:44:25 +00:00
YeAnbang	0d7ff10ea5	replace the customized dataloader setup with the build-in one	2024-06-07 09:43:42 +00:00
YeAnbang	790e1362a6	merge	2024-06-07 07:01:32 +00:00
YeAnbang	ac1520cb8f	remove baichuan from template test due to transformer version conflict	2024-06-07 07:01:32 +00:00
YeAnbang	e16ccc272a	update ci	2024-06-07 07:01:32 +00:00
YeAnbang	45195ac53d	remove local data path	2024-06-07 07:01:31 +00:00
YeAnbang	bf57b13dda	remove models that require huggingface auth from ci	2024-06-07 07:01:31 +00:00
YeAnbang	0bbac158ed	fix datasets version	2024-06-07 07:01:31 +00:00
YeAnbang	62eb28b929	remove duplicated test	2024-06-07 07:01:31 +00:00
YeAnbang	b8b5cacf38	fix transformers version	2024-06-07 07:01:31 +00:00
pre-commit-ci[bot]	1b880ce095	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2024-06-07 07:01:31 +00:00
YeAnbang	7ae87b3159	fix training script	2024-06-07 07:01:31 +00:00
YeAnbang	0b4a33548c	moupdate ci tests, st ci test cases passed, tp failed in generation for ppo, sp is buggy	2024-06-07 07:01:31 +00:00
YeAnbang	7e65b71815	run pre-commit	2024-06-07 07:01:30 +00:00
YeAnbang	929e1e3da4	upgrade ppo dpo rm script	2024-06-07 07:01:30 +00:00
YeAnbang	7a7e86987d	upgrade colossal-chat support tp_group>1, add sp for sft	2024-06-07 07:01:30 +00:00
Tong Li	913c920ecc	[Colossal-LLaMA] Fix sft issue for llama2 (#5719 ) * fix minor issue * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-05-15 10:52:11 +08:00
Hongxin Liu	7f8b16635b	[misc] refactor launch API and tensor constructor (#5666 ) * [misc] remove config arg from initialize * [misc] remove old tensor contrusctor * [plugin] add npu support for ddp * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [devops] fix doc test ci * [test] fix test launch * [doc] update launch doc --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-29 10:40:11 +08:00
linsj20	91fa553775	[Feature] qlora support (#5586 ) * [feature] qlora support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * qlora follow commit * migrate qutization folder to colossalai/ * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-28 10:51:27 +08:00
Tong Li	862fbaaa62	[Feature] Support LLaMA-3 CPT and ST (#5619 ) * support LLaMA-3 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Run pre-commit --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-23 13:54:05 +08:00
Camille Zhong	89049b0d89	[doc] fix ColossalMoE readme (#5599 ) * fix readme * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-15 18:06:18 +08:00
Hongxin Liu	641b1ee71a	[devops] remove post commit ci (#5566 ) * [devops] remove post commit ci * [misc] run pre-commit on all files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-08 15:09:40 +08:00
digger yu	a799ca343b	[fix] fix typo s/muiti-node /multi-node etc. (#5448 )	2024-04-07 18:42:15 +08:00
Wenhao Chen	e614aa34f3	[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508 ) * feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig` * feat: apply `GradientCheckpointConfig` to policy and llama_forward * feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager * fix: add optional args for `distribute_layer` and `get_stage_index` * fix: fix changed API calls * test: update llama tests * style: polish `GradientCheckpointConfig` * fix: fix pipeline utils tests	2024-04-01 11:34:58 +08:00
YeAnbang	df5e9c53cf	[ColossalChat] Update RLHF V2 (#5286 ) * Add dpo. Fix sft, ppo, lora. Refactor all * fix and tested ppo * 2 nd round refactor * add ci tests * fix ci * fix ci * fix readme, style * fix readme style * fix style, fix benchmark * reproduce benchmark result, remove useless files * rename to ColossalChat * use new image * fix ci workflow * fix ci * use local model/tokenizer for ci tests * fix ci * fix ci * fix ci * fix ci timeout * fix rm progress bar. fix ci timeout * fix ci * fix ci typo * remove 3d plugin from ci temporary * test environment * cannot save optimizer * support chat template * fix readme * fix path * test ci locally * restore build_or_pr * fix ci data path * fix benchmark * fix ci, move ci tests to 3080, disable fast tokenizer * move ci to 85 * support flash attention 2 * add all-in-one data preparation script. Fix colossal-llama2-chat chat template * add hardware requirements * move ci test data * fix save_model, add unwrap * fix missing bos * fix missing bos; support grad accumulation with gemini * fix ci * fix ci * fix ci * fix llama2 chat template config * debug sft * debug sft * fix colossalai version requirement * fix ci * add sanity check to prevent NaN loss * fix requirements * add dummy data generation script * add dummy data generation script * add dummy data generation script * add dummy data generation script * update readme * update readme * update readme and ignore * fix logger bug * support parallel_output * modify data preparation logic * fix tokenization * update lr * fix inference * run pre-commit --------- Co-authored-by: Tong Li <tong.li352711588@gmail.com>	2024-03-29 14:12:29 +08:00
Insu Jang	00525f7772	[shardformer] fix pipeline forward error if custom layer distribution is used (#5189 ) * Use self.[distribute_layers\|get_stage_index] to exploit custom layer distribution * Change static methods for t5 layer distribution to member functions * Change static methods for whisper layer distribution to member functions * Replace whisper policy usage with self one * Fix test case to use non-static layer distribution methods * fix: fix typo --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-03-27 13:57:00 +08:00
Wenhao Chen	bb0a668fee	[hotfix] set return_outputs=False in examples and polish code (#5404 ) * fix: simplify merge_batch * fix: use return_outputs=False to eliminate extra memory consumption * feat: add return_outputs warning * style: remove `return_outputs=False` as it is the default value	2024-03-25 12:31:09 +08:00
binmakeswell	d158fc0e64	[doc] update open-sora demo (#5479 ) * [doc] update open-sora demo * [doc] update open-sora demo * [doc] update open-sora demo	2024-03-20 16:08:41 +08:00
digger yu	385e85afd4	[hotfix] fix typo s/keywrods/keywords etc. (#5429 )	2024-03-12 11:25:16 +08:00
Camille Zhong	da885ed540	fix tensor data update for gemini loss caluculation (#5442 )	2024-03-11 13:49:58 +08:00
Camille Zhong	743e7fad2f	[colossal-llama2] add stream chat examlple for chat version model (#5428 ) * add stream chat for chat version * remove os.system clear * modify function name	2024-03-07 14:58:56 +08:00
hugo-syn	c8003d463b	[doc] Fix typo s/infered/inferred/ (#5288 ) Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>	2024-03-05 22:02:08 +08:00
Dongruixuan Li	a7ae2b5b4c	[eval-hotfix] set few_shot_data to None when few shot is disabled (#5422 )	2024-03-05 21:48:55 +08:00
binmakeswell	822241a99c	[doc] sora release (#5425 ) * [doc] sora release * [doc] sora release * [doc] sora release * [doc] sora release	2024-03-05 12:08:58 +08:00
Camille Zhong	4b8312c08e	fix sft single turn inference example (#5416 )	2024-03-01 17:27:50 +08:00
Tong Li	a28c971516	update requirements (#5407 )	2024-02-28 17:46:27 +08:00
CZYCW	b833153fd5	[hotfix] fix variable type for top_p (#5313 ) Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-02-19 18:25:44 +08:00
Hongxin Liu	7303801854	[llama] fix training and inference scripts (#5384 ) * [llama] refactor inference example to fit sft * [llama] fix training script to fit gemini * [llama] fix inference script	2024-02-19 16:41:04 +08:00
Frank Lee	efef43b53c	Merge pull request #5372 from hpcaitech/exp/mixtral	2024-02-08 16:30:05 +08:00
Hongxin Liu	65e5d6baa5	[moe] fix mixtral optim checkpoint (#5344 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	956b561b54	[moe] fix mixtral forward default value (#5329 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	b60be18dcc	[moe] fix mixtral checkpoint io (#5314 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	da39d21b71	[moe] support mixtral (#5309 ) * [moe] add mixtral block for single expert * [moe] mixtral block fwd support uneven ep * [moe] mixtral block bwd support uneven ep * [moe] add mixtral moe layer * [moe] simplify replace * [meo] support save sharded mixtral * [meo] support load sharded mixtral * [meo] support save sharded optim * [meo] integrate moe manager into plug * [meo] fix optimizer load * [meo] fix mixtral layer	2024-02-07 19:21:02 +08:00
Hongxin Liu	c904d2ae99	[moe] update capacity computing (#5253 ) * [moe] top2 allow uneven input * [moe] update capacity computing * [moe] remove debug info * [moe] update capacity computing * [moe] update capacity computing	2024-02-07 19:21:02 +08:00
Xuanlei Zhao	7d8e0338a4	[moe] init mixtral impl	2024-02-07 19:21:02 +08:00
Hongxin Liu	084c91246c	[llama] fix memory issue (#5371 ) * [llama] fix memory issue * [llama] add comment	2024-02-06 19:02:37 +08:00
Hongxin Liu	eb4f2d90f9	[llama] polish training script and fix optim ckpt (#5368 )	2024-02-06 11:52:17 +08:00
Camille Zhong	a5756a8720	[eval] update llama npu eval (#5366 )	2024-02-06 10:53:03 +08:00
Camille Zhong	44ca61a22b	[llama] fix neftune & pbar with start_step (#5364 )	2024-02-05 18:04:23 +08:00
Hongxin Liu	a4cec1715b	[llama] add flash attn patch for npu (#5362 )	2024-02-05 16:48:34 +08:00
Hongxin Liu	73f9f23fc6	[llama] update training script (#5360 ) * [llama] update training script * [doc] polish docstr	2024-02-05 16:33:18 +08:00
Hongxin Liu	6c0fa7b9a8	[llama] fix dataloader for hybrid parallel (#5358 ) * [plugin] refactor prepare dataloader * [plugin] update train script	2024-02-05 15:14:56 +08:00
YeAnbang	c5239840e6	[Chat] fix sft loss nan (#5345 ) * fix script * fix script * fix chat nan * fix chat nan	2024-02-01 14:25:16 +08:00
Frank Lee	8823cc4831	Merge pull request #5310 from hpcaitech/feature/npu Feature/npu	2024-01-29 13:49:39 +08:00
李文军	ec912b1ba9	[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228 )	2024-01-25 13:14:48 +08:00
Desperado-Jia	ddf879e2db	fix bug for mefture (#5299 )	2024-01-22 22:17:54 +08:00
Michelle	32cb74493a	fix auto loading gpt2 tokenizer (#5279 )	2024-01-18 14:08:29 +08:00
ver217	148469348a	Merge branch 'main' into sync/npu	2024-01-18 12:05:21 +08:00
digger yu	756c400ad2	fix typo in applications/ColossalEval/README.md (#5250 )	2024-01-11 17:58:38 +08:00
digger yu	41e52c1c6e	[doc] fix typo in Colossal-LLaMA-2/README.md (#5247 )	2024-01-10 19:24:56 +08:00
Hongxin Liu	d202cc28c0	[npu] change device to accelerator api (#5239 ) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>	2024-01-09 10:20:05 +08:00
binmakeswell	7bc6969ce6	[doc] SwiftInfer release (#5236 ) * [doc] SwiftInfer release * [doc] SwiftInfer release * [doc] SwiftInfer release * [doc] SwiftInfer release * [doc] SwiftInfer release	2024-01-08 09:55:12 +08:00
github-actions[bot]	4fb4a22a72	[format] applied code formatting on changed files in pull request 5234 (#5235 ) Co-authored-by: github-actions <github-actions@github.com>	2024-01-07 20:55:34 +08:00
binmakeswell	b9b32b15e6	[doc] add Colossal-LLaMA-2-13B (#5234 ) * [doc] add Colossal-LLaMA-2-13B * [doc] add Colossal-LLaMA-2-13B * [doc] add Colossal-LLaMA-2-13B	2024-01-07 20:53:12 +08:00
Camille Zhong	915b4652f3	[doc] Update README.md of Colossal-LLAMA2 (#5233 ) * Update README.md * Update README.md	2024-01-06 17:06:41 +08:00
Tong Li	d992b55968	[Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224 ) * update readme * update readme * update link * update * update readme * update * update * update * update title * update example * update example * fix content * add conclusion * add license * update * update * update version * fix minor	2024-01-05 17:24:26 +08:00
Yuanchen	eae01b6740	Improve logic for selecting metrics (#5196 ) Co-authored-by: Xu <yuanchen.xu00@gmail.com>	2023-12-22 14:52:50 +08:00
BlueRum	af952673f7	polish readme in application/chat (#5194 )	2023-12-20 11:28:39 +08:00
Yuanchen	3ff60d13b0	Fix ColossalEval (#5186 ) Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>	2023-12-15 15:06:06 +08:00
Yuanchen	cefdc32615	[ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169 ) * Support GSM, Data Leakage Evaluation and Tensor Parallel * remove redundant code and update inference.py in examples/gpt_evaluation --------- Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>	2023-12-12 14:47:35 +08:00
Michelle	b07a6f4e27	[colossalqa] fix pangu api (#5170 ) * fix pangu api * add comment	2023-12-11 14:08:11 +08:00
Yuanchen	b397104438	[Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878 ) * Add finetuning Colossal-Llama-2 example * Add finetuning Colossal-Llama-2 example 2 * Add finetuning Colossal-Llama-2 example and support NEFTuning * Add inference example and refine neftune * Modify readme file * update the imports --------- Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com> Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com>	2023-12-07 14:02:03 +08:00
Michelle	368b5e3d64	[doc] fix colossalqa document (#5146 ) * fix doc * modify doc	2023-12-01 21:39:53 +08:00
Michelle	c7fd9a5213	[ColossalQA] refactor server and webui & add new feature (#5138 ) * refactor server and webui & add new feature * add requirements * modify readme and ui	2023-11-30 22:55:52 +08:00
github-actions[bot]	f6731db67c	[format] applied code formatting on changed files in pull request 5115 (#5118 ) Co-authored-by: github-actions <github-actions@github.com>	2023-11-29 13:39:14 +08:00
digger yu	9110406a47	fix typo change JOSNL TO JSONL etc. (#5116 )	2023-11-29 11:08:32 +08:00
Zian(Andy) Zheng	7b789f4dd2	[FEATURE] Add Safety Eval Datasets to ColossalEval (#5095 ) * add safetybench and cvalues(responsibility) eval dataset * Modify code according to review suggestions --------- Co-authored-by: Orion-Zheng <zhengzian@u.nus.edu>	2023-11-28 11:15:04 +08:00
digger yu	d5661f0f25	[nfc] fix typo change directoty to directory (#5111 )	2023-11-27 18:25:53 +08:00
YeAnbang	e53e729d8e	[Feature] Add document retrieval QA (#5020 ) * add langchain * add langchain * Add files via upload * add langchain * fix style * fix style: remove extra space * add pytest; modified retriever * add pytest; modified retriever * add tests to build_on_pr.yml * fix build_on_pr.yml * fix build on pr; fix environ vars * seperate unit tests for colossalqa from build from pr * fix container setting; fix environ vars * commented dev code * add incremental update * remove stale code * fix style * change to sha3 224 * fix retriever; fix style; add unit test for document loader * fix ci workflow config * fix ci workflow config * add set cuda visible device script in ci * fix doc string * fix style; update readme; refactored * add force log info * change build on pr, ignore colossalqa * fix docstring, captitalize all initial letters * fix indexing; fix text-splitter * remove debug code, update reference * reset previous commit * update LICENSE update README add key-value mode, fix bugs * add files back * revert force push * remove junk file * add test files * fix retriever bug, add intent classification * change conversation chain design * rewrite prompt and conversation chain * add ui v1 * ui v1 * fix atavar * add header * Refactor the RAG Code and support Pangu * Refactor the ColossalQA chain to Object-Oriented Programming and the UI demo. * resolved conversation. tested scripts under examples. web demo still buggy * fix ci tests * Some modifications to add ChatGPT api * modify llm.py and remove unnecessary files * Delete applications/ColossalQA/examples/ui/test_frontend_input.json * Remove OpenAI api key * add colossalqa * move files * move files * move files * move files * fix style * Add Readme and fix some bugs. * Add something to readme and modify some code * modify a directory name for clarity * remove redundant directory * Correct a type in llm.py * fix AI prefix * fix test_memory.py * fix conversation * fix some erros and typos * Fix a missing import in RAG_ChatBot.py * add colossalcloud LLM wrapper, correct issues in code review --------- Co-authored-by: YeAnbang <anbangy2@outlook.com> Co-authored-by: Orion-Zheng <zheng_zian@u.nus.edu> Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com> Co-authored-by: Orion-Zheng <zhengzian@u.nus.edu>	2023-11-23 10:33:48 +08:00
Orion-Zheng	43ad0d9ef0	fix wrong EOS token in ColossalChat	2023-11-14 10:49:49 +08:00
Yuanchen	239cd92eff	Support mtbench (#5025 ) Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>	2023-11-09 13:41:50 +08:00
Yuanchen	abe071b663	fix ColossalEval (#4992 ) Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>	2023-10-31 10:30:03 +08:00
github-actions[bot]	a41cf88e9b	[format] applied code formatting on changed files in pull request 4908 (#4918 ) Co-authored-by: github-actions <github-actions@github.com>	2023-10-17 10:48:24 +08:00
Zian(Andy) Zheng	7768afbad0	Update flash_attention_patch.py To be compatible with the new change in the Transformers library, where a new argument 'padding_mask' was added to forward function of attention layer. https://github.com/huggingface/transformers/pull/25598	2023-10-16 14:00:45 +08:00
Camille Zhong	652adc2215	Update README.md	2023-10-10 23:19:34 +08:00
Camille Zhong	afe10a85fd	Update README.md	2023-10-10 23:19:34 +08:00
Camille Zhong	3043d5d676	Update modelscope link in README.md add modelscope link	2023-10-10 23:19:34 +08:00
Tong Li	ed06731e00	update Colossal (#4832 )	2023-09-28 16:05:05 +08:00
binmakeswell	822051d888	[doc] update slack link (#4823 )	2023-09-27 17:37:39 +08:00
Yuanchen	1fa8c5e09f	Update Qwen-7B results (#4821 ) Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>	2023-09-27 17:33:54 +08:00
flybird11111	be400a0936	[chat] fix gemini strategy (#4698 ) * [chat] fix gemini strategy * [chat] fix gemini strategy * [chat] fix gemini strategy * [chat] fix gemini strategy * g# This is a combination of 2 commits. [chat] fix gemini strategy fox * [chat] fix gemini strategy update llama2 example [chat] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * [fix] fix gemini strategy * fix * fix * fix * fix * fix * Update train_prompts.py	2023-09-27 13:15:32 +08:00
Chandler-Bing	b6cf0aca55	[hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800 ) change filename: pretraining.py -> trainin.py there is no file named pretraing.py. wrong writing	2023-09-26 11:44:27 +08:00
Tong Li	8cbce6184d	update	2023-09-26 11:36:53 +08:00
Tong Li	bd014673b0	update readme	2023-09-26 10:58:05 +08:00
binmakeswell	d512a4d38d	[doc] add llama2 domain-specific solution news (#4789 ) * [doc] add llama2 domain-specific solution news	2023-09-25 10:44:15 +08:00
Yuanchen	ce777853ae	[feature] ColossalEval: Evaluation Pipeline for LLMs (#4786 ) * Add ColossalEval * Delete evaluate in Chat --------- Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com> Co-authored-by: Tong Li <tong.li352711588@gmail.com>	2023-09-24 23:14:11 +08:00
Tong Li	74aa7d964a	initial commit: add colossal llama 2 (#4784 )	2023-09-24 23:12:26 +08:00
Wenhao Chen	901ab1eedd	[chat]: add lora merge weights config (#4766 ) * feat: modify lora merge weights fn * feat: add lora merge weights config	2023-09-21 16:23:59 +08:00

322 Commits (ckpt)