ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	641b1ee71a	[devops] remove post commit ci (#5566 ) * [devops] remove post commit ci * [misc] run pre-commit on all files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-04-08 15:09:40 +08:00
digger yu	a799ca343b	[fix] fix typo s/muiti-node /multi-node etc. (#5448 )	2024-04-07 18:42:15 +08:00
Wenhao Chen	e614aa34f3	[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508 ) * feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig` * feat: apply `GradientCheckpointConfig` to policy and llama_forward * feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager * fix: add optional args for `distribute_layer` and `get_stage_index` * fix: fix changed API calls * test: update llama tests * style: polish `GradientCheckpointConfig` * fix: fix pipeline utils tests	2024-04-01 11:34:58 +08:00
YeAnbang	df5e9c53cf	[ColossalChat] Update RLHF V2 (#5286 ) * Add dpo. Fix sft, ppo, lora. Refactor all * fix and tested ppo * 2 nd round refactor * add ci tests * fix ci * fix ci * fix readme, style * fix readme style * fix style, fix benchmark * reproduce benchmark result, remove useless files * rename to ColossalChat * use new image * fix ci workflow * fix ci * use local model/tokenizer for ci tests * fix ci * fix ci * fix ci * fix ci timeout * fix rm progress bar. fix ci timeout * fix ci * fix ci typo * remove 3d plugin from ci temporary * test environment * cannot save optimizer * support chat template * fix readme * fix path * test ci locally * restore build_or_pr * fix ci data path * fix benchmark * fix ci, move ci tests to 3080, disable fast tokenizer * move ci to 85 * support flash attention 2 * add all-in-one data preparation script. Fix colossal-llama2-chat chat template * add hardware requirements * move ci test data * fix save_model, add unwrap * fix missing bos * fix missing bos; support grad accumulation with gemini * fix ci * fix ci * fix ci * fix llama2 chat template config * debug sft * debug sft * fix colossalai version requirement * fix ci * add sanity check to prevent NaN loss * fix requirements * add dummy data generation script * add dummy data generation script * add dummy data generation script * add dummy data generation script * update readme * update readme * update readme and ignore * fix logger bug * support parallel_output * modify data preparation logic * fix tokenization * update lr * fix inference * run pre-commit --------- Co-authored-by: Tong Li <tong.li352711588@gmail.com>	2024-03-29 14:12:29 +08:00
Insu Jang	00525f7772	[shardformer] fix pipeline forward error if custom layer distribution is used (#5189 ) * Use self.[distribute_layers\|get_stage_index] to exploit custom layer distribution * Change static methods for t5 layer distribution to member functions * Change static methods for whisper layer distribution to member functions * Replace whisper policy usage with self one * Fix test case to use non-static layer distribution methods * fix: fix typo --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-03-27 13:57:00 +08:00
Wenhao Chen	bb0a668fee	[hotfix] set return_outputs=False in examples and polish code (#5404 ) * fix: simplify merge_batch * fix: use return_outputs=False to eliminate extra memory consumption * feat: add return_outputs warning * style: remove `return_outputs=False` as it is the default value	2024-03-25 12:31:09 +08:00
binmakeswell	d158fc0e64	[doc] update open-sora demo (#5479 ) * [doc] update open-sora demo * [doc] update open-sora demo * [doc] update open-sora demo	2024-03-20 16:08:41 +08:00
digger yu	385e85afd4	[hotfix] fix typo s/keywrods/keywords etc. (#5429 )	2024-03-12 11:25:16 +08:00
Camille Zhong	da885ed540	fix tensor data update for gemini loss caluculation (#5442 )	2024-03-11 13:49:58 +08:00
Camille Zhong	743e7fad2f	[colossal-llama2] add stream chat examlple for chat version model (#5428 ) * add stream chat for chat version * remove os.system clear * modify function name	2024-03-07 14:58:56 +08:00
hugo-syn	c8003d463b	[doc] Fix typo s/infered/inferred/ (#5288 ) Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>	2024-03-05 22:02:08 +08:00
Dongruixuan Li	a7ae2b5b4c	[eval-hotfix] set few_shot_data to None when few shot is disabled (#5422 )	2024-03-05 21:48:55 +08:00
binmakeswell	822241a99c	[doc] sora release (#5425 ) * [doc] sora release * [doc] sora release * [doc] sora release * [doc] sora release	2024-03-05 12:08:58 +08:00
Camille Zhong	4b8312c08e	fix sft single turn inference example (#5416 )	2024-03-01 17:27:50 +08:00
Tong Li	a28c971516	update requirements (#5407 )	2024-02-28 17:46:27 +08:00
CZYCW	b833153fd5	[hotfix] fix variable type for top_p (#5313 ) Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-02-19 18:25:44 +08:00
Hongxin Liu	7303801854	[llama] fix training and inference scripts (#5384 ) * [llama] refactor inference example to fit sft * [llama] fix training script to fit gemini * [llama] fix inference script	2024-02-19 16:41:04 +08:00
Frank Lee	efef43b53c	Merge pull request #5372 from hpcaitech/exp/mixtral	2024-02-08 16:30:05 +08:00
Hongxin Liu	65e5d6baa5	[moe] fix mixtral optim checkpoint (#5344 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	956b561b54	[moe] fix mixtral forward default value (#5329 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	b60be18dcc	[moe] fix mixtral checkpoint io (#5314 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	da39d21b71	[moe] support mixtral (#5309 ) * [moe] add mixtral block for single expert * [moe] mixtral block fwd support uneven ep * [moe] mixtral block bwd support uneven ep * [moe] add mixtral moe layer * [moe] simplify replace * [meo] support save sharded mixtral * [meo] support load sharded mixtral * [meo] support save sharded optim * [meo] integrate moe manager into plug * [meo] fix optimizer load * [meo] fix mixtral layer	2024-02-07 19:21:02 +08:00
Hongxin Liu	c904d2ae99	[moe] update capacity computing (#5253 ) * [moe] top2 allow uneven input * [moe] update capacity computing * [moe] remove debug info * [moe] update capacity computing * [moe] update capacity computing	2024-02-07 19:21:02 +08:00
Xuanlei Zhao	7d8e0338a4	[moe] init mixtral impl	2024-02-07 19:21:02 +08:00
Hongxin Liu	084c91246c	[llama] fix memory issue (#5371 ) * [llama] fix memory issue * [llama] add comment	2024-02-06 19:02:37 +08:00
Hongxin Liu	eb4f2d90f9	[llama] polish training script and fix optim ckpt (#5368 )	2024-02-06 11:52:17 +08:00
Camille Zhong	a5756a8720	[eval] update llama npu eval (#5366 )	2024-02-06 10:53:03 +08:00
Camille Zhong	44ca61a22b	[llama] fix neftune & pbar with start_step (#5364 )	2024-02-05 18:04:23 +08:00
Hongxin Liu	a4cec1715b	[llama] add flash attn patch for npu (#5362 )	2024-02-05 16:48:34 +08:00
Hongxin Liu	73f9f23fc6	[llama] update training script (#5360 ) * [llama] update training script * [doc] polish docstr	2024-02-05 16:33:18 +08:00
Hongxin Liu	6c0fa7b9a8	[llama] fix dataloader for hybrid parallel (#5358 ) * [plugin] refactor prepare dataloader * [plugin] update train script	2024-02-05 15:14:56 +08:00
YeAnbang	c5239840e6	[Chat] fix sft loss nan (#5345 ) * fix script * fix script * fix chat nan * fix chat nan	2024-02-01 14:25:16 +08:00
Frank Lee	8823cc4831	Merge pull request #5310 from hpcaitech/feature/npu Feature/npu	2024-01-29 13:49:39 +08:00
李文军	ec912b1ba9	[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228 )	2024-01-25 13:14:48 +08:00
Desperado-Jia	ddf879e2db	fix bug for mefture (#5299 )	2024-01-22 22:17:54 +08:00
Michelle	32cb74493a	fix auto loading gpt2 tokenizer (#5279 )	2024-01-18 14:08:29 +08:00
ver217	148469348a	Merge branch 'main' into sync/npu	2024-01-18 12:05:21 +08:00
digger yu	756c400ad2	fix typo in applications/ColossalEval/README.md (#5250 )	2024-01-11 17:58:38 +08:00
digger yu	41e52c1c6e	[doc] fix typo in Colossal-LLaMA-2/README.md (#5247 )	2024-01-10 19:24:56 +08:00
Hongxin Liu	d202cc28c0	[npu] change device to accelerator api (#5239 ) * update accelerator * fix timer * fix amp * update * fix * update bug * add error raise * fix autocast * fix set device * remove doc accelerator * update doc * update doc * update doc * use nullcontext * update cpu * update null context * change time limit for example * udpate * update * update * update * [npu] polish accelerator code --------- Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com> Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>	2024-01-09 10:20:05 +08:00
binmakeswell	7bc6969ce6	[doc] SwiftInfer release (#5236 ) * [doc] SwiftInfer release * [doc] SwiftInfer release * [doc] SwiftInfer release * [doc] SwiftInfer release * [doc] SwiftInfer release	2024-01-08 09:55:12 +08:00
github-actions[bot]	4fb4a22a72	[format] applied code formatting on changed files in pull request 5234 (#5235 ) Co-authored-by: github-actions <github-actions@github.com>	2024-01-07 20:55:34 +08:00
binmakeswell	b9b32b15e6	[doc] add Colossal-LLaMA-2-13B (#5234 ) * [doc] add Colossal-LLaMA-2-13B * [doc] add Colossal-LLaMA-2-13B * [doc] add Colossal-LLaMA-2-13B	2024-01-07 20:53:12 +08:00
Camille Zhong	915b4652f3	[doc] Update README.md of Colossal-LLAMA2 (#5233 ) * Update README.md * Update README.md	2024-01-06 17:06:41 +08:00
Tong Li	d992b55968	[Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224 ) * update readme * update readme * update link * update * update readme * update * update * update * update title * update example * update example * fix content * add conclusion * add license * update * update * update version * fix minor	2024-01-05 17:24:26 +08:00
Yuanchen	eae01b6740	Improve logic for selecting metrics (#5196 ) Co-authored-by: Xu <yuanchen.xu00@gmail.com>	2023-12-22 14:52:50 +08:00
BlueRum	af952673f7	polish readme in application/chat (#5194 )	2023-12-20 11:28:39 +08:00
Yuanchen	3ff60d13b0	Fix ColossalEval (#5186 ) Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>	2023-12-15 15:06:06 +08:00
Yuanchen	cefdc32615	[ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169 ) * Support GSM, Data Leakage Evaluation and Tensor Parallel * remove redundant code and update inference.py in examples/gpt_evaluation --------- Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>	2023-12-12 14:47:35 +08:00
Michelle	b07a6f4e27	[colossalqa] fix pangu api (#5170 ) * fix pangu api * add comment	2023-12-11 14:08:11 +08:00

1 2 3 4 5

249 Commits (641b1ee71a19e2337f3363620b228dd355835b04)