ColossalAI

Commit Graph

Author	SHA1	Message	Date
Hongxin Liu	7303801854	[llama] fix training and inference scripts (#5384 ) * [llama] refactor inference example to fit sft * [llama] fix training script to fit gemini * [llama] fix inference script	2024-02-19 16:41:04 +08:00
Hongxin Liu	adae123df3	[release] update version (#5380 )	2024-02-08 18:50:09 +08:00
Frank Lee	efef43b53c	Merge pull request #5372 from hpcaitech/exp/mixtral	2024-02-08 16:30:05 +08:00
Frank Lee	4c03347fc7	Merge pull request #5377 from hpcaitech/example/llama-npu [llama] support npu for Colossal-LLaMA-2	2024-02-08 14:12:11 +08:00
ver217	06db94fbc9	[moe] fix tests	2024-02-08 12:46:37 +08:00
Hongxin Liu	65e5d6baa5	[moe] fix mixtral optim checkpoint (#5344 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	956b561b54	[moe] fix mixtral forward default value (#5329 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	b60be18dcc	[moe] fix mixtral checkpoint io (#5314 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	da39d21b71	[moe] support mixtral (#5309 ) * [moe] add mixtral block for single expert * [moe] mixtral block fwd support uneven ep * [moe] mixtral block bwd support uneven ep * [moe] add mixtral moe layer * [moe] simplify replace * [meo] support save sharded mixtral * [meo] support load sharded mixtral * [meo] support save sharded optim * [meo] integrate moe manager into plug * [meo] fix optimizer load * [meo] fix mixtral layer	2024-02-07 19:21:02 +08:00
Hongxin Liu	c904d2ae99	[moe] update capacity computing (#5253 ) * [moe] top2 allow uneven input * [moe] update capacity computing * [moe] remove debug info * [moe] update capacity computing * [moe] update capacity computing	2024-02-07 19:21:02 +08:00
Xuanlei Zhao	7d8e0338a4	[moe] init mixtral impl	2024-02-07 19:21:02 +08:00
Hongxin Liu	084c91246c	[llama] fix memory issue (#5371 ) * [llama] fix memory issue * [llama] add comment	2024-02-06 19:02:37 +08:00
Hongxin Liu	c53ddda88f	[lr-scheduler] fix load state dict and add test (#5369 )	2024-02-06 14:23:32 +08:00
Hongxin Liu	eb4f2d90f9	[llama] polish training script and fix optim ckpt (#5368 )	2024-02-06 11:52:17 +08:00
Camille Zhong	a5756a8720	[eval] update llama npu eval (#5366 )	2024-02-06 10:53:03 +08:00
Camille Zhong	44ca61a22b	[llama] fix neftune & pbar with start_step (#5364 )	2024-02-05 18:04:23 +08:00
Hongxin Liu	a4cec1715b	[llama] add flash attn patch for npu (#5362 )	2024-02-05 16:48:34 +08:00
Hongxin Liu	73f9f23fc6	[llama] update training script (#5360 ) * [llama] update training script * [doc] polish docstr	2024-02-05 16:33:18 +08:00
Hongxin Liu	6c0fa7b9a8	[llama] fix dataloader for hybrid parallel (#5358 ) * [plugin] refactor prepare dataloader * [plugin] update train script	2024-02-05 15:14:56 +08:00
Hongxin Liu	2dd01e3a14	[gemini] fix param op hook when output is tuple (#5355 ) * [gemini] fix param op hook when output is tuple * [gemini] fix param op hook	2024-02-04 11:58:26 +08:00
Wenhao Chen	1c790c0877	[fix] remove unnecessary dp_size assert (#5351 ) * fix: remove unnecessary assert * test: add more 3d plugin tests * fix: add warning	2024-02-02 14:40:20 +08:00
Hongxin Liu	ffffc32dc7	[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347 ) * [checkpointio] fix hybrid parallel optim checkpoint * [extension] fix cuda extension * [checkpointio] fix gemini optimizer checkpoint * polish code	2024-02-01 16:13:06 +08:00
YeAnbang	c5239840e6	[Chat] fix sft loss nan (#5345 ) * fix script * fix script * fix chat nan * fix chat nan	2024-02-01 14:25:16 +08:00
Frank Lee	abd8e77ad8	[extension] fixed exception catch (#5342 )	2024-01-31 18:09:49 +08:00
digger yu	71321a07cf	fix typo change dosen't to doesn't (#5308 )	2024-01-30 09:57:38 +08:00
digger yu	6a3086a505	fix typo under extensions/ (#5330 )	2024-01-30 09:55:16 +08:00
Frank Lee	febed23288	[doc] added docs for extensions (#5324 ) * [doc] added docs for extensions * polish * polish	2024-01-29 17:39:23 +08:00
flybird11111	388179f966	[tests] fix t5 test. (#5322 ) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * fix t5 test --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-01-29 17:38:46 +08:00
Frank Lee	a6709afe66	Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api [accelerator] fixed npu api	2024-01-29 14:29:58 +08:00
FrankLeeeee	087d0cb1fc	[accelerator] fixed npu api	2024-01-29 14:27:52 +08:00
Frank Lee	8823cc4831	Merge pull request #5310 from hpcaitech/feature/npu Feature/npu	2024-01-29 13:49:39 +08:00
Frank Lee	73f4dc578e	[workflow] updated CI image (#5318 )	2024-01-29 11:53:07 +08:00
Frank Lee	7cfed5f076	[feat] refactored extension module (#5298 ) * [feat] refactored extension module * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish	2024-01-25 17:01:48 +08:00
digger yu	bce9499ed3	fix some typo (#5307 )	2024-01-25 13:56:27 +08:00
李文军	ec912b1ba9	[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228 )	2024-01-25 13:14:48 +08:00
Desperado-Jia	ddf879e2db	fix bug for mefture (#5299 )	2024-01-22 22:17:54 +08:00
Hongxin Liu	d7f8db8e21	[hotfix] fix 3d plugin test (#5292 )	2024-01-22 15:19:04 +08:00
flybird11111	f7e3f82a7e	fix llama pretrain (#5287 )	2024-01-19 17:49:02 +08:00
Desperado-Jia	6a56967855	[doc] add llama2-13B disyplay (#5285 ) * Update README.md * fix 13b typo --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-01-19 16:04:08 +08:00
Michelle	32cb74493a	fix auto loading gpt2 tokenizer (#5279 )	2024-01-18 14:08:29 +08:00
Frank Lee	d66e6988bc	Merge pull request #5278 from ver217/sync/npu [sync] sync npu branch with main	2024-01-18 13:11:45 +08:00
ver217	148469348a	Merge branch 'main' into sync/npu	2024-01-18 12:05:21 +08:00
Zhongkai Zhao	5d9a0ae75b	[hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230 )	2024-01-17 17:42:29 +08:00
flybird11111	46e091651b	[shardformer] hybridparallelplugin support gradients accumulation. (#5246 ) * support gradients acc fix fix fix fix fix fix fix fix fix fix fix fix fix * fix fix * fix fix fix	2024-01-17 15:22:33 +08:00
flybird11111	2a0558d8ec	[ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276 ) * fix ci fix * fix test * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests * fix --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-01-17 13:38:55 +08:00
Frank Lee	d69cd2eb89	[workflow] fixed oom tests (#5275 ) * [workflow] fixed oom tests * polish * polish * polish	2024-01-16 18:55:13 +08:00
Frank Lee	04244aaaf1	[workflow] fixed incomplete bash command (#5272 )	2024-01-16 11:54:44 +08:00
Wenhao Chen	ef4f0ee854	[hotfix]: add pp sanity check and fix mbs arg (#5268 ) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check	2024-01-15 15:57:40 +08:00
binmakeswell	c174c4fc5f	[doc] fix doc typo (#5256 ) * [doc] fix annotation display * [doc] fix llama2 doc	2024-01-11 21:01:11 +08:00
flybird11111	e830ef917d	[ci] fix shardformer tests. (#5255 ) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-01-11 19:07:45 +08:00

3096 Commits (785cd9a9c971aa58e6f8c76575111a4aa4d9513b)