ColossalAI

Commit Graph

Author	SHA1	Message	Date
digger yu	049121d19d	[hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317 )	2024-03-05 21:48:46 +08:00
digger yu	16c96d4d8c	[hotfix] fix typo change _descrption to _description (#5331 )	2024-03-05 21:47:48 +08:00
digger yu	70cce5cbed	[doc] update some translations with README-zh-Hans.md (#5382 )	2024-03-05 21:45:55 +08:00
Luo Yihang	e239cf9060	[hotfix] fix typo of openmoe model source (#5403 )	2024-03-05 21:44:38 +08:00
MickeyCHAN	e304e4db35	[hotfix] fix sd vit import error (#5420 ) * fix import error * Update dpt_depth.py --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-03-05 21:41:23 +08:00
Hongxin Liu	070df689e6	[devops] fix extention building (#5427 )	2024-03-05 15:35:54 +08:00
binmakeswell	822241a99c	[doc] sora release (#5425 ) * [doc] sora release * [doc] sora release * [doc] sora release * [doc] sora release	2024-03-05 12:08:58 +08:00
flybird11111	29695cf70c	[example]add gpt2 benchmark example script. (#5295 ) * benchmark gpt2 * fix fix fix fix * [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed ddp test (#5254) * [ci] fixed ddp test * polish * fix typo in applications/ColossalEval/README.md (#5250) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [doc] fix doc typo (#5256) * [doc] fix annotation display * [doc] fix llama2 doc * [hotfix]: add pp sanity check and fix mbs arg (#5268) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check * [workflow] fixed incomplete bash command (#5272) * [workflow] fixed oom tests (#5275) * [workflow] fixed oom tests * polish * polish * polish * [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) * fix ci fix * fix test * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests * fix --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [shardformer] hybridparallelplugin support gradients accumulation. (#5246) * support gradients acc fix fix fix fix fix fix fix fix fix fix fix fix fix * fix fix * fix fix fix * [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) * fix auto loading gpt2 tokenizer (#5279) * [doc] add llama2-13B disyplay (#5285) * Update README.md * fix 13b typo --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com> * fix llama pretrain (#5287) * fix * fix * fix fix * fix fix fix * fix fix * benchmark gpt2 * fix fix fix fix * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * fix fix * fix fix fix * fix * Update shardformer.py --------- Co-authored-by: digger yu <digger-yu@outlook.com> Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: Wenhao Chen <cwher@outlook.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com> Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com> Co-authored-by: Desperado-Jia <502205863@qq.com>	2024-03-04 16:18:13 +08:00
Camille Zhong	4b8312c08e	fix sft single turn inference example (#5416 )	2024-03-01 17:27:50 +08:00
binmakeswell	a1c6cdb189	[doc] fix blog link	2024-02-29 15:01:43 +08:00
binmakeswell	5de940de32	[doc] fix blog link	2024-02-29 15:01:43 +08:00
Frank Lee	2461f37886	[workflow] added pypi channel (#5412 )	2024-02-29 13:56:55 +08:00
Tong Li	a28c971516	update requirements (#5407 )	2024-02-28 17:46:27 +08:00
flybird11111	0a25e16e46	[shardformer]gather llama logits (#5398 ) * gather llama logits * fix	2024-02-27 22:44:07 +08:00
Frank Lee	dcdd8a5ef7	[setup] fixed nightly release (#5388 )	2024-02-27 15:19:13 +08:00
QinLuo	bf34c6fef6	[fsdp] impl save/load shard model/optimizer (#5357 )	2024-02-27 13:51:14 +08:00
Hongxin Liu	d882d18c65	[example] reuse flash attn patch (#5400 )	2024-02-27 11:22:07 +08:00
Hongxin Liu	95c21e3950	[extension] hotfix jit extension setup (#5402 )	2024-02-26 19:46:58 +08:00
Stephan Kölker	5d380a1a21	[hotfix] Fix wrong import in meta_registry (#5392 )	2024-02-20 19:24:43 +08:00
CZYCW	b833153fd5	[hotfix] fix variable type for top_p (#5313 ) Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-02-19 18:25:44 +08:00
Frank Lee	705a62a565	[doc] updated installation command (#5389 )	2024-02-19 16:54:03 +08:00
yixiaoer	69e3ad01ed	[doc] Fix typo (#5361 )	2024-02-19 16:53:28 +08:00
Hongxin Liu	7303801854	[llama] fix training and inference scripts (#5384 ) * [llama] refactor inference example to fit sft * [llama] fix training script to fit gemini * [llama] fix inference script	2024-02-19 16:41:04 +08:00
Hongxin Liu	adae123df3	[release] update version (#5380 )	2024-02-08 18:50:09 +08:00
Frank Lee	efef43b53c	Merge pull request #5372 from hpcaitech/exp/mixtral	2024-02-08 16:30:05 +08:00
Frank Lee	4c03347fc7	Merge pull request #5377 from hpcaitech/example/llama-npu [llama] support npu for Colossal-LLaMA-2	2024-02-08 14:12:11 +08:00
ver217	06db94fbc9	[moe] fix tests	2024-02-08 12:46:37 +08:00
Hongxin Liu	65e5d6baa5	[moe] fix mixtral optim checkpoint (#5344 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	956b561b54	[moe] fix mixtral forward default value (#5329 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	b60be18dcc	[moe] fix mixtral checkpoint io (#5314 )	2024-02-07 19:21:02 +08:00
Hongxin Liu	da39d21b71	[moe] support mixtral (#5309 ) * [moe] add mixtral block for single expert * [moe] mixtral block fwd support uneven ep * [moe] mixtral block bwd support uneven ep * [moe] add mixtral moe layer * [moe] simplify replace * [meo] support save sharded mixtral * [meo] support load sharded mixtral * [meo] support save sharded optim * [meo] integrate moe manager into plug * [meo] fix optimizer load * [meo] fix mixtral layer	2024-02-07 19:21:02 +08:00
Hongxin Liu	c904d2ae99	[moe] update capacity computing (#5253 ) * [moe] top2 allow uneven input * [moe] update capacity computing * [moe] remove debug info * [moe] update capacity computing * [moe] update capacity computing	2024-02-07 19:21:02 +08:00
Xuanlei Zhao	7d8e0338a4	[moe] init mixtral impl	2024-02-07 19:21:02 +08:00
Hongxin Liu	084c91246c	[llama] fix memory issue (#5371 ) * [llama] fix memory issue * [llama] add comment	2024-02-06 19:02:37 +08:00
Hongxin Liu	c53ddda88f	[lr-scheduler] fix load state dict and add test (#5369 )	2024-02-06 14:23:32 +08:00
Hongxin Liu	eb4f2d90f9	[llama] polish training script and fix optim ckpt (#5368 )	2024-02-06 11:52:17 +08:00
Camille Zhong	a5756a8720	[eval] update llama npu eval (#5366 )	2024-02-06 10:53:03 +08:00
Camille Zhong	44ca61a22b	[llama] fix neftune & pbar with start_step (#5364 )	2024-02-05 18:04:23 +08:00
Hongxin Liu	a4cec1715b	[llama] add flash attn patch for npu (#5362 )	2024-02-05 16:48:34 +08:00
Hongxin Liu	73f9f23fc6	[llama] update training script (#5360 ) * [llama] update training script * [doc] polish docstr	2024-02-05 16:33:18 +08:00
Hongxin Liu	6c0fa7b9a8	[llama] fix dataloader for hybrid parallel (#5358 ) * [plugin] refactor prepare dataloader * [plugin] update train script	2024-02-05 15:14:56 +08:00
Hongxin Liu	2dd01e3a14	[gemini] fix param op hook when output is tuple (#5355 ) * [gemini] fix param op hook when output is tuple * [gemini] fix param op hook	2024-02-04 11:58:26 +08:00
Wenhao Chen	1c790c0877	[fix] remove unnecessary dp_size assert (#5351 ) * fix: remove unnecessary assert * test: add more 3d plugin tests * fix: add warning	2024-02-02 14:40:20 +08:00
Hongxin Liu	ffffc32dc7	[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347 ) * [checkpointio] fix hybrid parallel optim checkpoint * [extension] fix cuda extension * [checkpointio] fix gemini optimizer checkpoint * polish code	2024-02-01 16:13:06 +08:00
YeAnbang	c5239840e6	[Chat] fix sft loss nan (#5345 ) * fix script * fix script * fix chat nan * fix chat nan	2024-02-01 14:25:16 +08:00
Frank Lee	abd8e77ad8	[extension] fixed exception catch (#5342 )	2024-01-31 18:09:49 +08:00
digger yu	71321a07cf	fix typo change dosen't to doesn't (#5308 )	2024-01-30 09:57:38 +08:00
digger yu	6a3086a505	fix typo under extensions/ (#5330 )	2024-01-30 09:55:16 +08:00
Frank Lee	febed23288	[doc] added docs for extensions (#5324 ) * [doc] added docs for extensions * polish * polish	2024-01-29 17:39:23 +08:00
flybird11111	388179f966	[tests] fix t5 test. (#5322 ) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * fix t5 test --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-01-29 17:38:46 +08:00


3068 Commits (7ef91606e17cc1e991496c6cc74f73cbd42313ae)
