Author | Commit | Message | Date

Frank Lee | 705a62a565 | [doc] updated installation command (#5389) | 9 months ago
yixiaoer | 69e3ad01ed | [doc] Fix typo (#5361) | 9 months ago
Hongxin Liu | 7303801854 | [llama] fix training and inference scripts (#5384) | 9 months ago
    * [llama] refactor inference example to fit sft
    * [llama] fix training script to fit gemini
    * [llama] fix inference script
Hongxin Liu | adae123df3 | [release] update version (#5380) | 10 months ago
Frank Lee | efef43b53c | Merge pull request #5372 from hpcaitech/exp/mixtral | 10 months ago
Frank Lee | 4c03347fc7 | Merge pull request #5377 from hpcaitech/example/llama-npu | 10 months ago
    [llama] support npu for Colossal-LLaMA-2
ver217 | 06db94fbc9 | [moe] fix tests | 10 months ago
Hongxin Liu | 65e5d6baa5 | [moe] fix mixtral optim checkpoint (#5344) | 10 months ago
Hongxin Liu | 956b561b54 | [moe] fix mixtral forward default value (#5329) | 10 months ago
Hongxin Liu | b60be18dcc | [moe] fix mixtral checkpoint io (#5314) | 10 months ago
Hongxin Liu | da39d21b71 | [moe] support mixtral (#5309) | 10 months ago
    * [moe] add mixtral block for single expert
    * [moe] mixtral block fwd support uneven ep
    * [moe] mixtral block bwd support uneven ep
    * [moe] add mixtral moe layer
    * [moe] simplify replace
    * [meo] support save sharded mixtral
    * [meo] support load sharded mixtral
    * [meo] support save sharded optim
    * [meo] integrate moe manager into plug
    * [meo] fix optimizer load
    * [meo] fix mixtral layer
Hongxin Liu | c904d2ae99 | [moe] update capacity computing (#5253) | 10 months ago
    * [moe] top2 allow uneven input
    * [moe] update capacity computing
    * [moe] remove debug info
    * [moe] update capacity computing
    * [moe] update capacity computing
Xuanlei Zhao | 7d8e0338a4 | [moe] init mixtral impl | 10 months ago
Hongxin Liu | 084c91246c | [llama] fix memory issue (#5371) | 10 months ago
    * [llama] fix memory issue
    * [llama] add comment
Hongxin Liu | c53ddda88f | [lr-scheduler] fix load state dict and add test (#5369) | 10 months ago
Hongxin Liu | eb4f2d90f9 | [llama] polish training script and fix optim ckpt (#5368) | 10 months ago
Camille Zhong | a5756a8720 | [eval] update llama npu eval (#5366) | 10 months ago
Camille Zhong | 44ca61a22b | [llama] fix neftune & pbar with start_step (#5364) | 10 months ago
Hongxin Liu | a4cec1715b | [llama] add flash attn patch for npu (#5362) | 10 months ago
Hongxin Liu | 73f9f23fc6 | [llama] update training script (#5360) | 10 months ago
    * [llama] update training script
    * [doc] polish docstr
Hongxin Liu | 6c0fa7b9a8 | [llama] fix dataloader for hybrid parallel (#5358) | 10 months ago
    * [plugin] refactor prepare dataloader
    * [plugin] update train script
Hongxin Liu | 2dd01e3a14 | [gemini] fix param op hook when output is tuple (#5355) | 10 months ago
    * [gemini] fix param op hook when output is tuple
    * [gemini] fix param op hook
Wenhao Chen | 1c790c0877 | [fix] remove unnecessary dp_size assert (#5351) | 10 months ago
    * fix: remove unnecessary assert
    * test: add more 3d plugin tests
    * fix: add warning
Hongxin Liu | ffffc32dc7 | [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) | 10 months ago
    * [checkpointio] fix hybrid parallel optim checkpoint
    * [extension] fix cuda extension
    * [checkpointio] fix gemini optimizer checkpoint
    * polish code
YeAnbang | c5239840e6 | [Chat] fix sft loss nan (#5345) | 10 months ago
    * fix script
    * fix script
    * fix chat nan
    * fix chat nan
Frank Lee | abd8e77ad8 | [extension] fixed exception catch (#5342) | 10 months ago
digger yu | 71321a07cf | fix typo change dosen't to doesn't (#5308) | 10 months ago
digger yu | 6a3086a505 | fix typo under extensions/ (#5330) | 10 months ago
Frank Lee | febed23288 | [doc] added docs for extensions (#5324) | 10 months ago
    * [doc] added docs for extensions
    * polish
    * polish
flybird11111 | 388179f966 | [tests] fix t5 test. (#5322) | 10 months ago
    * [ci] fix shardformer tests. (#5255)
    * fix ci
      fix
    * revert: revert p2p
    * feat: add enable_metadata_cache option
    * revert: enable t5 tests
      Co-authored-by: Wenhao Chen <cwher@outlook.com>
    * fix t5 test
      Co-authored-by: Wenhao Chen <cwher@outlook.com>
Frank Lee | a6709afe66 | Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api | 10 months ago
    [accelerator] fixed npu api
FrankLeeeee | 087d0cb1fc | [accelerator] fixed npu api | 10 months ago
Frank Lee | 8823cc4831 | Merge pull request #5310 from hpcaitech/feature/npu | 10 months ago
    Feature/npu
Frank Lee | 73f4dc578e | [workflow] updated CI image (#5318) | 10 months ago
Frank Lee | 7cfed5f076 | [feat] refactored extension module (#5298) | 10 months ago
    * [feat] refactored extension module
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
    * polish
digger yu | bce9499ed3 | fix some typo (#5307) | 10 months ago
李文军 | ec912b1ba9 | [NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) | 10 months ago
Desperado-Jia | ddf879e2db | fix bug for mefture (#5299) | 10 months ago
Hongxin Liu | d7f8db8e21 | [hotfix] fix 3d plugin test (#5292) | 10 months ago
flybird11111 | f7e3f82a7e | fix llama pretrain (#5287) | 10 months ago
Desperado-Jia | 6a56967855 | [doc] add llama2-13B disyplay (#5285) | 10 months ago
    * Update README.md
    * fix 13b typo
      Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Michelle | 32cb74493a | fix auto loading gpt2 tokenizer (#5279) | 10 months ago
Frank Lee | d66e6988bc | Merge pull request #5278 from ver217/sync/npu | 10 months ago
    [sync] sync npu branch with main
ver217 | 148469348a | Merge branch 'main' into sync/npu | 10 months ago
Zhongkai Zhao | 5d9a0ae75b | [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) | 10 months ago
flybird11111 | 46e091651b | [shardformer] hybridparallelplugin support gradients accumulation. (#5246) | 10 months ago
    * support gradients acc
      fix
      fix
      fix
      fix
      fix
      fix
      fix
      fix
      fix
      fix
      fix
      fix
      fix
    * fix
      fix
    * fix
      fix
      fix
flybird11111 | 2a0558d8ec | [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) | 10 months ago
    * fix ci
      fix
    * fix test
    * revert: revert p2p
    * feat: add enable_metadata_cache option
    * revert: enable t5 tests
    * fix
      Co-authored-by: Wenhao Chen <cwher@outlook.com>
Frank Lee | d69cd2eb89 | [workflow] fixed oom tests (#5275) | 10 months ago
    * [workflow] fixed oom tests
    * polish
    * polish
    * polish
Frank Lee | 04244aaaf1 | [workflow] fixed incomplete bash command (#5272) | 10 months ago
Wenhao Chen | ef4f0ee854 | [hotfix]: add pp sanity check and fix mbs arg (#5268) | 11 months ago
    * fix: fix misleading mbs arg
    * feat: add pp sanity check
    * fix: fix 1f1b sanity check