ColossalAI

Commit Graph

Author	SHA1	Message	Date
Steve Luo	ed431de4e4	fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454 )	2024-03-13 16:00:55 +08:00
Hongxin Liu	f2e8b9ef9f	[devops] fix compatibility (#5444 ) * [devops] fix compatibility * [hotfix] update compatibility test on pr * [devops] fix compatibility * [devops] record duration during comp test * [test] decrease test duration * fix falcon	2024-03-13 15:24:13 +08:00
傅剑寒	6fd355a5a6	Merge pull request #5452 from Courtesy-Xs/fix_include_path fix include path	2024-03-13 11:26:41 +08:00
xs_courtesy	c1c45e9d8e	fix include path	2024-03-13 11:21:06 +08:00
Steve Luo	b699f54007	optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441 )	2024-03-12 17:48:02 +08:00
傅剑寒	368a2aa543	Merge pull request #5445 from Courtesy-Xs/refactor_infer_compilation Refactor colossal-infer code arch	2024-03-12 14:14:37 +08:00
digger yu	385e85afd4	[hotfix] fix typo s/keywrods/keywords etc. (#5429 )	2024-03-12 11:25:16 +08:00
xs_courtesy	095c070a6e	refactor code	2024-03-11 17:06:57 +08:00
Camille Zhong	da885ed540	fix tensor data update for gemini loss caluculation (#5442 )	2024-03-11 13:49:58 +08:00
傅剑寒	21e1e3645c	Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config Add query and other components	2024-03-11 11:15:29 +08:00
Runyu Lu	633e95b301	[doc] add doc	2024-03-11 10:56:51 +08:00
Runyu Lu	9dec66fad6	[fix] multi graphs capture error	2024-03-11 10:51:16 +08:00
Runyu Lu	b2c0d9ff2b	[fix] multi graphs capture error	2024-03-11 10:49:31 +08:00
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417 )	2024-03-08 16:21:12 +08:00
xs_courtesy	5eb5ff1464	refactor code	2024-03-08 15:41:14 +08:00
xs_courtesy	01d289d8e5	Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config	2024-03-08 15:04:55 +08:00
xs_courtesy	a46598ac59	add reusable utils for cuda	2024-03-08 14:53:29 +08:00
傅剑寒	2b28b54ac6	Merge pull request #5433 from Courtesy-Xs/add_silu_and_mul 【Inference】Add silu_and_mul for infer	2024-03-08 14:44:37 +08:00
Runyu Lu	cefaeb5fdd	[feat] cuda graph support and refactor non-functional api	2024-03-08 14:19:35 +08:00
Hongxin Liu	8020f42630	[release] update version (#5411 )	2024-03-07 23:36:07 +08:00
xs_courtesy	95c21498d4	add silu_and_mul for infer	2024-03-07 16:57:49 +08:00
Camille Zhong	743e7fad2f	[colossal-llama2] add stream chat examlple for chat version model (#5428 ) * add stream chat for chat version * remove os.system clear * modify function name	2024-03-07 14:58:56 +08:00
Youngon	68f55a709c	[hotfix] fix stable diffusion inference bug. (#5289 ) * Update train_ddp.yaml delete "strategy" to fix DDP config loading bug in "main.py" * Update train_ddp.yaml fix inference with scripts/txt2img.py config file load bug. * Update README.md add pretrain model test code.	2024-03-05 22:03:40 +08:00
hugo-syn	c8003d463b	[doc] Fix typo s/infered/inferred/ (#5288 ) Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>	2024-03-05 22:02:08 +08:00
digger yu	5e1c93d732	[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335 ) Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-03-05 21:52:30 +08:00
Dongruixuan Li	a7ae2b5b4c	[eval-hotfix] set few_shot_data to None when few shot is disabled (#5422 )	2024-03-05 21:48:55 +08:00
digger yu	049121d19d	[hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317 )	2024-03-05 21:48:46 +08:00
digger yu	16c96d4d8c	[hotfix] fix typo change _descrption to _description (#5331 )	2024-03-05 21:47:48 +08:00
digger yu	70cce5cbed	[doc] update some translations with README-zh-Hans.md (#5382 )	2024-03-05 21:45:55 +08:00
Luo Yihang	e239cf9060	[hotfix] fix typo of openmoe model source (#5403 )	2024-03-05 21:44:38 +08:00
MickeyCHAN	e304e4db35	[hotfix] fix sd vit import error (#5420 ) * fix import error * Update dpt_depth.py --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-03-05 21:41:23 +08:00
Hongxin Liu	070df689e6	[devops] fix extention building (#5427 )	2024-03-05 15:35:54 +08:00
binmakeswell	822241a99c	[doc] sora release (#5425 ) * [doc] sora release * [doc] sora release * [doc] sora release * [doc] sora release	2024-03-05 12:08:58 +08:00
flybird11111	29695cf70c	[example]add gpt2 benchmark example script. (#5295 ) * benchmark gpt2 * fix fix fix fix * [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed ddp test (#5254) * [ci] fixed ddp test * polish * fix typo in applications/ColossalEval/README.md (#5250) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [doc] fix doc typo (#5256) * [doc] fix annotation display * [doc] fix llama2 doc * [hotfix]: add pp sanity check and fix mbs arg (#5268) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check * [workflow] fixed incomplete bash command (#5272) * [workflow] fixed oom tests (#5275) * [workflow] fixed oom tests * polish * polish * polish * [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) * fix ci fix * fix test * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests * fix --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [shardformer] hybridparallelplugin support gradients accumulation. (#5246) * support gradients acc fix fix fix fix fix fix fix fix fix fix fix fix fix * fix fix * fix fix fix * [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) * fix auto loading gpt2 tokenizer (#5279) * [doc] add llama2-13B disyplay (#5285) * Update README.md * fix 13b typo --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com> * fix llama pretrain (#5287) * fix * fix * fix fix * fix fix fix * fix fix * benchmark gpt2 * fix fix fix fix * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * fix fix * fix fix fix * fix * fix fix fix fix fix * fix * Update shardformer.py --------- Co-authored-by: digger yu <digger-yu@outlook.com> Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: Wenhao Chen <cwher@outlook.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com> Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com> Co-authored-by: Desperado-Jia <502205863@qq.com>	2024-03-04 16:18:13 +08:00
Frank Lee	593a72e4d5	Merge pull request #5424 from FrankLeeeee/sync/main Sync/main	2024-03-04 10:13:59 +08:00
FrankLeeeee	0310b76e9d	Merge branch 'main' into sync/main	2024-03-04 10:09:36 +08:00
Camille Zhong	4b8312c08e	fix sft single turn inference example (#5416 )	2024-03-01 17:27:50 +08:00
binmakeswell	a1c6cdb189	[doc] fix blog link	2024-02-29 15:01:43 +08:00
binmakeswell	5de940de32	[doc] fix blog link	2024-02-29 15:01:43 +08:00
Frank Lee	2461f37886	[workflow] added pypi channel (#5412 )	2024-02-29 13:56:55 +08:00
Tong Li	a28c971516	update requirements (#5407 )	2024-02-28 17:46:27 +08:00
yuehuayingxueluo	0aa27f1961	[Inference]Move benchmark-related code to the example directory. (#5408 ) * move benchmark-related code to the example directory. * fix bugs in test_fused_rotary_embedding.py	2024-02-28 16:46:03 +08:00
yuehuayingxueluo	600881a8ea	[Inference]Add CUDA KVCache Kernel (#5406 ) * add cuda KVCache kernel * annotation benchmark_kvcache_copy * add use cuda * fix import path * move benchmark scripts to example/ * rm benchmark codes in test_kv_cache_memcpy.py * rm redundancy codes * rm redundancy codes * pr was modified according to the review	2024-02-28 14:36:50 +08:00
flybird11111	0a25e16e46	[shardformer]gather llama logits (#5398 ) * gather llama logits * fix	2024-02-27 22:44:07 +08:00
Frank Lee	dcdd8a5ef7	[setup] fixed nightly release (#5388 )	2024-02-27 15:19:13 +08:00
QinLuo	bf34c6fef6	[fsdp] impl save/load shard model/optimizer (#5357 )	2024-02-27 13:51:14 +08:00
Hongxin Liu	d882d18c65	[example] reuse flash attn patch (#5400 )	2024-02-27 11:22:07 +08:00
Hongxin Liu	95c21e3950	[extension] hotfix jit extension setup (#5402 )	2024-02-26 19:46:58 +08:00
Yuanheng Zhao	19061188c3	[Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399 ) fix dependency in pytest	2024-02-26 16:17:47 +08:00
yuehuayingxueluo	bc1da87366	[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395 ) * Fix bugs in inference_engine * fix bugs in engine.py * rm CUDA_VISIBLE_DEVICES * add request_ids in generate * fix bug in engine.py * add logger.debug for BatchBucket	2024-02-23 10:51:35 +08:00


3572 Commits (colossalchat)