ColossalAI

Commit Graph

Author	SHA1	Message	Date
Runyu Lu	aabc9fb6aa	[feat] add use_cuda_kernel option	2024-03-19 13:24:25 +08:00
xs_courtesy	48c4f29b27	refactor vector utils	2024-03-19 11:32:01 +08:00
binmakeswell	bd998ced03	[doc] release Open-Sora 1.0 with model weights (#5468 ) * [doc] release Open-Sora 1.0 with model weights * [doc] release Open-Sora 1.0 with model weights * [doc] release Open-Sora 1.0 with model weights	2024-03-18 18:31:18 +08:00
flybird11111	5e16bf7980	[shardformer] fix gathering output when using tensor parallelism (#5431 ) * fix * padding vocab_size when using pipeline parallellism padding vocab_size when using pipeline parallellism fix fix * fix * fix fix fix * fix gather output * fix * fix * fix fix resize embedding fix resize embedding * fix resize embedding fix * revert * revert * revert	2024-03-18 15:55:11 +08:00
傅剑寒	b6e9785885	Merge pull request #5457 from Courtesy-Xs/ly_add_implementation_for_launch_config add implementatino for GetGPULaunchConfig1D	2024-03-15 11:23:44 +08:00
xs_courtesy	5724b9e31e	add some comments	2024-03-15 11:18:57 +08:00
Runyu Lu	6e30248683	[fix] tmp for test	2024-03-14 16:13:00 +08:00
xs_courtesy	388e043930	add implementatino for GetGPULaunchConfig1D	2024-03-14 11:13:40 +08:00
Runyu Lu	d02e257abd	Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph	2024-03-14 10:37:05 +08:00
Runyu Lu	ae24b4f025	diverse tests	2024-03-14 10:35:08 +08:00
Runyu Lu	1821a6dab0	[fix] pytest and fix dyn grid bug	2024-03-13 17:28:32 +08:00
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418 ) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the gloabl memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	2024-03-13 17:20:03 +08:00
Steve Luo	ed431de4e4	fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454 )	2024-03-13 16:00:55 +08:00
Hongxin Liu	f2e8b9ef9f	[devops] fix compatibility (#5444 ) * [devops] fix compatibility * [hotfix] update compatibility test on pr * [devops] fix compatibility * [devops] record duration during comp test * [test] decrease test duration * fix falcon	2024-03-13 15:24:13 +08:00
傅剑寒	6fd355a5a6	Merge pull request #5452 from Courtesy-Xs/fix_include_path fix include path	2024-03-13 11:26:41 +08:00
xs_courtesy	c1c45e9d8e	fix include path	2024-03-13 11:21:06 +08:00
Steve Luo	b699f54007	optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441 )	2024-03-12 17:48:02 +08:00
傅剑寒	368a2aa543	Merge pull request #5445 from Courtesy-Xs/refactor_infer_compilation Refactor colossal-infer code arch	2024-03-12 14:14:37 +08:00
digger yu	385e85afd4	[hotfix] fix typo s/keywrods/keywords etc. (#5429 )	2024-03-12 11:25:16 +08:00
xs_courtesy	095c070a6e	refactor code	2024-03-11 17:06:57 +08:00
Camille Zhong	da885ed540	fix tensor data update for gemini loss caluculation (#5442 )	2024-03-11 13:49:58 +08:00
傅剑寒	21e1e3645c	Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config Add query and other components	2024-03-11 11:15:29 +08:00
Runyu Lu	633e95b301	[doc] add doc	2024-03-11 10:56:51 +08:00
Runyu Lu	9dec66fad6	[fix] multi graphs capture error	2024-03-11 10:51:16 +08:00
Runyu Lu	b2c0d9ff2b	[fix] multi graphs capture error	2024-03-11 10:49:31 +08:00
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417 )	2024-03-08 16:21:12 +08:00
xs_courtesy	5eb5ff1464	refactor code	2024-03-08 15:41:14 +08:00
xs_courtesy	01d289d8e5	Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config	2024-03-08 15:04:55 +08:00
xs_courtesy	a46598ac59	add reusable utils for cuda	2024-03-08 14:53:29 +08:00
傅剑寒	2b28b54ac6	Merge pull request #5433 from Courtesy-Xs/add_silu_and_mul 【Inference】Add silu_and_mul for infer	2024-03-08 14:44:37 +08:00
Runyu Lu	cefaeb5fdd	[feat] cuda graph support and refactor non-functional api	2024-03-08 14:19:35 +08:00
Hongxin Liu	8020f42630	[release] update version (#5411 )	2024-03-07 23:36:07 +08:00
xs_courtesy	95c21498d4	add silu_and_mul for infer	2024-03-07 16:57:49 +08:00
Camille Zhong	743e7fad2f	[colossal-llama2] add stream chat examlple for chat version model (#5428 ) * add stream chat for chat version * remove os.system clear * modify function name	2024-03-07 14:58:56 +08:00
Youngon	68f55a709c	[hotfix] fix stable diffusion inference bug. (#5289 ) * Update train_ddp.yaml delete "strategy" to fix DDP config loading bug in "main.py" * Update train_ddp.yaml fix inference with scripts/txt2img.py config file load bug. * Update README.md add pretrain model test code.	2024-03-05 22:03:40 +08:00
hugo-syn	c8003d463b	[doc] Fix typo s/infered/inferred/ (#5288 ) Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>	2024-03-05 22:02:08 +08:00
digger yu	5e1c93d732	[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335 ) Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-03-05 21:52:30 +08:00
Dongruixuan Li	a7ae2b5b4c	[eval-hotfix] set few_shot_data to None when few shot is disabled (#5422 )	2024-03-05 21:48:55 +08:00
digger yu	049121d19d	[hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317 )	2024-03-05 21:48:46 +08:00
digger yu	16c96d4d8c	[hotfix] fix typo change _descrption to _description (#5331 )	2024-03-05 21:47:48 +08:00
digger yu	70cce5cbed	[doc] update some translations with README-zh-Hans.md (#5382 )	2024-03-05 21:45:55 +08:00
Luo Yihang	e239cf9060	[hotfix] fix typo of openmoe model source (#5403 )	2024-03-05 21:44:38 +08:00
MickeyCHAN	e304e4db35	[hotfix] fix sd vit import error (#5420 ) * fix import error * Update dpt_depth.py --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-03-05 21:41:23 +08:00
Hongxin Liu	070df689e6	[devops] fix extention building (#5427 )	2024-03-05 15:35:54 +08:00
binmakeswell	822241a99c	[doc] sora release (#5425 ) * [doc] sora release * [doc] sora release * [doc] sora release * [doc] sora release	2024-03-05 12:08:58 +08:00
flybird11111	29695cf70c	[example]add gpt2 benchmark example script. (#5295 ) * benchmark gpt2 * fix fix fix fix * [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed ddp test (#5254) * [ci] fixed ddp test * polish * fix typo in applications/ColossalEval/README.md (#5250) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [doc] fix doc typo (#5256) * [doc] fix annotation display * [doc] fix llama2 doc * [hotfix]: add pp sanity check and fix mbs arg (#5268) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check * [workflow] fixed incomplete bash command (#5272) * [workflow] fixed oom tests (#5275) * [workflow] fixed oom tests * polish * polish * polish * [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) * fix ci fix * fix test * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests * fix --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [shardformer] hybridparallelplugin support gradients accumulation. (#5246) * support gradients acc fix fix fix fix fix fix fix fix fix fix fix fix fix * fix fix * fix fix fix * [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) * fix auto loading gpt2 tokenizer (#5279) * [doc] add llama2-13B disyplay (#5285) * Update README.md * fix 13b typo --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com> * fix llama pretrain (#5287) * fix * fix * fix fix * fix fix fix * fix fix * benchmark gpt2 * fix fix fix fix * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * fix fix * fix fix fix * fix * fix fix fix fix fix * fix * Update shardformer.py --------- Co-authored-by: digger yu <digger-yu@outlook.com> Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: Wenhao Chen <cwher@outlook.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com> Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com> Co-authored-by: Desperado-Jia <502205863@qq.com>	2024-03-04 16:18:13 +08:00
Frank Lee	593a72e4d5	Merge pull request #5424 from FrankLeeeee/sync/main Sync/main	2024-03-04 10:13:59 +08:00
FrankLeeeee	0310b76e9d	Merge branch 'main' into sync/main	2024-03-04 10:09:36 +08:00
Camille Zhong	4b8312c08e	fix sft single turn inference example (#5416 )	2024-03-01 17:27:50 +08:00
binmakeswell	a1c6cdb189	[doc] fix blog link	2024-02-29 15:01:43 +08:00


3184 Commits (a37f82629d7b9e3c3a0f430b8dd3ff6f38ddf1d4)
