ColossalAI

Commit Graph

Author	SHA1	Message	Date
Runyu Lu	6e30248683	[fix] tmp for test	9 months ago
xs_courtesy	388e043930	add implementatino for GetGPULaunchConfig1D	9 months ago
Runyu Lu	d02e257abd	Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph	9 months ago
Runyu Lu	ae24b4f025	diverse tests	9 months ago
Runyu Lu	1821a6dab0	[fix] pytest and fix dyn grid bug	9 months ago
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418 ) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the gloabl memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	9 months ago
Steve Luo	ed431de4e4	fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454 )	9 months ago
Hongxin Liu	f2e8b9ef9f	[devops] fix compatibility (#5444 ) * [devops] fix compatibility * [hotfix] update compatibility test on pr * [devops] fix compatibility * [devops] record duration during comp test * [test] decrease test duration * fix falcon	9 months ago
傅剑寒	6fd355a5a6	Merge pull request #5452 from Courtesy-Xs/fix_include_path fix include path	9 months ago
xs_courtesy	c1c45e9d8e	fix include path	9 months ago
Steve Luo	b699f54007	optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441 )	9 months ago
傅剑寒	368a2aa543	Merge pull request #5445 from Courtesy-Xs/refactor_infer_compilation Refactor colossal-infer code arch	9 months ago
digger yu	385e85afd4	[hotfix] fix typo s/keywrods/keywords etc. (#5429 )	9 months ago
xs_courtesy	095c070a6e	refactor code	9 months ago
Camille Zhong	da885ed540	fix tensor data update for gemini loss caluculation (#5442 )	9 months ago
傅剑寒	21e1e3645c	Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config Add query and other components	9 months ago
Runyu Lu	633e95b301	[doc] add doc	9 months ago
Runyu Lu	9dec66fad6	[fix] multi graphs capture error	9 months ago
Runyu Lu	b2c0d9ff2b	[fix] multi graphs capture error	9 months ago
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417 )	9 months ago
xs_courtesy	5eb5ff1464	refactor code	9 months ago
xs_courtesy	01d289d8e5	Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config	9 months ago
xs_courtesy	a46598ac59	add reusable utils for cuda	9 months ago
傅剑寒	2b28b54ac6	Merge pull request #5433 from Courtesy-Xs/add_silu_and_mul 【Inference】Add silu_and_mul for infer	9 months ago
Runyu Lu	cefaeb5fdd	[feat] cuda graph support and refactor non-functional api	9 months ago
Hongxin Liu	8020f42630	[release] update version (#5411 )	9 months ago
xs_courtesy	95c21498d4	add silu_and_mul for infer	9 months ago
Camille Zhong	743e7fad2f	[colossal-llama2] add stream chat examlple for chat version model (#5428 ) * add stream chat for chat version * remove os.system clear * modify function name	9 months ago
Youngon	68f55a709c	[hotfix] fix stable diffusion inference bug. (#5289 ) * Update train_ddp.yaml delete "strategy" to fix DDP config loading bug in "main.py" * Update train_ddp.yaml fix inference with scripts/txt2img.py config file load bug. * Update README.md add pretrain model test code.	9 months ago
hugo-syn	c8003d463b	[doc] Fix typo s/infered/inferred/ (#5288 ) Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>	9 months ago
digger yu	5e1c93d732	[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335 ) Co-authored-by: binmakeswell <binmakeswell@gmail.com>	9 months ago
Dongruixuan Li	a7ae2b5b4c	[eval-hotfix] set few_shot_data to None when few shot is disabled (#5422 )	9 months ago
digger yu	049121d19d	[hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317 )	9 months ago
digger yu	16c96d4d8c	[hotfix] fix typo change _descrption to _description (#5331 )	9 months ago
digger yu	70cce5cbed	[doc] update some translations with README-zh-Hans.md (#5382 )	9 months ago
Luo Yihang	e239cf9060	[hotfix] fix typo of openmoe model source (#5403 )	9 months ago
MickeyCHAN	e304e4db35	[hotfix] fix sd vit import error (#5420 ) * fix import error * Update dpt_depth.py --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com>	9 months ago
Hongxin Liu	070df689e6	[devops] fix extention building (#5427 )	9 months ago
binmakeswell	822241a99c	[doc] sora release (#5425 ) * [doc] sora release * [doc] sora release * [doc] sora release * [doc] sora release	9 months ago
flybird11111	29695cf70c	[example]add gpt2 benchmark example script. (#5295 ) * benchmark gpt2 * fix fix fix fix * [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed ddp test (#5254) * [ci] fixed ddp test * polish * fix typo in applications/ColossalEval/README.md (#5250) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [doc] fix doc typo (#5256) * [doc] fix annotation display * [doc] fix llama2 doc * [hotfix]: add pp sanity check and fix mbs arg (#5268) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check * [workflow] fixed incomplete bash command (#5272) * [workflow] fixed oom tests (#5275) * [workflow] fixed oom tests * polish * polish * polish * [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) * fix ci fix * fix test * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests * fix --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [shardformer] hybridparallelplugin support gradients accumulation. (#5246) * support gradients acc fix fix fix fix fix fix fix fix fix fix fix fix fix * fix fix * fix fix fix * [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) * fix auto loading gpt2 tokenizer (#5279) * [doc] add llama2-13B disyplay (#5285) * Update README.md * fix 13b typo --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com> * fix llama pretrain (#5287) * fix * fix * fix fix * fix fix fix * fix fix * benchmark gpt2 * fix fix fix fix * [workflow] fixed build CI (#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * fix fix * fix fix fix * fix * Update shardformer.py --------- Co-authored-by: digger yu <digger-yu@outlook.com> Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: Wenhao Chen <cwher@outlook.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com> Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com> Co-authored-by: Desperado-Jia <502205863@qq.com>	9 months ago
Frank Lee	593a72e4d5	Merge pull request #5424 from FrankLeeeee/sync/main Sync/main	9 months ago
FrankLeeeee	0310b76e9d	Merge branch 'main' into sync/main	9 months ago
Camille Zhong	4b8312c08e	fix sft single turn inference example (#5416 )	9 months ago
binmakeswell	a1c6cdb189	[doc] fix blog link	9 months ago
binmakeswell	5de940de32	[doc] fix blog link	9 months ago
Frank Lee	2461f37886	[workflow] added pypi channel (#5412 )	9 months ago
Tong Li	a28c971516	update requirements (#5407 )	9 months ago
yuehuayingxueluo	0aa27f1961	[Inference]Move benchmark-related code to the example directory. (#5408 ) * move benchmark-related code to the example directory. * fix bugs in test_fused_rotary_embedding.py	9 months ago
yuehuayingxueluo	600881a8ea	[Inference]Add CUDA KVCache Kernel (#5406 ) * add cuda KVCache kernel * annotation benchmark_kvcache_copy * add use cuda * fix import path * move benchmark scripts to example/ * rm benchmark codes in test_kv_cache_memcpy.py * rm redundancy codes * rm redundancy codes * pr was modified according to the review	9 months ago
flybird11111	0a25e16e46	[shardformer]gather llama logits (#5398 ) * gather llama logits * fix	9 months ago


3528 Commits (37443cc7e499aa836d4897bf51b1119815da45b3)