Runyu Lu
6e30248683
[fix] tmp for test
9 months ago
xs_courtesy
388e043930
add implementatino for GetGPULaunchConfig1D
9 months ago
Runyu Lu
d02e257abd
Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph
9 months ago
Runyu Lu
ae24b4f025
diverse tests
9 months ago
Runyu Lu
1821a6dab0
[fix] pytest and fix dyn grid bug
9 months ago
yuehuayingxueluo
f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel ( #5418 )
...
* add rotary embedding kernel
* add rotary_embedding_kernel
* add fused rotary_emb and kvcache memcopy
* add fused_rotary_emb_and_cache_kernel.cu
* add fused_rotary_emb_and_memcopy
* fix bugs in fused_rotary_emb_and_cache_kernel.cu
* fix ci bugs
* use vec memcopy and opt the gloabl memory access
* fix code style
* fix test_rotary_embdding_unpad.py
* codes revised based on the review comments
* fix bugs about include path
* rm inline
9 months ago
Steve Luo
ed431de4e4
fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test ( #5454 )
9 months ago
Hongxin Liu
f2e8b9ef9f
[devops] fix compatibility ( #5444 )
...
* [devops] fix compatibility
* [hotfix] update compatibility test on pr
* [devops] fix compatibility
* [devops] record duration during comp test
* [test] decrease test duration
* fix falcon
9 months ago
傅剑寒
6fd355a5a6
Merge pull request #5452 from Courtesy-Xs/fix_include_path
...
fix include path
9 months ago
xs_courtesy
c1c45e9d8e
fix include path
9 months ago
Steve Luo
b699f54007
optimize rmsnorm: add vectorized elementwise op, feat loop unrolling ( #5441 )
9 months ago
傅剑寒
368a2aa543
Merge pull request #5445 from Courtesy-Xs/refactor_infer_compilation
...
Refactor colossal-infer code arch
9 months ago
digger yu
385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. ( #5429 )
9 months ago
xs_courtesy
095c070a6e
refactor code
9 months ago
Camille Zhong
da885ed540
fix tensor data update for gemini loss caluculation ( #5442 )
9 months ago
傅剑寒
21e1e3645c
Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config
...
Add query and other components
9 months ago
Runyu Lu
633e95b301
[doc] add doc
9 months ago
Runyu Lu
9dec66fad6
[fix] multi graphs capture error
9 months ago
Runyu Lu
b2c0d9ff2b
[fix] multi graphs capture error
9 months ago
Steve Luo
f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script ( #5417 )
9 months ago
xs_courtesy
5eb5ff1464
refactor code
9 months ago
xs_courtesy
01d289d8e5
Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config
9 months ago
xs_courtesy
a46598ac59
add reusable utils for cuda
9 months ago
傅剑寒
2b28b54ac6
Merge pull request #5433 from Courtesy-Xs/add_silu_and_mul
...
【Inference】Add silu_and_mul for infer
9 months ago
Runyu Lu
cefaeb5fdd
[feat] cuda graph support and refactor non-functional api
9 months ago
Hongxin Liu
8020f42630
[release] update version ( #5411 )
9 months ago
xs_courtesy
95c21498d4
add silu_and_mul for infer
9 months ago
Camille Zhong
743e7fad2f
[colossal-llama2] add stream chat examlple for chat version model ( #5428 )
...
* add stream chat for chat version
* remove os.system clear
* modify function name
9 months ago
Youngon
68f55a709c
[hotfix] fix stable diffusion inference bug. ( #5289 )
...
* Update train_ddp.yaml
delete "strategy" to fix DDP config loading bug in "main.py"
* Update train_ddp.yaml
fix inference with scripts/txt2img.py config file load bug.
* Update README.md
add pretrain model test code.
9 months ago
hugo-syn
c8003d463b
[doc] Fix typo s/infered/inferred/ ( #5288 )
...
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
9 months ago
digger yu
5e1c93d732
[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO ( #5335 )
...
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
9 months ago
Dongruixuan Li
a7ae2b5b4c
[eval-hotfix] set few_shot_data to None when few shot is disabled ( #5422 )
9 months ago
digger yu
049121d19d
[hotfix] fix typo change enabel to enable under colossalai/shardformer/ ( #5317 )
9 months ago
digger yu
16c96d4d8c
[hotfix] fix typo change _descrption to _description ( #5331 )
9 months ago
digger yu
70cce5cbed
[doc] update some translations with README-zh-Hans.md ( #5382 )
9 months ago
Luo Yihang
e239cf9060
[hotfix] fix typo of openmoe model source ( #5403 )
9 months ago
MickeyCHAN
e304e4db35
[hotfix] fix sd vit import error ( #5420 )
...
* fix import error
* Update dpt_depth.py
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
9 months ago
Hongxin Liu
070df689e6
[devops] fix extention building ( #5427 )
9 months ago
binmakeswell
822241a99c
[doc] sora release ( #5425 )
...
* [doc] sora release
* [doc] sora release
* [doc] sora release
* [doc] sora release
9 months ago
flybird11111
29695cf70c
[example]add gpt2 benchmark example script. ( #5295 )
...
* benchmark gpt2
* fix
fix
fix
fix
* [doc] fix typo in Colossal-LLaMA-2/README.md (#5247 )
* [workflow] fixed build CI (#5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
* [ci] fixed booster test (#5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed ddp test (#5254 )
* [ci] fixed ddp test
* polish
* fix typo in applications/ColossalEval/README.md (#5250 )
* [ci] fix shardformer tests. (#5255 )
* fix ci
fix
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* [doc] fix doc typo (#5256 )
* [doc] fix annotation display
* [doc] fix llama2 doc
* [hotfix]: add pp sanity check and fix mbs arg (#5268 )
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
* [workflow] fixed incomplete bash command (#5272 )
* [workflow] fixed oom tests (#5275 )
* [workflow] fixed oom tests
* polish
* polish
* polish
* [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276 )
* fix ci
fix
* fix test
* revert: revert p2p
* feat: add enable_metadata_cache option
* revert: enable t5 tests
* fix
---------
Co-authored-by: Wenhao Chen <cwher@outlook.com>
* [shardformer] hybridparallelplugin support gradients accumulation. (#5246 )
* support gradients acc
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
fix
* fix
fix
* fix
fix
fix
* [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230 )
* fix auto loading gpt2 tokenizer (#5279 )
* [doc] add llama2-13B disyplay (#5285 )
* Update README.md
* fix 13b typo
---------
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
* fix llama pretrain (#5287 )
* fix
* fix
* fix
fix
* fix
fix
fix
* fix
fix
* benchmark gpt2
* fix
fix
fix
fix
* [workflow] fixed build CI (#5240 )
* [workflow] fixed build CI
* polish
* polish
* polish
* polish
* polish
* [ci] fixed booster test (#5251 )
* [ci] fixed booster test
* [ci] fixed booster test
* [ci] fixed booster test
* fix
fix
* fix
fix
fix
* fix
* fix
fix
fix
fix
fix
* fix
* Update shardformer.py
---------
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
9 months ago
Frank Lee
593a72e4d5
Merge pull request #5424 from FrankLeeeee/sync/main
...
Sync/main
9 months ago
FrankLeeeee
0310b76e9d
Merge branch 'main' into sync/main
9 months ago
Camille Zhong
4b8312c08e
fix sft single turn inference example ( #5416 )
9 months ago
binmakeswell
a1c6cdb189
[doc] fix blog link
9 months ago
binmakeswell
5de940de32
[doc] fix blog link
9 months ago
Frank Lee
2461f37886
[workflow] added pypi channel ( #5412 )
9 months ago
Tong Li
a28c971516
update requirements ( #5407 )
9 months ago
yuehuayingxueluo
0aa27f1961
[Inference]Move benchmark-related code to the example directory. ( #5408 )
...
* move benchmark-related code to the example directory.
* fix bugs in test_fused_rotary_embedding.py
9 months ago
yuehuayingxueluo
600881a8ea
[Inference]Add CUDA KVCache Kernel ( #5406 )
...
* add cuda KVCache kernel
* annotation benchmark_kvcache_copy
* add use cuda
* fix import path
* move benchmark scripts to example/
* rm benchmark codes in test_kv_cache_memcpy.py
* rm redundancy codes
* rm redundancy codes
* pr was modified according to the review
9 months ago
flybird11111
0a25e16e46
[shardformer]gather llama logits ( #5398 )
...
* gather llama logits
* fix
9 months ago