binmakeswell
6df844b8c4
[release] grok-1 314b inference ( #5490 )
* [release] grok-1 inference
8 months ago
Hongxin Liu
848a574c26
[example] add grok-1 inference ( #5485 )
* [misc] add submodule
* remove submodule
* [example] support grok-1 tp inference
* [example] add grok-1 inference script
* [example] refactor code
* [example] add grok-1 readme
* [example] add test ci
* [example] update readme
8 months ago
Runyu Lu
5b017d6324
[fix]
8 months ago
Runyu Lu
606603bb88
Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into colossal-infer-cuda-graph
8 months ago
Runyu Lu
4eafe0c814
[fix] unused option
8 months ago
binmakeswell
d158fc0e64
[doc] update open-sora demo ( #5479 )
* [doc] update open-sora demo
8 months ago
傅剑寒
7ff42cc06d
add vec_type_trait implementation ( #5473 )
8 months ago
傅剑寒
b96557b5e1
Merge pull request #5469 from Courtesy-Xs/add_vec_traits
Refactor vector utils
8 months ago
Runyu Lu
aabc9fb6aa
[feat] add use_cuda_kernel option
8 months ago
xs_courtesy
48c4f29b27
refactor vector utils
8 months ago
binmakeswell
bd998ced03
[doc] release Open-Sora 1.0 with model weights ( #5468 )
* [doc] release Open-Sora 1.0 with model weights
8 months ago
flybird11111
5e16bf7980
[shardformer] fix gathering output when using tensor parallelism ( #5431 )
* fix
* padding vocab_size when using pipeline parallelism
fix
fix
* fix
* fix
fix
fix
* fix gather output
* fix
* fix
* fix
fix resize embedding
fix resize embedding
* fix resize embedding
fix
* revert
* revert
* revert
8 months ago
傅剑寒
b6e9785885
Merge pull request #5457 from Courtesy-Xs/ly_add_implementation_for_launch_config
add implementation for GetGPULaunchConfig1D
8 months ago
xs_courtesy
5724b9e31e
add some comments
8 months ago
Runyu Lu
6e30248683
[fix] tmp for test
9 months ago
xs_courtesy
388e043930
add implementation for GetGPULaunchConfig1D
9 months ago
Runyu Lu
d02e257abd
Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph
9 months ago
Runyu Lu
ae24b4f025
diverse tests
9 months ago
Runyu Lu
1821a6dab0
[fix] pytest and fix dyn grid bug
9 months ago
yuehuayingxueluo
f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel ( #5418 )
* add rotary embedding kernel
* add rotary_embedding_kernel
* add fused rotary_emb and kvcache memcopy
* add fused_rotary_emb_and_cache_kernel.cu
* add fused_rotary_emb_and_memcopy
* fix bugs in fused_rotary_emb_and_cache_kernel.cu
* fix ci bugs
* use vec memcopy and optimize the global memory access
* fix code style
* fix test_rotary_embdding_unpad.py
* codes revised based on the review comments
* fix bugs about include path
* rm inline
9 months ago
Steve Luo
ed431de4e4
fix rmsnorm template function invocation problem (partial specialization of function templates is not allowed in C++) and pass e2e precision test ( #5454 )
9 months ago
Hongxin Liu
f2e8b9ef9f
[devops] fix compatibility ( #5444 )
* [devops] fix compatibility
* [hotfix] update compatibility test on pr
* [devops] fix compatibility
* [devops] record duration during comp test
* [test] decrease test duration
* fix falcon
9 months ago
傅剑寒
6fd355a5a6
Merge pull request #5452 from Courtesy-Xs/fix_include_path
fix include path
9 months ago
xs_courtesy
c1c45e9d8e
fix include path
9 months ago
Steve Luo
b699f54007
optimize rmsnorm: add vectorized elementwise op, feat loop unrolling ( #5441 )
9 months ago
傅剑寒
368a2aa543
Merge pull request #5445 from Courtesy-Xs/refactor_infer_compilation
Refactor colossal-infer code arch
9 months ago
digger yu
385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. ( #5429 )
9 months ago
xs_courtesy
095c070a6e
refactor code
9 months ago
Camille Zhong
da885ed540
fix tensor data update for gemini loss calculation ( #5442 )
9 months ago
傅剑寒
21e1e3645c
Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config
Add query and other components
9 months ago
Runyu Lu
633e95b301
[doc] add doc
9 months ago
Runyu Lu
9dec66fad6
[fix] multi graphs capture error
9 months ago
Runyu Lu
b2c0d9ff2b
[fix] multi graphs capture error
9 months ago
Steve Luo
f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script ( #5417 )
9 months ago
xs_courtesy
5eb5ff1464
refactor code
9 months ago
xs_courtesy
01d289d8e5
Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config
9 months ago
xs_courtesy
a46598ac59
add reusable utils for cuda
9 months ago
傅剑寒
2b28b54ac6
Merge pull request #5433 from Courtesy-Xs/add_silu_and_mul
[Inference] Add silu_and_mul for infer
9 months ago
Runyu Lu
cefaeb5fdd
[feat] cuda graph support and refactor non-functional api
9 months ago
Hongxin Liu
8020f42630
[release] update version ( #5411 )
9 months ago
xs_courtesy
95c21498d4
add silu_and_mul for infer
9 months ago
Camille Zhong
743e7fad2f
[colossal-llama2] add stream chat example for chat version model ( #5428 )
* add stream chat for chat version
* remove os.system clear
* modify function name
9 months ago
Youngon
68f55a709c
[hotfix] fix stable diffusion inference bug. ( #5289 )
* Update train_ddp.yaml
delete "strategy" to fix DDP config loading bug in "main.py"
* Update train_ddp.yaml
fix inference with scripts/txt2img.py config file load bug.
* Update README.md
add pretrain model test code.
9 months ago
hugo-syn
c8003d463b
[doc] Fix typo s/infered/inferred/ ( #5288 )
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
9 months ago
digger yu
5e1c93d732
[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO ( #5335 )
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
9 months ago
Dongruixuan Li
a7ae2b5b4c
[eval-hotfix] set few_shot_data to None when few shot is disabled ( #5422 )
9 months ago
digger yu
049121d19d
[hotfix] fix typo change enabel to enable under colossalai/shardformer/ ( #5317 )
9 months ago
digger yu
16c96d4d8c
[hotfix] fix typo change _descrption to _description ( #5331 )
9 months ago
digger yu
70cce5cbed
[doc] update some translations with README-zh-Hans.md ( #5382 )
9 months ago
Luo Yihang
e239cf9060
[hotfix] fix typo of openmoe model source ( #5403 )
9 months ago