Edenzzzz
61da3fbc52
fixed layout converter caching and updated tester
8 months ago
傅剑寒
e6496dd371
[Inference] Optimize request handler of llama ( #5512 )
* optimize request_handler
* fix ways of writing
8 months ago
Rocky Duan
cbe34c557c
Fix ColoTensorSpec for py11 ( #5440 )
8 months ago
Hongxin Liu
a7790a92e8
[devops] fix example test ci ( #5504 )
8 months ago
Yuanheng Zhao
131f32a076
[fix] fix grok-1 example typo ( #5506 )
8 months ago
flybird11111
0688d92e2d
[shardformer]Fix lm parallel. ( #5480 )
* fix
* padding vocab_size when using pipeline parallelism
padding vocab_size when using pipeline parallelism
fix
fix
* fix
* fix
fix
fix
* fix gather output
* fix
* fix
* fix
fix resize embedding
fix resize embedding
* fix resize embedding
fix
* revert
* revert
* revert
* fix lm forward distribution
* fix
* test ci
* fix
8 months ago
Runyu Lu
6251d68dc9
[fix] PR #5354 ( #5501 )
* [fix]
* [fix]
* Update config.py docstring
* [fix] docstring align
* [fix] docstring align
* [fix] docstring align
8 months ago
Runyu Lu
1d626233ce
Merge pull request #5434 from LRY89757/colossal-infer-cuda-graph
[feat] cuda graph support and refactor non-functional api
8 months ago
Runyu Lu
68e9396bc0
[fix] merge conflicts
8 months ago
binmakeswell
34e909256c
[release] grok-1 inference benchmark ( #5500 )
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
* [release] grok-1 inference benchmark
8 months ago
yuehuayingxueluo
87079cffe8
[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding ( #5461 )
* Support FP16/BF16 Flash Attention 2
* fix bugs in test_kv_cache_memcpy.py
* add context_kv_cache_memcpy_kernel.cu
* rm typename MT
* add tail process
* add high_precision
* add high_precision to config.py
* rm unused code
* change the comment for the high_precision parameter
* update test_rotary_embdding_unpad.py
* fix vector_copy_utils.h
* add comment for self.high_precision when using float32
8 months ago
Wenhao Chen
bb0a668fee
[hotfix] set return_outputs=False in examples and polish code ( #5404 )
* fix: simplify merge_batch
* fix: use return_outputs=False to eliminate extra memory consumption
* feat: add return_outputs warning
* style: remove `return_outputs=False` as it is the default value
8 months ago
Runyu Lu
ff4998c6f3
[fix] remove unused comment
8 months ago
Runyu Lu
9fe61b4475
[fix]
8 months ago
Yuanheng Zhao
5fcd7795cd
[example] update Grok-1 inference ( #5495 )
* revise grok-1 example
* remove unused arg in scripts
* prevent re-installing torch
* update readme
* revert modifying colossalai requirements
* add perf
* trivial
* add tokenizer url
8 months ago
binmakeswell
6df844b8c4
[release] grok-1 314b inference ( #5490 )
* [release] grok-1 inference
* [release] grok-1 inference
* [release] grok-1 inference
8 months ago
Hongxin Liu
848a574c26
[example] add grok-1 inference ( #5485 )
* [misc] add submodule
* remove submodule
* [example] support grok-1 tp inference
* [example] add grok-1 inference script
* [example] refactor code
* [example] add grok-1 readme
* [example] add test ci
* [example] update readme
8 months ago
Runyu Lu
5b017d6324
[fix]
8 months ago
Runyu Lu
606603bb88
Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into colossal-infer-cuda-graph
8 months ago
Runyu Lu
4eafe0c814
[fix] unused option
8 months ago
binmakeswell
d158fc0e64
[doc] update open-sora demo ( #5479 )
* [doc] update open-sora demo
* [doc] update open-sora demo
* [doc] update open-sora demo
8 months ago
傅剑寒
7ff42cc06d
add vec_type_trait implementation ( #5473 )
8 months ago
傅剑寒
b96557b5e1
Merge pull request #5469 from Courtesy-Xs/add_vec_traits
Refactor vector utils
8 months ago
Runyu Lu
aabc9fb6aa
[feat] add use_cuda_kernel option
8 months ago
xs_courtesy
48c4f29b27
refactor vector utils
8 months ago
binmakeswell
bd998ced03
[doc] release Open-Sora 1.0 with model weights ( #5468 )
* [doc] release Open-Sora 1.0 with model weights
* [doc] release Open-Sora 1.0 with model weights
* [doc] release Open-Sora 1.0 with model weights
8 months ago
flybird11111
5e16bf7980
[shardformer] fix gathering output when using tensor parallelism ( #5431 )
* fix
* padding vocab_size when using pipeline parallelism
padding vocab_size when using pipeline parallelism
fix
fix
* fix
* fix
fix
fix
* fix gather output
* fix
* fix
* fix
fix resize embedding
fix resize embedding
* fix resize embedding
fix
* revert
* revert
* revert
8 months ago
傅剑寒
b6e9785885
Merge pull request #5457 from Courtesy-Xs/ly_add_implementation_for_launch_config
add implementation for GetGPULaunchConfig1D
8 months ago
xs_courtesy
5724b9e31e
add some comments
8 months ago
Runyu Lu
6e30248683
[fix] tmp for test
8 months ago
xs_courtesy
388e043930
add implementation for GetGPULaunchConfig1D
9 months ago
Runyu Lu
d02e257abd
Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph
9 months ago
Runyu Lu
ae24b4f025
diverse tests
9 months ago
Runyu Lu
1821a6dab0
[fix] pytest and fix dyn grid bug
9 months ago
yuehuayingxueluo
f366a5ea1f
[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel ( #5418 )
* add rotary embedding kernel
* add rotary_embedding_kernel
* add fused rotary_emb and kvcache memcopy
* add fused_rotary_emb_and_cache_kernel.cu
* add fused_rotary_emb_and_memcopy
* fix bugs in fused_rotary_emb_and_cache_kernel.cu
* fix ci bugs
* use vec memcopy and optimize the global memory access
* fix code style
* fix test_rotary_embdding_unpad.py
* codes revised based on the review comments
* fix bugs about include path
* rm inline
9 months ago
Steve Luo
ed431de4e4
fix rmsnorm template function invocation problem (template function partial specialization is not allowed in Cpp) and pass e2e precision test ( #5454 )
9 months ago
Hongxin Liu
f2e8b9ef9f
[devops] fix compatibility ( #5444 )
* [devops] fix compatibility
* [hotfix] update compatibility test on pr
* [devops] fix compatibility
* [devops] record duration during comp test
* [test] decrease test duration
* fix falcon
9 months ago
傅剑寒
6fd355a5a6
Merge pull request #5452 from Courtesy-Xs/fix_include_path
fix include path
9 months ago
xs_courtesy
c1c45e9d8e
fix include path
9 months ago
Steve Luo
b699f54007
optimize rmsnorm: add vectorized elementwise op, feat loop unrolling ( #5441 )
9 months ago
傅剑寒
368a2aa543
Merge pull request #5445 from Courtesy-Xs/refactor_infer_compilation
Refactor colossal-infer code arch
9 months ago
digger yu
385e85afd4
[hotfix] fix typo s/keywrods/keywords etc. ( #5429 )
9 months ago
xs_courtesy
095c070a6e
refactor code
9 months ago
Camille Zhong
da885ed540
fix tensor data update for gemini loss calculation ( #5442 )
9 months ago
傅剑寒
21e1e3645c
Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config
Add query and other components
9 months ago
Runyu Lu
633e95b301
[doc] add doc
9 months ago
Runyu Lu
9dec66fad6
[fix] multi graphs capture error
9 months ago
Runyu Lu
b2c0d9ff2b
[fix] multi graphs capture error
9 months ago
Steve Luo
f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script ( #5417 )
9 months ago
xs_courtesy
5eb5ff1464
refactor code
9 months ago