ColossalAI

Commit Graph

Author	SHA1	Message	Date
Edenzzzz	61da3fbc52	fixed layout converter caching and updated tester	8 months ago
傅剑寒	e6496dd371	[Inference] Optimize request handler of llama (#5512 ) * optimize request_handler * fix ways of writing	8 months ago
Rocky Duan	cbe34c557c	Fix ColoTensorSpec for py11 (#5440 )	8 months ago
Hongxin Liu	a7790a92e8	[devops] fix example test ci (#5504 )	8 months ago
Yuanheng Zhao	131f32a076	[fix] fix grok-1 example typo (#5506 )	8 months ago
flybird11111	0688d92e2d	[shardformer]Fix lm parallel. (#5480 ) * fix * padding vocab_size when using pipeline parallellism padding vocab_size when using pipeline parallellism fix fix * fix * fix fix fix * fix gather output * fix * fix * fix fix resize embedding fix resize embedding * fix resize embedding fix * revert * revert * revert * fix lm forward distribution * fix * test ci * fix	8 months ago
Runyu Lu	6251d68dc9	[fix] PR #5354 (#5501 ) * [fix] * [fix] * Update config.py docstring * [fix] docstring align * [fix] docstring align * [fix] docstring align	8 months ago
Runyu Lu	1d626233ce	Merge pull request #5434 from LRY89757/colossal-infer-cuda-graph [feat] cuda graph support and refactor non-functional api	8 months ago
Runyu Lu	68e9396bc0	[fix] merge conflicts	8 months ago
binmakeswell	34e909256c	[release] grok-1 inference benchmark (#5500 ) * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark	8 months ago
yuehuayingxueluo	87079cffe8	[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461 ) * Support FP16/BF16 Flash Attention 2 * fix bugs in test_kv_cache_memcpy.py * add context_kv_cache_memcpy_kernel.cu * rm typename MT * add tail process * add high_precision * add high_precision to config.py * rm unused code * change the comment for the high_precision parameter * update test_rotary_embdding_unpad.py * fix vector_copy_utils.h * add comment for self.high_precision when using float32	8 months ago
Wenhao Chen	bb0a668fee	[hotfix] set return_outputs=False in examples and polish code (#5404 ) * fix: simplify merge_batch * fix: use return_outputs=False to eliminate extra memory consumption * feat: add return_outputs warning * style: remove `return_outputs=False` as it is the default value	8 months ago
Runyu Lu	ff4998c6f3	[fix] remove unused comment	8 months ago
Runyu Lu	9fe61b4475	[fix]	8 months ago
Yuanheng Zhao	5fcd7795cd	[example] update Grok-1 inference (#5495 ) * revise grok-1 example * remove unused arg in scripts * prevent re-installing torch * update readme * revert modifying colossalai requirements * add perf * trivial * add tokenizer url	8 months ago
binmakeswell	6df844b8c4	[release] grok-1 314b inference (#5490 ) * [release] grok-1 inference * [release] grok-1 inference * [release] grok-1 inference	8 months ago
Hongxin Liu	848a574c26	[example] add grok-1 inference (#5485 ) * [misc] add submodule * remove submodule * [example] support grok-1 tp inference * [example] add grok-1 inference script * [example] refactor code * [example] add grok-1 readme * [exmaple] add test ci * [exmaple] update readme	8 months ago
Runyu Lu	5b017d6324	[fix]	8 months ago
Runyu Lu	606603bb88	Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into colossal-infer-cuda-graph	8 months ago
Runyu Lu	4eafe0c814	[fix] unused option	8 months ago
binmakeswell	d158fc0e64	[doc] update open-sora demo (#5479 ) * [doc] update open-sora demo * [doc] update open-sora demo * [doc] update open-sora demo	8 months ago
傅剑寒	7ff42cc06d	add vec_type_trait implementation (#5473 )	8 months ago
傅剑寒	b96557b5e1	Merge pull request #5469 from Courtesy-Xs/add_vec_traits Refactor vector utils	8 months ago
Runyu Lu	aabc9fb6aa	[feat] add use_cuda_kernel option	8 months ago
xs_courtesy	48c4f29b27	refactor vector utils	8 months ago
binmakeswell	bd998ced03	[doc] release Open-Sora 1.0 with model weights (#5468 ) * [doc] release Open-Sora 1.0 with model weights * [doc] release Open-Sora 1.0 with model weights * [doc] release Open-Sora 1.0 with model weights	9 months ago
flybird11111	5e16bf7980	[shardformer] fix gathering output when using tensor parallelism (#5431 ) * fix * padding vocab_size when using pipeline parallellism padding vocab_size when using pipeline parallellism fix fix * fix * fix fix fix * fix gather output * fix * fix * fix fix resize embedding fix resize embedding * fix resize embedding fix * revert * revert * revert	9 months ago
傅剑寒	b6e9785885	Merge pull request #5457 from Courtesy-Xs/ly_add_implementation_for_launch_config add implementatino for GetGPULaunchConfig1D	9 months ago
xs_courtesy	5724b9e31e	add some comments	9 months ago
Runyu Lu	6e30248683	[fix] tmp for test	9 months ago
xs_courtesy	388e043930	add implementatino for GetGPULaunchConfig1D	9 months ago
Runyu Lu	d02e257abd	Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph	9 months ago
Runyu Lu	ae24b4f025	diverse tests	9 months ago
Runyu Lu	1821a6dab0	[fix] pytest and fix dyn grid bug	9 months ago
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418 ) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the gloabl memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	9 months ago
Steve Luo	ed431de4e4	fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454 )	9 months ago
Hongxin Liu	f2e8b9ef9f	[devops] fix compatibility (#5444 ) * [devops] fix compatibility * [hotfix] update compatibility test on pr * [devops] fix compatibility * [devops] record duration during comp test * [test] decrease test duration * fix falcon	9 months ago
傅剑寒	6fd355a5a6	Merge pull request #5452 from Courtesy-Xs/fix_include_path fix include path	9 months ago
xs_courtesy	c1c45e9d8e	fix include path	9 months ago
Steve Luo	b699f54007	optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441 )	9 months ago
傅剑寒	368a2aa543	Merge pull request #5445 from Courtesy-Xs/refactor_infer_compilation Refactor colossal-infer code arch	9 months ago
digger yu	385e85afd4	[hotfix] fix typo s/keywrods/keywords etc. (#5429 )	9 months ago
xs_courtesy	095c070a6e	refactor code	9 months ago
Camille Zhong	da885ed540	fix tensor data update for gemini loss caluculation (#5442 )	9 months ago
傅剑寒	21e1e3645c	Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config Add query and other components	9 months ago
Runyu Lu	633e95b301	[doc] add doc	9 months ago
Runyu Lu	9dec66fad6	[fix] multi graphs capture error	9 months ago
Runyu Lu	b2c0d9ff2b	[fix] multi graphs capture error	9 months ago
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417 )	9 months ago
xs_courtesy	5eb5ff1464	refactor code	9 months ago

3307 Commits (58ad76d4665032bbe548d066116d1c572ce98979)
