ColossalAI

Commit Graph

Author	SHA1	Message	Date
yuehuayingxueluo	249644c23b	[Inference]Repalce Attention layer and MLP layer by shardformer to optimize the weight transpose operation，add fused_qkv and fused linear_add (#5340 ) * add fused qkv * replace attn and mlp by shardformer * fix bugs in mlp * add docstrings * fix test_inference_engine.py * add optimize unbind * add fused_addmm * rm squeeze(1) * refactor codes * fix ci bugs * rename ShardFormerLlamaMLP and ShardFormerLlamaAttention * Removed the dependency on LlamaFlashAttention2 * rollback test_inference_engine.py	10 months ago
Frank Lee	f8e456d202	[inference] simplified config verification (#5346 ) * [inference] simplified config verification * polish * polish	10 months ago
Jianghai	df0aa49585	[Inference] Kernel Fusion, fused copy kv cache into rotary embedding (#5336 ) * revise rotary embedding * remove useless print * adapt	10 months ago
Frank Lee	1336838a91	Merge pull request #5339 from FrankLeeeee/sync/merge-main Sync/merge main	10 months ago
FrankLeeeee	c565519913	merge commit	10 months ago
Yuanheng Zhao	5f98a9d68a	[Infer] Optimize Blocked KVCache And Kernels Using It (#5325 ) * revise shape of kvcache (context attn kernel) * revise shape of kvcache (flash decoding kernel) * revise shape of kvcache (kvcache copy) and attn func * init of kvcache in kvcache manager * revise llama modeling * revise block size retrieval * use torch for rms_norm benchmarking * revise block size retrieval	10 months ago
yuehuayingxueluo	e8f0642f28	[Inference]Add Nopadding Llama Modeling (#5327 ) * add nopadding llama modeling * add nopadding_llama.py * rm unused codes * fix bugs in test_xine_copy.py * fix code style	10 months ago
digger yu	71321a07cf	fix typo change dosen't to doesn't (#5308 )	10 months ago
digger yu	6a3086a505	fix typo under extensions/ (#5330 )	10 months ago
Frank Lee	febed23288	[doc] added docs for extensions (#5324 ) * [doc] added docs for extensions * polish * polish	10 months ago
flybird11111	388179f966	[tests] fix t5 test. (#5322 ) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * fix t5 test --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	10 months ago
Jianghai	c7c104cb7c	[DOC] Update inference readme (#5280 ) * add readme * add readme * 1 * update engine * finish readme * add readme	10 months ago
Frank Lee	a6709afe66	Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api [accelerator] fixed npu api	10 months ago
FrankLeeeee	087d0cb1fc	[accelerator] fixed npu api	10 months ago
Frank Lee	8823cc4831	Merge pull request #5310 from hpcaitech/feature/npu Feature/npu	10 months ago
Frank Lee	73f4dc578e	[workflow] updated CI image (#5318 )	10 months ago
Jianghai	1f8a75d470	[Inference] Update rms norm kernel, benchmark with vLLM (#5315 ) * add * xi * del * del * fix	10 months ago
Jianghai	7ddd8b37f0	fix (#5311 )	10 months ago
yuehuayingxueluo	4f28cb43c0	[inference]Optimize the usage of the mid tensors space in flash attn (#5304 ) * opt flash attn * opt tmp tensor * fix benchmark_llama * fix code style * fix None logic for output tensor * fix adapted to get_xine_cache * add comment * fix ci bugs * fix some codes * rm duplicated codes * rm duplicated codes * fix code style * add _get_dtype in config.py	10 months ago
Frank Lee	7cfed5f076	[feat] refactored extension module (#5298 ) * [feat] refactored extension module * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish	10 months ago
digger yu	bce9499ed3	fix some typo (#5307 )	10 months ago
李文军	ec912b1ba9	[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228 )	10 months ago
Yuanheng Zhao	af8359c430	[hotfix] fix boundary check in batch (#5306 )	10 months ago
Jianghai	c647e00e3c	[Inference]Add fused rotary kernel and get cos cache kernel (#5302 ) * add fused rotary and get cos cache func * staged * fix bugs * fix bugs	10 months ago
Yuanheng Zhao	3da9993b0d	[Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301 ) * fix decoding kernel pytest * revise and add triton context attn benchmark	10 months ago
Jianghai	8e606ecc7e	[Inference] Benchmarking rotary embedding and add a fetch function (#5277 ) * fix bugs and add a cos/sin cache fetch func * add docstring * fix bug * fix	10 months ago
Desperado-Jia	ddf879e2db	fix bug for mefture (#5299 )	10 months ago
yuehuayingxueluo	b7853196a0	Merge pull request #5297 from yuehuayingxueluo/fix_rotary_embedding [Inference/fix]Add utils.py for Rotary Embedding	10 months ago
yuehuayingxueluo	cea9c86e45	add utils.py	10 months ago
Hongxin Liu	d7f8db8e21	[hotfix] fix 3d plugin test (#5292 )	10 months ago
yuehuayingxueluo	bfff9254ac	[inference] Adapted to Rotary Embedding and RMS Norm (#5283 ) * adapted to rotary_embedding * adapted to nopad rms norm * fix bugs in benchmark * fix flash_decoding.py	10 months ago
flybird11111	f7e3f82a7e	fix llama pretrain (#5287 )	10 months ago
Desperado-Jia	6a56967855	[doc] add llama2-13B disyplay (#5285 ) * Update README.md * fix 13b typo --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com>	10 months ago
Yuanheng Zhao	6e487e7d3c	[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274 ) * prevent re-creating intermediate tensors * add singleton class holding intermediate values * fix triton kernel api * add benchmark in pytest * fix kernel api and add benchmark * revise flash decoding triton kernel in/out shapes * fix calling of triton kernel in modeling * fix pytest: extract to util functions	10 months ago
Jianghai	9e2342bde2	[Hotfix] Fix bugs in testing continuous batching (#5270 ) * fix bug * fix bugs * fix bugs * fix bugs and add padding * add funcs and fix bugs * fix typos * fix bugs * add func	10 months ago
Michelle	32cb74493a	fix auto loading gpt2 tokenizer (#5279 )	10 months ago
Frank Lee	d66e6988bc	Merge pull request #5278 from ver217/sync/npu [sync] sync npu branch with main	10 months ago
ver217	148469348a	Merge branch 'main' into sync/npu	10 months ago
Yaozheng Fang	5ae9099f92	[kernel] Add RMSLayerNorm triton kernel (#5262 ) * add layerrmsnorm triton kernel * add layerrmsnorm kernel * modify the atol and rtol in test file * Remove the logics of mean computations, and update the name of ther kernel functions and files * add benchmark of rms norm	10 months ago
Zhongkai Zhao	5d9a0ae75b	[hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230 )	11 months ago
yuehuayingxueluo	86b63f720c	[Inference]Adapted to the triton attn kernels (#5264 ) * adapted to the triton attn kernels * fix pad input * adapted to copy_kv_to_blocked_cache * fix ci test * update kv memcpy * remove print	11 months ago
flybird11111	46e091651b	[shardformer] hybridparallelplugin support gradients accumulation. (#5246 ) * support gradients acc fix fix fix fix fix fix fix fix fix fix fix fix fix * fix fix * fix fix fix	11 months ago
flybird11111	2a0558d8ec	[ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276 ) * fix ci fix * fix test * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests * fix --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	11 months ago
Frank Lee	d69cd2eb89	[workflow] fixed oom tests (#5275 ) * [workflow] fixed oom tests * polish * polish * polish	11 months ago
Yuanheng Zhao	0f2b46a41c	[kernel] Revise KVCache copy triton kernel API (#5273 ) * [kernel/fix] revise kvcache copy kernel api * fix benchmark	11 months ago
Frank Lee	04244aaaf1	[workflow] fixed incomplete bash command (#5272 )	11 months ago
Jianghai	d8db500efc	[Inference] Fix request handler and add recycle logic (#5260 ) * fix request handler * fix comment	11 months ago
Frank Lee	c597678da4	[doc] updated inference readme (#5269 )	11 months ago
Yuanheng Zhao	fa85e02b3b	[kernel] Add KV cache copy kernel during decoding (#5261 ) * add kv copy triton kernel during decoding stage * add pytest and fix kernel * fix test utilities * revise kernel config * add benchmark for kvcache copy	11 months ago
Wenhao Chen	ef4f0ee854	[hotfix]: add pp sanity check and fix mbs arg (#5268 ) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check	11 months ago

3027 Commits (249644c23b0402ccf9d0908f13ed15b41b95145f)
