ColossalAI

Commit Graph

Author	SHA1	Message	Date
digger yu	6a3086a505	fix typo under extensions/ (#5330 )	2024-01-30 09:55:16 +08:00
Frank Lee	febed23288	[doc] added docs for extensions (#5324 ) * [doc] added docs for extensions * polish * polish	2024-01-29 17:39:23 +08:00
flybird11111	388179f966	[tests] fix t5 test. (#5322 ) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * fix t5 test --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-01-29 17:38:46 +08:00
Jianghai	c7c104cb7c	[DOC] Update inference readme (#5280 ) * add readme * add readme * 1 * update engine * finish readme * add readme	2024-01-29 16:21:06 +08:00
Frank Lee	a6709afe66	Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api [accelerator] fixed npu api	2024-01-29 14:29:58 +08:00
FrankLeeeee	087d0cb1fc	[accelerator] fixed npu api	2024-01-29 14:27:52 +08:00
Frank Lee	8823cc4831	Merge pull request #5310 from hpcaitech/feature/npu Feature/npu	2024-01-29 13:49:39 +08:00
Frank Lee	73f4dc578e	[workflow] updated CI image (#5318 )	2024-01-29 11:53:07 +08:00
Jianghai	1f8a75d470	[Inference] Update rms norm kernel, benchmark with vLLM (#5315 ) * add * xi * del * del * fix	2024-01-29 10:22:33 +08:00
Jianghai	7ddd8b37f0	fix (#5311 )	2024-01-26 15:02:12 +08:00
yuehuayingxueluo	4f28cb43c0	[inference]Optimize the usage of the mid tensors space in flash attn (#5304 ) * opt flash attn * opt tmp tensor * fix benchmark_llama * fix code style * fix None logic for output tensor * fix adapted to get_xine_cache * add comment * fix ci bugs * fix some codes * rm duplicated codes * rm duplicated codes * fix code style * add _get_dtype in config.py	2024-01-26 14:00:10 +08:00
Frank Lee	7cfed5f076	[feat] refactored extension module (#5298 ) * [feat] refactored extension module * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish	2024-01-25 17:01:48 +08:00
digger yu	bce9499ed3	fix some typo (#5307 )	2024-01-25 13:56:27 +08:00
李文军	ec912b1ba9	[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228 )	2024-01-25 13:14:48 +08:00
Yuanheng Zhao	af8359c430	[hotfix] fix boundary check in batch (#5306 )	2024-01-25 10:23:12 +08:00
Jianghai	c647e00e3c	[Inference]Add fused rotary kernel and get cos cache kernel (#5302 ) * add fused rotary and get cos cache func * staged * fix bugs * fix bugs	2024-01-24 16:20:42 +08:00
Yuanheng Zhao	3da9993b0d	[Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301 ) * fix decoding kernel pytest * revise and add triton context attn benchmark	2024-01-23 17:16:02 +08:00
Jianghai	8e606ecc7e	[Inference] Benchmarking rotary embedding and add a fetch function (#5277 ) * fix bugs and add a cos/sin cache fetch func * add docstring * fix bug * fix	2024-01-23 12:11:53 +08:00
Desperado-Jia	ddf879e2db	fix bug for mefture (#5299 )	2024-01-22 22:17:54 +08:00
yuehuayingxueluo	b7853196a0	Merge pull request #5297 from yuehuayingxueluo/fix_rotary_embedding [Inference/fix]Add utils.py for Rotary Embedding	2024-01-22 17:07:14 +08:00
yuehuayingxueluo	cea9c86e45	add utils.py	2024-01-22 16:06:27 +08:00
Hongxin Liu	d7f8db8e21	[hotfix] fix 3d plugin test (#5292 )	2024-01-22 15:19:04 +08:00
yuehuayingxueluo	bfff9254ac	[inference] Adapted to Rotary Embedding and RMS Norm (#5283 ) * adapted to rotary_embedding * adapted to nopad rms norm * fix bugs in benchmark * fix flash_decoding.py	2024-01-22 10:55:34 +08:00
flybird11111	f7e3f82a7e	fix llama pretrain (#5287 )	2024-01-19 17:49:02 +08:00
Desperado-Jia	6a56967855	[doc] add llama2-13B disyplay (#5285 ) * Update README.md * fix 13b typo --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2024-01-19 16:04:08 +08:00
Yuanheng Zhao	6e487e7d3c	[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274 ) * prevent re-creating intermediate tensors * add singleton class holding intermediate values * fix triton kernel api * add benchmark in pytest * fix kernel api and add benchmark * revise flash decoding triton kernel in/out shapes * fix calling of triton kernel in modeling * fix pytest: extract to util functions	2024-01-19 15:47:16 +08:00
Jianghai	9e2342bde2	[Hotfix] Fix bugs in testing continuous batching (#5270 ) * fix bug * fix bugs * fix bugs * fix bugs and add padding * add funcs and fix bugs * fix typos * fix bugs * add func	2024-01-18 16:31:14 +08:00
Michelle	32cb74493a	fix auto loading gpt2 tokenizer (#5279 )	2024-01-18 14:08:29 +08:00
Frank Lee	d66e6988bc	Merge pull request #5278 from ver217/sync/npu [sync] sync npu branch with main	2024-01-18 13:11:45 +08:00
ver217	148469348a	Merge branch 'main' into sync/npu	2024-01-18 12:05:21 +08:00
Yaozheng Fang	5ae9099f92	[kernel] Add RMSLayerNorm triton kernel (#5262 ) * add layerrmsnorm triton kernel * add layerrmsnorm kernel * modify the atol and rtol in test file * Remove the logics of mean computations, and update the name of ther kernel functions and files * add benchmark of rms norm	2024-01-18 10:21:03 +08:00
Zhongkai Zhao	5d9a0ae75b	[hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230 )	2024-01-17 17:42:29 +08:00
yuehuayingxueluo	86b63f720c	[Inference]Adapted to the triton attn kernels (#5264 ) * adapted to the triton attn kernels * fix pad input * adapted to copy_kv_to_blocked_cache * fix ci test * update kv memcpy * remove print	2024-01-17 16:03:10 +08:00
flybird11111	46e091651b	[shardformer] hybridparallelplugin support gradients accumulation. (#5246 ) * support gradients acc fix fix fix fix fix fix fix fix fix fix fix fix fix * fix fix * fix fix fix	2024-01-17 15:22:33 +08:00
flybird11111	2a0558d8ec	[ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276 ) * fix ci fix * fix test * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests * fix --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-01-17 13:38:55 +08:00
Frank Lee	d69cd2eb89	[workflow] fixed oom tests (#5275 ) * [workflow] fixed oom tests * polish * polish * polish	2024-01-16 18:55:13 +08:00
Yuanheng Zhao	0f2b46a41c	[kernel] Revise KVCache copy triton kernel API (#5273 ) * [kernel/fix] revise kvcache copy kernel api * fix benchmark	2024-01-16 14:41:02 +08:00
Frank Lee	04244aaaf1	[workflow] fixed incomplete bash command (#5272 )	2024-01-16 11:54:44 +08:00
Jianghai	d8db500efc	[Inference] Fix request handler and add recycle logic (#5260 ) * fix request handler * fix comment	2024-01-15 17:50:46 +08:00
Frank Lee	c597678da4	[doc] updated inference readme (#5269 )	2024-01-15 17:37:41 +08:00
Yuanheng Zhao	fa85e02b3b	[kernel] Add KV cache copy kernel during decoding (#5261 ) * add kv copy triton kernel during decoding stage * add pytest and fix kernel * fix test utilities * revise kernel config * add benchmark for kvcache copy	2024-01-15 17:37:20 +08:00
Wenhao Chen	ef4f0ee854	[hotfix]: add pp sanity check and fix mbs arg (#5268 ) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check	2024-01-15 15:57:40 +08:00
FrankLeeeee	1ded7e81ef	[git] fixed rebased files	2024-01-11 13:50:45 +00:00
Yuanheng Zhao	1513f20f4d	[kernel] Add flash decoding triton kernel for blocked kv cache (#5249 ) * add flash decoding unpad triton kernel * rename flash decoding kernel * add kernel testing (draft) * revise pytest * support kv group (GQA) * (trivial) fix api and pytest * (trivial) func renaming * (trivial) func/file renaming * refactor pytest for attention * (trivial) format and consistent vars of context/decode attn * (trivial) remove test redundancy	2024-01-11 13:46:14 +00:00
Jianghai	fded91d049	[Inference] Kernel: no pad rotary embedding (#5252 ) * fix bugs * comment * use more accurate atol * fix	2024-01-11 13:46:14 +00:00
yuehuayingxueluo	d40eb26029	fix bugs in request_handler.py and engine.py	2024-01-11 13:46:14 +00:00
yuehuayingxueluo	10e3c9f923	rm torch.cuda.synchronize	2024-01-11 13:46:14 +00:00
yuehuayingxueluo	fab294c7f4	fix CI bugs	2024-01-11 13:46:14 +00:00
yuehuayingxueluo	2a73e828eb	fix bugs related to processing padding mask	2024-01-11 13:46:14 +00:00
Jianghai	e545a871b8	[Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229 ) * fix accuracy * alignment in attention * fix attention * fix * fix bugs * fix bugs * fix bugs	2024-01-11 13:46:14 +00:00

3119 Commits (606603bb8805c39f6ee01029337ddc614c8d46ef)