ColossalAI

Commit Graph

Author	SHA1	Message	Date
Jianghai	f47f2fbb24	[Inference] Fix API server, test and example (#5712 ) * fix api server * fix generation config * fix api server * fix comments * fix infer hanging bug * resolve comments, change backend to free port	7 months ago
Steve Luo	7806842f2d	add paged-attetionv2: support seq length split across thread block (#5707 )	7 months ago
CjhHa1	5d9a49483d	[Inference] Add example test_ci script	7 months ago
Jianghai	61a1b2e798	[Inference] Fix bugs and docs for feat/online-server (#5598 ) * fix test bugs * add do sample test * del useless lines * fix comments * fix tests * delete version tag * delete version tag * add * del test sever * fix test * fix * Revert "add" This reverts commit `b9305fb024`.	7 months ago
Jianghai	c064032865	[Online Server] Chat Api for streaming and not streaming response (#5470 ) * fix bugs * fix bugs * fix api server * fix api server * add chat api and test * del request.n	7 months ago
Jianghai	de378cd2ab	[Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432 ) * finish online test and add examples * fix test_contionus_batching * fix some bugs * fix bash * fix * fix inference * finish revision * fix typos * revision	7 months ago
Yuanheng Zhao	12e7c28d5e	[hotfix] fix OpenMOE example import path (#5697 )	7 months ago
Yuanheng Zhao	55cc7f3df7	[Fix] Fix Inference Example, Tests, and Requirements (#5688 ) * clean requirements * modify example inference struct * add test ci scripts * mark test_infer as submodule * rm deprecated cls & deps * import of HAS_FLASH_ATTN * prune inference tests to be run * prune triton kernel tests * increment pytest timeout mins * revert import path in openmoe	7 months ago
Yuanheng Zhao	8754abae24	[Fix] Fix & Update Inference Tests (compatibility w/ main)	7 months ago
Yuanheng Zhao	56ed09aba5	[sync] resolve conflicts of merging main	7 months ago
Yuanheng Zhao	537a3cbc4d	[kernel] Support New KCache Layout - Triton Kernel (#5677 ) * kvmemcpy triton for new kcache layout * revise tests for new kcache layout * naive triton flash decoding - new kcache layout * rotary triton kernel - new kcache layout * remove redundancy - triton decoding * remove redundancy - triton kvcache copy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
Steve Luo	5cd75ce4c7	[Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663 ) * refactor kvcache manager and rotary_embedding and kvcache_memcpy operator * refactor decode_kv_cache_memcpy * enable alibi in pagedattention	7 months ago
Hongxin Liu	7f8b16635b	[misc] refactor launch API and tensor constructor (#5666 ) * [misc] remove config arg from initialize * [misc] remove old tensor contrusctor * [plugin] add npu support for ddp * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [devops] fix doc test ci * [test] fix test launch * [doc] update launch doc --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
Tong Li	68ec99e946	[hotfix] add soft link to support required files (#5661 )	7 months ago
Yuanheng Zhao	5be590b99e	[kernel] Support new KCache Layout - Context Attention Triton Kernel (#5658 ) * add context attn triton kernel - new kcache layout * add benchmark triton * tiny revise * trivial - code style, comment	7 months ago
Yuanheng Zhao	f342a93871	[Fix] Remove obsolete files - inference (#5650 )	7 months ago
Hongxin Liu	1b387ca9fe	[shardformer] refactor pipeline grad ckpt config (#5646 ) * [shardformer] refactor pipeline grad ckpt config * [shardformer] refactor pipeline grad ckpt config * [pipeline] fix stage manager	7 months ago
Steve Luo	a8fd3b0342	[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643 ) * optimize flashdecodingattention: refactor code with different key cache layout(from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x]) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
yuehuayingxueluo	90cd5227a3	[Fix/Inference]Fix vllm benchmark (#5630 ) * Fix bugs about OOM when running vllm-0.4.0 * rm used params * change generation_config * change benchmark log file name	7 months ago
傅剑寒	279300dc5f	[Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613 ) * refactor compilation mechanism and unified multi hw * fix file path bug * add init.py to make pybind a module to avoid relative path error caused by softlink * delete duplicated micros * fix micros bug in gcc	7 months ago
Yuanheng Zhao	04863a9b14	[example] Update Llama Inference example (#5629 ) * [example] add infernece benchmark llama3 * revise inference config - arg * remove unused args * add llama generation demo script * fix init rope in llama policy * add benchmark-llama3 - cleanup	7 months ago
binmakeswell	f4c5aafe29	[example] llama3 (#5631 ) * release llama3 * [release] llama3 * [release] llama3 * [release] llama3 * [release] llama3	7 months ago
Hongxin Liu	4de4e31818	[exampe] update llama example (#5626 ) * [plugin] support dp inside for hybriad parallel * [example] update llama benchmark * [example] update llama benchmark * [example] update llama readme * [example] update llama readme	7 months ago
Steve Luo	ccf72797e3	feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611 )	7 months ago
Edenzzzz	d83c633ca6	[hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606 ) * fix no pad token bug * fixed some auto parallel codegen bug, but might not run on torch 2.1 --------- Co-authored-by: Edenzzzz <wtan45@wisc.edu>	7 months ago
Steve Luo	be396ad6cc	[Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531 ) * feat flash decoding for paged attention * refactor flashdecodingattention * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	7 months ago
yuehuayingxueluo	56b222eff8	[inference/model]Adapted to the baichuan2-7B model (#5591 ) * Adapted to the baichuan2-7B model * modified according to the review comments. * Modified the method of obtaining random weights. * modified according to the review comments. * change mlp layewr 'NOTE'	8 months ago
Yuanheng	ed5ebd1735	[Fix] resolve conflicts of merging main	8 months ago
Hongxin Liu	641b1ee71a	[devops] remove post commit ci (#5566 ) * [devops] remove post commit ci * [misc] run pre-commit on all files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	8 months ago
digger yu	341263df48	[hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548 )	8 months ago
digger yu	a799ca343b	[fix] fix typo s/muiti-node /multi-node etc. (#5448 )	8 months ago
Edenzzzz	15055f9a36	[hotfix] quick fixes to make legacy tutorials runnable (#5559 ) Co-authored-by: Edenzzzz <wtan45@wisc.edu>	8 months ago
Wenhao Chen	e614aa34f3	[shardformer, pipeline] add `gradient_checkpointing_ratio` and heterogenous shard policy for llama (#5508 ) * feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig` * feat: apply `GradientCheckpointConfig` to policy and llama_forward * feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager * fix: add optional args for `distribute_layer` and `get_stage_index` * fix: fix changed API calls * test: update llama tests * style: polish `GradientCheckpointConfig` * fix: fix pipeline utils tests	8 months ago
Yuanheng Zhao	36c4bb2893	[Fix] Grok-1 use tokenizer from the same pretrained path (#5532 ) * [fix] use tokenizer from the same pretrained path * trust remote code	8 months ago
yuehuayingxueluo	934e31afb2	The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519 )	8 months ago
Insu Jang	00525f7772	[shardformer] fix pipeline forward error if custom layer distribution is used (#5189 ) * Use self.[distribute_layers\|get_stage_index] to exploit custom layer distribution * Change static methods for t5 layer distribution to member functions * Change static methods for whisper layer distribution to member functions * Replace whisper policy usage with self one * Fix test case to use non-static layer distribution methods * fix: fix typo --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	8 months ago
Yuanheng Zhao	131f32a076	[fix] fix grok-1 example typo (#5506 )	8 months ago
binmakeswell	34e909256c	[release] grok-1 inference benchmark (#5500 ) * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark	8 months ago
yuehuayingxueluo	87079cffe8	[Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461 ) * Support FP16/BF16 Flash Attention 2 * fix bugs in test_kv_cache_memcpy.py * add context_kv_cache_memcpy_kernel.cu * rm typename MT * add tail process * add high_precision * add high_precision to config.py * rm unused code * change the comment for the high_precision parameter * update test_rotary_embdding_unpad.py * fix vector_copy_utils.h * add comment for self.high_precision when using float32	8 months ago
Wenhao Chen	bb0a668fee	[hotfix] set return_outputs=False in examples and polish code (#5404 ) * fix: simplify merge_batch * fix: use return_outputs=False to eliminate extra memory consumption * feat: add return_outputs warning * style: remove `return_outputs=False` as it is the default value	8 months ago
Yuanheng Zhao	5fcd7795cd	[example] update Grok-1 inference (#5495 ) * revise grok-1 example * remove unused arg in scripts * prevent re-installing torch * update readme * revert modifying colossalai requirements * add perf * trivial * add tokenizer url	8 months ago
binmakeswell	6df844b8c4	[release] grok-1 314b inference (#5490 ) * [release] grok-1 inference * [release] grok-1 inference * [release] grok-1 inference	8 months ago
Hongxin Liu	848a574c26	[example] add grok-1 inference (#5485 ) * [misc] add submodule * remove submodule * [example] support grok-1 tp inference * [example] add grok-1 inference script * [example] refactor code * [example] add grok-1 readme * [exmaple] add test ci * [exmaple] update readme	8 months ago
yuehuayingxueluo	f366a5ea1f	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418 ) * add rotary embedding kernel * add rotary_embedding_kernel * add fused rotary_emb and kvcache memcopy * add fused_rotary_emb_and_cache_kernel.cu * add fused_rotary_emb_and_memcopy * fix bugs in fused_rotary_emb_and_cache_kernel.cu * fix ci bugs * use vec memcopy and opt the gloabl memory access * fix code style * fix test_rotary_embdding_unpad.py * codes revised based on the review comments * fix bugs about include path * rm inline	9 months ago
digger yu	385e85afd4	[hotfix] fix typo s/keywrods/keywords etc. (#5429 )	9 months ago
Steve Luo	f7aecc0c6b	feat rmsnorm cuda kernel and add unittest, benchmark script (#5417 )	9 months ago
Youngon	68f55a709c	[hotfix] fix stable diffusion inference bug. (#5289 ) * Update train_ddp.yaml delete "strategy" to fix DDP config loading bug in "main.py" * Update train_ddp.yaml fix inference with scripts/txt2img.py config file load bug. * Update README.md add pretrain model test code.	9 months ago
Luo Yihang	e239cf9060	[hotfix] fix typo of openmoe model source (#5403 )	9 months ago
MickeyCHAN	e304e4db35	[hotfix] fix sd vit import error (#5420 ) * fix import error * Update dpt_depth.py --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com>	9 months ago
Hongxin Liu	070df689e6	[devops] fix extention building (#5427 )	9 months ago

435 Commits (a8d459f99a1d415fc843327e4dafce19ecee1f3e)