Runyu Lu
633e95b301
[doc] add doc
9 months ago
Runyu Lu
9dec66fad6
[fix] multi-graph capture error
9 months ago
Runyu Lu
b2c0d9ff2b
[fix] multi-graph capture error
9 months ago
Runyu Lu
cefaeb5fdd
[feat] cuda graph support and refactor non-functional api
9 months ago
Frank Lee
593a72e4d5
Merge pull request #5424 from FrankLeeeee/sync/main
Sync/main
9 months ago
FrankLeeeee
0310b76e9d
Merge branch 'main' into sync/main
9 months ago
Camille Zhong
4b8312c08e
fix sft single turn inference example ( #5416 )
9 months ago
binmakeswell
a1c6cdb189
[doc] fix blog link
9 months ago
binmakeswell
5de940de32
[doc] fix blog link
9 months ago
Frank Lee
2461f37886
[workflow] added pypi channel ( #5412 )
9 months ago
Tong Li
a28c971516
update requirements ( #5407 )
9 months ago
yuehuayingxueluo
0aa27f1961
[Inference]Move benchmark-related code to the example directory. ( #5408 )
* move benchmark-related code to the example directory.
* fix bugs in test_fused_rotary_embedding.py
9 months ago
yuehuayingxueluo
600881a8ea
[Inference]Add CUDA KVCache Kernel ( #5406 )
* add cuda KVCache kernel
* annotation benchmark_kvcache_copy
* add use cuda
* fix import path
* move benchmark scripts to example/
* rm benchmark code in test_kv_cache_memcpy.py
* rm redundant code
* rm redundant code
* revise PR according to review
9 months ago
flybird11111
0a25e16e46
[shardformer]gather llama logits ( #5398 )
* gather llama logits
* fix
9 months ago
Frank Lee
dcdd8a5ef7
[setup] fixed nightly release ( #5388 )
9 months ago
QinLuo
bf34c6fef6
[fsdp] impl save/load sharded model/optimizer ( #5357 )
9 months ago
Hongxin Liu
d882d18c65
[example] reuse flash attn patch ( #5400 )
9 months ago
Hongxin Liu
95c21e3950
[extension] hotfix jit extension setup ( #5402 )
9 months ago
Yuanheng Zhao
19061188c3
[Infer/Fix] Fix Dependency in test - RMSNorm kernel ( #5399 )
fix dependency in pytest
9 months ago
yuehuayingxueluo
bc1da87366
[Fix/Inference] Fix format of input prompts and input model in inference engine ( #5395 )
* Fix bugs in inference_engine
* fix bugs in engine.py
* rm CUDA_VISIBLE_DEVICES
* add request_ids in generate
* fix bug in engine.py
* add logger.debug for BatchBucket
9 months ago
yuehuayingxueluo
2a718c8be8
Optimized the execution interval between CUDA kernels caused by view and memcopy ( #5390 )
* opt_view_and_memcopy
* fix bugs in ci
* fix ci bugs
* update benchmark scripts
* fix ci bugs
9 months ago
Jianghai
730103819d
[Inference]Fused kv copy into rotary calculation ( #5383 )
* revise rotary embedding
* remove useless print
* adapt
* fix
* add
* fix
* modeling
* fix
* fix
* fix
* fused kv copy
* fused copy
* colossalai/kernel/triton/no_pad_rotary_embedding.py
* del padding llama
* del
9 months ago
Stephan Kölker
5d380a1a21
[hotfix] Fix wrong import in meta_registry ( #5392 )
9 months ago
CZYCW
b833153fd5
[hotfix] fix variable type for top_p ( #5313 )
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
9 months ago
Yuanheng Zhao
b21aac5bae
[Inference] Optimize and Refactor Inference Batching/Scheduling ( #5367 )
* add kvcache manager funcs for batching
* add batch bucket for batching
* revise RunningList struct in handler
* add kvcache/batch funcs for compatibility
* use new batching methods
* fix indexing bugs
* revise abort logic
* use cpu seq lengths/block tables
* rm unused attr in Sequence
* fix type conversion/default arg
* add and revise pytests
* revise pytests, rm unused tests
* rm unused statements
* fix pop finished indexing issue
* fix: use index in batch when retrieving inputs/update seqs
* use dict instead of odict in batch struct
* arg type hinting
* fix make compress
* refine comments
* fix: pop_n_seqs to pop the first n seqs
* add check in request handler
* remove redundant conversion
* fix test for request handler
* fix pop method in batch bucket
* fix prefill adding
9 months ago
Frank Lee
705a62a565
[doc] updated installation command ( #5389 )
9 months ago
yixiaoer
69e3ad01ed
[doc] Fix typo ( #5361 )
9 months ago
Hongxin Liu
7303801854
[llama] fix training and inference scripts ( #5384 )
* [llama] refactor inference example to fit sft
* [llama] fix training script to fit gemini
* [llama] fix inference script
9 months ago
Hongxin Liu
adae123df3
[release] update version ( #5380 )
10 months ago
Frank Lee
efef43b53c
Merge pull request #5372 from hpcaitech/exp/mixtral
10 months ago
yuehuayingxueluo
8c69debdc7
[Inference]Support vllm testing in benchmark scripts ( #5379 )
* add vllm benchmark scripts
* fix code style
* update run_benchmark.sh
* fix code style
10 months ago
Frank Lee
4c03347fc7
Merge pull request #5377 from hpcaitech/example/llama-npu
[llama] support npu for Colossal-LLaMA-2
10 months ago
Frank Lee
9afa52061f
[inference] refactored config ( #5376 )
10 months ago
ver217
06db94fbc9
[moe] fix tests
10 months ago
Hongxin Liu
65e5d6baa5
[moe] fix mixtral optim checkpoint ( #5344 )
10 months ago
Hongxin Liu
956b561b54
[moe] fix mixtral forward default value ( #5329 )
10 months ago
Hongxin Liu
b60be18dcc
[moe] fix mixtral checkpoint io ( #5314 )
10 months ago
Hongxin Liu
da39d21b71
[moe] support mixtral ( #5309 )
* [moe] add mixtral block for single expert
* [moe] mixtral block fwd support uneven ep
* [moe] mixtral block bwd support uneven ep
* [moe] add mixtral moe layer
* [moe] simplify replace
* [moe] support save sharded mixtral
* [moe] support load sharded mixtral
* [moe] support save sharded optim
* [moe] integrate moe manager into plugin
* [moe] fix optimizer load
* [moe] fix mixtral layer
10 months ago
Hongxin Liu
c904d2ae99
[moe] update capacity computing ( #5253 )
* [moe] top2 allow uneven input
* [moe] update capacity computing
* [moe] remove debug info
* [moe] update capacity computing
* [moe] update capacity computing
10 months ago
Xuanlei Zhao
7d8e0338a4
[moe] init mixtral impl
10 months ago
Jianghai
1f8c7e7046
[Inference] User Experience: update the logic of default tokenizer and generation config. ( #5337 )
* add
* fix
* fix
* pause
* fix
* fix pytest
* align
* fix
* license
* fix
* fix
* fix readme
* fix some bugs
* remove tokenizer config
10 months ago
yuehuayingxueluo
6fb4bcbb24
[Inference/opt] Fused KVCache Memcopy ( #5374 )
* fused kv memcopy
* add TODO in test_kvcache_copy.py
10 months ago
Frank Lee
58740b5f68
[inference] added inference template ( #5375 )
10 months ago
Frank Lee
8106ede07f
Revert "[Inference] Adapt to Fused rotary ( #5348 )" ( #5373 )
This reverts commit 9f4ab2eb92.
10 months ago
Jianghai
9f4ab2eb92
[Inference] Adapt to Fused rotary ( #5348 )
* revise rotary embedding
* remove useless print
* adapt
* fix
* add
* fix
* modeling
* fix
* fix
* fix
10 months ago
yuehuayingxueluo
35382a7fbf
[Inference]Fused the gate and up proj in MLP, and optimized the autograd process. ( #5365 )
* fused the gate and up proj in mlp
* fix code styles
* opt auto_grad
* rollback test_inference_engine.py
* modifications based on the review feedback.
* fix bugs in flash attn
* Change reshape to view
* fix test_rmsnorm_triton.py
10 months ago
Hongxin Liu
084c91246c
[llama] fix memory issue ( #5371 )
* [llama] fix memory issue
* [llama] add comment
10 months ago
Yuanheng Zhao
1dedb57747
[Fix/Infer] Remove unused deps and revise requirements ( #5341 )
* remove flash-attn dep
* rm padding llama
* revise infer requirements
* move requirements out of module
10 months ago
Hongxin Liu
c53ddda88f
[lr-scheduler] fix load state dict and add test ( #5369 )
10 months ago
Hongxin Liu
eb4f2d90f9
[llama] polish training script and fix optim ckpt ( #5368 )
10 months ago