ColossalAI

Commit Graph

Author	SHA1	Message	Date
Frank Lee	4c03347fc7	Merge pull request #5377 from hpcaitech/example/llama-npu [llama] support npu for Colossal-LLaMA-2	2024-02-08 14:12:11 +08:00
Frank Lee	9afa52061f	[inference] refactored config (#5376)	2024-02-08 14:04:14 +08:00
ver217	06db94fbc9	[moe] fix tests	2024-02-08 12:46:37 +08:00
Hongxin Liu	65e5d6baa5	[moe] fix mixtral optim checkpoint (#5344)	2024-02-07 19:21:02 +08:00
Hongxin Liu	956b561b54	[moe] fix mixtral forward default value (#5329)	2024-02-07 19:21:02 +08:00
Hongxin Liu	b60be18dcc	[moe] fix mixtral checkpoint io (#5314)	2024-02-07 19:21:02 +08:00
Hongxin Liu	da39d21b71	[moe] support mixtral (#5309) * [moe] add mixtral block for single expert * [moe] mixtral block fwd support uneven ep * [moe] mixtral block bwd support uneven ep * [moe] add mixtral moe layer * [moe] simplify replace * [moe] support save sharded mixtral * [moe] support load sharded mixtral * [moe] support save sharded optim * [moe] integrate moe manager into plug * [moe] fix optimizer load * [moe] fix mixtral layer	2024-02-07 19:21:02 +08:00
Hongxin Liu	c904d2ae99	[moe] update capacity computing (#5253) * [moe] top2 allow uneven input * [moe] update capacity computing * [moe] remove debug info * [moe] update capacity computing * [moe] update capacity computing	2024-02-07 19:21:02 +08:00
Xuanlei Zhao	7d8e0338a4	[moe] init mixtral impl	2024-02-07 19:21:02 +08:00
Jianghai	1f8c7e7046	[Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) * add * fix * fix * pause * fix * fix pytest * align * fix * license * fix * fix * fix readme * fix some bugs * remove tokenizer config	2024-02-07 17:55:48 +08:00
yuehuayingxueluo	6fb4bcbb24	[Inference/opt] Fused KVCache Memcopy (#5374) * fused kv memcopy * add TODO in test_kvcache_copy.py	2024-02-07 17:15:42 +08:00
Frank Lee	58740b5f68	[inference] added inference template (#5375)	2024-02-07 17:11:43 +08:00
Frank Lee	8106ede07f	Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373) This reverts commit `9f4ab2eb92`.	2024-02-07 14:27:04 +08:00
Jianghai	9f4ab2eb92	[Inference] Adapt to Fused rotary (#5348) * revise rotary embedding * remove useless print * adapt * fix * add * fix * modeling * fix * fix * fix	2024-02-07 11:36:04 +08:00
yuehuayingxueluo	35382a7fbf	[Inference] Fused the gate and up proj in mlp, and optimized the autograd process. (#5365) * fused the gate and up proj in mlp * fix code styles * opt auto_grad * rollback test_inference_engine.py * modifications based on the review feedback. * fix bugs in flash attn * Change reshape to view * fix test_rmsnorm_triton.py	2024-02-06 19:38:25 +08:00
Hongxin Liu	084c91246c	[llama] fix memory issue (#5371) * [llama] fix memory issue * [llama] add comment	2024-02-06 19:02:37 +08:00
Yuanheng Zhao	1dedb57747	[Fix/Infer] Remove unused deps and revise requirements (#5341) * remove flash-attn dep * rm padding llama * revise infer requirements * move requirements out of module	2024-02-06 17:27:45 +08:00
Hongxin Liu	c53ddda88f	[lr-scheduler] fix load state dict and add test (#5369)	2024-02-06 14:23:32 +08:00
Hongxin Liu	eb4f2d90f9	[llama] polish training script and fix optim ckpt (#5368)	2024-02-06 11:52:17 +08:00
Camille Zhong	a5756a8720	[eval] update llama npu eval (#5366)	2024-02-06 10:53:03 +08:00
Camille Zhong	44ca61a22b	[llama] fix neftune & pbar with start_step (#5364)	2024-02-05 18:04:23 +08:00
Hongxin Liu	a4cec1715b	[llama] add flash attn patch for npu (#5362)	2024-02-05 16:48:34 +08:00
Hongxin Liu	73f9f23fc6	[llama] update training script (#5360) * [llama] update training script * [doc] polish docstr	2024-02-05 16:33:18 +08:00
Hongxin Liu	6c0fa7b9a8	[llama] fix dataloader for hybrid parallel (#5358) * [plugin] refactor prepare dataloader * [plugin] update train script	2024-02-05 15:14:56 +08:00
Hongxin Liu	2dd01e3a14	[gemini] fix param op hook when output is tuple (#5355) * [gemini] fix param op hook when output is tuple * [gemini] fix param op hook	2024-02-04 11:58:26 +08:00
yuehuayingxueluo	631862f339	[Inference] Optimize generation process of inference engine (#5356) * opt inference engine * fix run_benchmark.sh * fix generate in engine.py * rollback test_inference_engine.py	2024-02-02 15:38:21 +08:00
yuehuayingxueluo	21ad4a27f9	[Inference/opt] Optimize the mid tensor of RMS Norm (#5350) * opt rms_norm * fix bugs in rms_layernorm	2024-02-02 15:06:01 +08:00
Wenhao Chen	1c790c0877	[fix] remove unnecessary dp_size assert (#5351) * fix: remove unnecessary assert * test: add more 3d plugin tests * fix: add warning	2024-02-02 14:40:20 +08:00
Frank Lee	027aa1043f	[doc] updated inference readme (#5343)	2024-02-02 14:31:10 +08:00
Frank Lee	e76acbb076	[inference] moved ops tests to test_infer (#5354)	2024-02-02 13:51:22 +08:00
Frank Lee	db1a763307	[inference] removed redundant init_batch (#5353)	2024-02-02 11:44:15 +08:00
Hongxin Liu	ffffc32dc7	[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) * [checkpointio] fix hybrid parallel optim checkpoint * [extension] fix cuda extension * [checkpointio] fix gemini optimizer checkpoint * polish code	2024-02-01 16:13:06 +08:00
yuehuayingxueluo	249644c23b	[Inference] Replace Attention layer and MLP layer by shardformer to optimize the weight transpose operation, add fused_qkv and fused linear_add (#5340) * add fused qkv * replace attn and mlp by shardformer * fix bugs in mlp * add docstrings * fix test_inference_engine.py * add optimize unbind * add fused_addmm * rm squeeze(1) * refactor codes * fix ci bugs * rename ShardFormerLlamaMLP and ShardFormerLlamaAttention * Removed the dependency on LlamaFlashAttention2 * rollback test_inference_engine.py	2024-02-01 15:49:39 +08:00
Frank Lee	f8e456d202	[inference] simplified config verification (#5346) * [inference] simplified config verification * polish * polish	2024-02-01 15:31:01 +08:00
YeAnbang	c5239840e6	[Chat] fix sft loss nan (#5345) * fix script * fix script * fix chat nan * fix chat nan	2024-02-01 14:25:16 +08:00
Frank Lee	abd8e77ad8	[extension] fixed exception catch (#5342)	2024-01-31 18:09:49 +08:00
Jianghai	df0aa49585	[Inference] Kernel Fusion, fused copy kv cache into rotary embedding (#5336) * revise rotary embedding * remove useless print * adapt	2024-01-31 16:31:29 +08:00
Frank Lee	1336838a91	Merge pull request #5339 from FrankLeeeee/sync/merge-main Sync/merge main	2024-01-31 16:29:26 +08:00
FrankLeeeee	c565519913	merge commit	2024-01-31 10:41:47 +08:00
Yuanheng Zhao	5f98a9d68a	[Infer] Optimize Blocked KVCache And Kernels Using It (#5325) * revise shape of kvcache (context attn kernel) * revise shape of kvcache (flash decoding kernel) * revise shape of kvcache (kvcache copy) and attn func * init of kvcache in kvcache manager * revise llama modeling * revise block size retrieval * use torch for rms_norm benchmarking * revise block size retrieval	2024-01-30 16:06:09 +08:00
yuehuayingxueluo	e8f0642f28	[Inference] Add Nopadding Llama Modeling (#5327) * add nopadding llama modeling * add nopadding_llama.py * rm unused codes * fix bugs in test_xine_copy.py * fix code style	2024-01-30 10:31:46 +08:00
digger yu	71321a07cf	fix typo change dosen't to doesn't (#5308)	2024-01-30 09:57:38 +08:00
digger yu	6a3086a505	fix typo under extensions/ (#5330)	2024-01-30 09:55:16 +08:00
Frank Lee	febed23288	[doc] added docs for extensions (#5324) * [doc] added docs for extensions * polish * polish	2024-01-29 17:39:23 +08:00
flybird11111	388179f966	[tests] fix t5 test. (#5322) * [ci] fix shardformer tests. (#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * fix t5 test --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>	2024-01-29 17:38:46 +08:00
Jianghai	c7c104cb7c	[DOC] Update inference readme (#5280) * add readme * add readme * 1 * update engine * finish readme * add readme	2024-01-29 16:21:06 +08:00
Frank Lee	a6709afe66	Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api [accelerator] fixed npu api	2024-01-29 14:29:58 +08:00
FrankLeeeee	087d0cb1fc	[accelerator] fixed npu api	2024-01-29 14:27:52 +08:00
Frank Lee	8823cc4831	Merge pull request #5310 from hpcaitech/feature/npu Feature/npu	2024-01-29 13:49:39 +08:00
Frank Lee	73f4dc578e	[workflow] updated CI image (#5318)	2024-01-29 11:53:07 +08:00

3461 Commits (fbf33ecd019ce0e075b76b628e6e8a319cfc43e3)