ColossalAI

Commit Graph

Author	SHA1	Message	Date
Xu Kai	946ab56c48	[feature] add gptq for inference (#4754 ) * [gptq] add gptq kernel (#4416) * add gptq * refactor code * fix tests * replace auto-gptq * rname inferance/quant * refactor test * add auto-gptq as an option * reset requirements * change assert and check auto-gptq * add import warnings * change test flash attn version * remove example * change requirements of flash_attn * modify tests * [skip ci] change requirements-test * [gptq] faster gptq cuda kernel (#4494) * [skip ci] add cuda kernels * add license * [skip ci] fix max_input_len * format files & change test size * [skip ci] * [gptq] add gptq tensor parallel (#4538) * add gptq tensor parallel * add gptq tp * delete print * add test gptq check * add test auto gptq check * [gptq] combine gptq and kv cache manager (#4706) * combine gptq and kv cache manager * add init bits * delete useless code * add model path * delete usless print and update test * delete usless import * move option gptq to shard config * change replace linear to shardformer * update bloom policy * delete useless code * fix import bug and delete uselss code * change colossalai/gptq to colossalai/quant/gptq * update import linear for tests * delete useless code and mv gptq_kernel to kernel directory * fix triton kernel * add triton import	2023-09-22 11:02:50 +08:00
littsk	1e0e080837	[bug] Fix the version check bug in colossalai run when generating the cmd. (#4713 ) * Fix the version check bug in colossalai run when generating the cmd. * polish code	2023-09-22 10:50:47 +08:00
Hongxin Liu	3e05c07bb8	[lazy] support torch 2.0 (#4763 ) * [lazy] support _like methods and clamp * [lazy] pass transformers models * [lazy] fix device move and requires grad * [lazy] fix requires grad and refactor api * [lazy] fix requires grad	2023-09-21 16:30:23 +08:00
Wenhao Chen	901ab1eedd	[chat]: add lora merge weights config (#4766 ) * feat: modify lora merge weights fn * feat: add lora merge weights config	2023-09-21 16:23:59 +08:00
Baizhou Zhang	493a5efeab	[doc] add shardformer doc to sidebar (#4768 )	2023-09-21 14:53:16 +08:00
Hongxin Liu	66f3926019	[doc] clean up outdated docs (#4765 ) * [doc] clean up outdated docs * [doc] fix linking * [doc] fix linking	2023-09-21 11:36:20 +08:00
Baizhou Zhang	df66741f77	[bug] fix get_default_parser in examples (#4764 )	2023-09-21 10:42:25 +08:00
Baizhou Zhang	c0a033700c	[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758 ) * fix master param sync for hybrid plugin * rewrite unwrap for ddp/fsdp * rewrite unwrap for zero/gemini * rewrite unwrap for hybrid plugin * fix geemini unwrap * fix bugs	2023-09-20 18:29:37 +08:00
Wenhao Chen	7b9b86441f	[chat]: update rm, add wandb and fix bugs (#4471 ) * feat: modify forward fn of critic and reward model * feat: modify calc_action_log_probs * to: add wandb in sft and rm trainer * feat: update train_sft * feat: update train_rm * style: modify type annotation and add warning * feat: pass tokenizer to ppo trainer * to: modify trainer base and maker base * feat: add wandb in ppo trainer * feat: pass tokenizer to generate * test: update generate fn tests * test: update train tests * fix: remove action_mask * feat: remove unused code * fix: fix wrong ignore_index * fix: fix mock tokenizer * chore: update requirements * revert: modify make_experience * fix: fix inference * fix: add padding side * style: modify _on_learn_batch_end * test: use mock tokenizer * fix: use bf16 to avoid overflow * fix: fix workflow * [chat] fix gemini strategy * [chat] fix * sync: update colossalai strategy * fix: fix args and model dtype * fix: fix checkpoint test * fix: fix requirements * fix: fix missing import and wrong arg * fix: temporarily skip gemini test in stage 3 * style: apply pre-commit * fix: temporarily skip gemini test in stage 1&2 --------- Co-authored-by: Mingyan Jiang <1829166702@qq.com>	2023-09-20 15:53:58 +08:00
ppt0011	07c2e3d09c	Merge pull request #4757 from ppt0011/main [doc] explain suitable use case for each plugin	2023-09-20 11:57:43 +08:00
Pengtai Xu	4d7537ba25	[doc] put native colossalai plugins first in description section	2023-09-20 09:24:10 +08:00
Pengtai Xu	e10d9f087e	[doc] add model examples for each plugin	2023-09-19 18:01:23 +08:00
Pengtai Xu	a04337bfc3	[doc] put individual plugin explanation in front	2023-09-19 16:27:37 +08:00
Pengtai Xu	10513f203c	[doc] explain suitable use case for each plugin	2023-09-19 15:50:14 +08:00
Hongxin Liu	079bf3cb26	[misc] update pre-commit and run all files (#4752 ) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format	2023-09-19 14:20:26 +08:00
github-actions[bot]	3c6b831c26	[format] applied code formatting on changed files in pull request 4743 (#4750 ) Co-authored-by: github-actions <github-actions@github.com>	2023-09-18 16:52:42 +08:00
Hongxin Liu	b5f9e37c70	[legacy] clean up legacy code (#4743 ) * [legacy] remove outdated codes of pipeline (#4692) * [legacy] remove cli of benchmark and update optim (#4690) * [legacy] remove cli of benchmark and update optim * [doc] fix cli doc test * [legacy] fix engine clip grad norm * [legacy] remove outdated colo tensor (#4694) * [legacy] remove outdated colo tensor * [test] fix test import * [legacy] move outdated zero to legacy (#4696) * [legacy] clean up utils (#4700) * [legacy] clean up utils * [example] update examples * [legacy] clean up amp * [legacy] fix amp module * [legacy] clean up gpc (#4742) * [legacy] clean up context * [legacy] clean core, constants and global vars * [legacy] refactor initialize * [example] fix examples ci * [example] fix examples ci * [legacy] fix tests * [example] fix gpt example * [example] fix examples ci * [devops] fix ci installation * [example] fix examples ci	2023-09-18 16:31:06 +08:00
Xuanlei Zhao	32e7f99416	[kernel] update triton init #4740 (#4740 )	2023-09-18 09:44:27 +08:00
Baizhou Zhang	d151dcab74	[doc] explaination of loading large pretrained models (#4741 )	2023-09-15 21:04:07 +08:00
flybird11111	4c4482f3ad	[example] llama2 add fine-tune example (#4673 ) * [shardformer] update shardformer readme [shardformer] update shardformer readme [shardformer] update shardformer readme * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] change dataset * [shardformer] change dataset * [shardformer] fix CI * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] fix [example] update opt example [example] resolve comments fix fix * [example] llama2 add finetune example * [example] llama2 add finetune example * [example] llama2 add finetune example * [example] llama2 add finetune example * fix * update llama2 example * update llama2 example * fix * update llama2 example * update llama2 example * update llama2 example * update llama2 example * update llama2 example * update llama2 example * Update requirements.txt * update llama2 example * update llama2 example * update llama2 example	2023-09-15 18:45:44 +08:00
Xuanlei Zhao	ac2797996b	[shardformer] add custom policy in hybrid parallel plugin (#4718 ) * add custom policy * update assert	2023-09-15 17:53:13 +08:00
Baizhou Zhang	451c3465fb	[doc] polish shardformer doc (#4735 ) * arrange position of chapters * fix typos in seq parallel doc	2023-09-15 17:39:10 +08:00
ppt0011	73eb3e8862	Merge pull request #4738 from ppt0011/main [legacy] remove deterministic data loader test	2023-09-15 17:34:42 +08:00
Bin Jia	608cffaed3	[example] add gpt2 HybridParallelPlugin example (#4653 ) * add gpt2 HybridParallelPlugin example * update readme and testci * update test ci * fix test_ci bug * update requirements * add requirements * update requirements * add requirement * rename file	2023-09-15 17:12:46 +08:00
Bin Jia	6a03c933a0	[shardformer] update seq parallel document (#4730 ) * update doc of seq parallel * fix typo	2023-09-15 16:09:32 +08:00
Pengtai Xu	cd4e61d149	[legacy] remove deterministic data loader test	2023-09-15 15:52:18 +08:00
flybird11111	46162632e5	[shardformer] update pipeline parallel document (#4725 ) * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document * [shardformer] update pipeline parallel document	2023-09-15 14:32:04 +08:00
digger yu	e4fc57c3de	Optimized some syntax errors in the documentation and code under applications/ (#4127 ) Co-authored-by: flybird11111 <1829166702@qq.com>	2023-09-15 14:18:22 +08:00
Baizhou Zhang	50e5602c2d	[doc] add shardformer support matrix/update tensor parallel documents (#4728 ) * add compatibility matrix for shardformer doc * update tp doc	2023-09-15 13:52:30 +08:00
github-actions[bot]	8c2dda7410	[format] applied code formatting on changed files in pull request 4726 (#4727 ) Co-authored-by: github-actions <github-actions@github.com>	2023-09-15 13:17:32 +08:00
Baizhou Zhang	f911d5b09d	[doc] Add user document for Shardformer (#4702 ) * create shardformer doc files * add docstring for seq-parallel * update ShardConfig docstring * add links to llama example * add outdated massage * finish introduction & supporting information * finish 'how shardformer works' * finish shardformer.md English doc * fix doctest fail * add Chinese document	2023-09-15 10:56:39 +08:00
binmakeswell	ce97790ed7	[doc] fix llama2 code link (#4726 ) * [doc] fix llama2 code link * [doc] fix llama2 code link * [doc] fix llama2 code link	2023-09-14 23:19:25 +08:00
flybird11111	20190b49a5	[shardformer] to fix whisper test failed due to significant accuracy differences. (#4710 ) * [shardformer] fix whisper test failed * [shardformer] fix whisper test failed * [shardformer] fix whisper test failed * [shardformer] fix whisper test failed	2023-09-14 21:34:20 +08:00
Yuanheng Zhao	e2c0e7f92a	[hotfix] Fix import error: colossal.kernel without triton installed (#4722 ) * [hotfix] remove triton kernels from kernel init * revise bloom/llama kernel imports for infer	2023-09-14 18:03:55 +08:00
flybird11111	c7d6975d29	[shardformer] fix GPT2DoubleHeadsModel (#4703 )	2023-09-13 15:57:16 +08:00
Baizhou Zhang	068372a738	[doc] add potential solution for OOM in llama2 example (#4699 )	2023-09-13 10:43:30 +08:00
digger yu	9c2feb2f0b	fix some typo with colossalai/device colossalai/tensor/ etc. (#4171 ) Co-authored-by: flybird11111 <1829166702@qq.com>	2023-09-12 17:41:52 +08:00
Baizhou Zhang	d8ceeac14e	[hotfix] fix typo in hybrid parallel io (#4697 )	2023-09-12 17:32:19 +08:00
flybird11111	8844691f4b	[shardformer] update shardformer readme (#4689 ) * [shardformer] update shardformer readme * [shardformer] update shardformer readme * [shardformer] update shardformer readme * [shardformer] update shardformer readme * [shardformer] update shardformer readme	2023-09-12 15:14:24 +08:00
Baizhou Zhang	1d454733c4	[doc] Update booster user documents. (#4669 ) * update booster_api.md * update booster_checkpoint.md * update booster_plugins.md * move transformers importing inside function * fix Dict typing * fix autodoc bug * small fix	2023-09-12 10:47:23 +08:00
Cuiqing Li	bce0f16702	[Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577 ) * [infer] Infer/llama demo (#4503) * add * add infer example * finish * finish * stash * fix * [Kernels] add inference token attention kernel (#4505) * add token forward * fix tests * fix comments * add try import triton * add adapted license * add tests check * [Kernels] add necessary kernels (llama & bloom) for attention forward and kv-cache manager (#4485) * added _vllm_rms_norm * change place * added tests * added tests * modify * adding kernels * added tests: * adding kernels * modify * added * updating kernels * adding tests * added tests * kernel change * submit * modify * added * edit comments * change name * change commnets and fix import * add * added * combine codes (#4509) * [feature] add KV cache manager for llama & bloom inference (#4495) * add kv cache memory manager * add stateinfo during inference * format * format * rename file * add kv cache test * revise on BatchInferState * file dir change * [Bug FIx] import llama context ops fix (#4524) * added _vllm_rms_norm * change place * added tests * added tests * modify * adding kernels * added tests: * adding kernels * modify * added * updating kernels * adding tests * added tests * kernel change * submit * modify * added * edit comments * change name * change commnets and fix import * add * added * fix * add ops into init.py * add * [Infer] Add TPInferEngine and fix file path (#4532) * add engine for TP inference * move file path * update path * fix TPInferEngine * remove unused file * add engine test demo * revise TPInferEngine * fix TPInferEngine, add test * fix * Add Inference test for llama (#4508) * add kv cache memory manager * add stateinfo during inference * add * add infer example * finish * finish * format * format * rename file * add kv cache test * revise on BatchInferState * add inference test for llama * fix conflict * feature: add some new features for llama engine * adapt colossalai triton interface * Change the parent class of llama policy * add nvtx * move llama inference code to tensor_parallel * fix __init__.py * rm tensor_parallel * fix: fix bugs in auto_policy.py * fix:rm some unused codes * mv colossalai/tpinference to colossalai/inference/tensor_parallel * change __init__.py * save change * fix engine * Bug fix: Fix hang * remove llama_infer_engine.py --------- Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497@outlook.com> * [infer] Add Bloom inference policy and replaced methods (#4512) * add bloom inference methods and policy * enable pass BatchInferState from model forward * revise bloom infer layers/policies * add engine for inference (draft) * add test for bloom infer * fix bloom infer policy and flow * revise bloom test * fix bloom file path * remove unused codes * fix bloom modeling * fix dir typo * fix trivial * fix policy * clean pr * trivial fix * Revert "[infer] Add Bloom inference policy and replaced methods (#4512)" (#4552) This reverts commit `17cfa57140`. * [Doc] Add colossal inference doc (#4549) * create readme * add readme.md * fix typos * [infer] Add Bloom inference policy and replaced methods (#4553) * add bloom inference methods and policy * enable pass BatchInferState from model forward * revise bloom infer layers/policies * add engine for inference (draft) * add test for bloom infer * fix bloom infer policy and flow * revise bloom test * fix bloom file path * remove unused codes * fix bloom modeling * fix dir typo * fix trivial * fix policy * clean pr * trivial fix * trivial * Fix Bugs In Llama Model Forward (#4550) * add kv cache memory manager * add stateinfo during inference * add * add infer example * finish * finish * format * format * rename file * add kv cache test * revise on BatchInferState * add inference test for llama * fix conflict * feature: add some new features for llama engine * adapt colossalai triton interface * Change the parent class of llama policy * add nvtx * move llama inference code to tensor_parallel * fix __init__.py * rm tensor_parallel * fix: fix bugs in auto_policy.py * fix:rm some unused codes * mv colossalai/tpinference to colossalai/inference/tensor_parallel * change __init__.py * save change * fix engine * Bug fix: Fix hang * remove llama_infer_engine.py * bug fix: fix bugs about infer_state.is_context_stage * remove pollcies * fix: delete unused code * fix: delete unused code * remove unused coda * fix conflict --------- Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497@outlook.com> * [doc] add colossal inference fig (#4554) * create readme * add readme.md * fix typos * upload fig * [NFC] fix docstring for colossal inference (#4555) Fix docstring and comments in kv cache manager and bloom modeling * fix docstring in llama modeling (#4557) * [Infer] check import vllm (#4559) * change import vllm * import apply_rotary_pos_emb * change import location * [DOC] add installation req (#4561) * add installation req * fix * slight change * remove empty * [Feature] rms-norm transfer into inference llama.py (#4563) * add installation req * fix * slight change * remove empty * add rmsnorm polciy * add * clean codes * [infer] Fix tp inference engine (#4564) * fix engine prepare data * add engine test * use bloom for testing * revise on test * revise on test * reset shardformer llama (#4569) * [infer] Fix engine - tensors on different devices (#4570) * fix diff device in engine * [codefactor] Feature/colossal inference (#4579) * code factors * remove * change coding (#4581) * [doc] complete README of colossal inference (#4585) * complete fig * Update README.md * [doc]update readme (#4586) * update readme * Update README.md * bug fix: fix bus in llama and bloom (#4588) * [BUG FIX]Fix test engine in CI and non-vllm kernels llama forward (#4592) * fix tests * clean * clean * fix bugs * add * fix llama non-vllm kernels bug * modify * clean codes * [Kernel]Rmsnorm fix (#4598) * fix tests * clean * clean * fix bugs * add * fix llama non-vllm kernels bug * modify * clean codes * add triton rmsnorm * delete vllm kernel flag * [Bug Fix]Fix bugs in llama (#4601) * fix tests * clean * clean * fix bugs * add * fix llama non-vllm kernels bug * modify * clean codes * bug fix: remove rotary_positions_ids --------- Co-authored-by: cuiqing.li <lixx3527@gmail.com> * [kernel] Add triton layer norm & replace norm for bloom (#4609) * add layernorm for inference * add test for layernorm kernel * add bloom layernorm replacement policy * trivial: path * [Infer] Bug fix rotary embedding in llama (#4608) * fix rotary embedding * delete print * fix init seq len bug * rename pytest * add benchmark for llama * refactor codes * delete useless code * [bench] Add bloom inference benchmark (#4621) * add bloom benchmark * readme - update benchmark res * trivial - uncomment for testing (#4622) * [Infer] add check triton and cuda version for tests (#4627) * fix rotary embedding * delete print * fix init seq len bug * rename pytest * add benchmark for llama * refactor codes * delete useless code * add check triton and cuda * Update sharder.py (#4629) * [Inference] Hot fix some bugs and typos (#4632) * fix * fix test * fix conflicts * [typo]Comments fix (#4633) * fallback * fix commnets * bug fix: fix some bugs in test_llama and test_bloom (#4635) * [Infer] delete benchmark in tests and fix bug for llama and bloom (#4636) * fix rotary embedding * delete print * fix init seq len bug * rename pytest * add benchmark for llama * refactor codes * delete useless code * add check triton and cuda * delete benchmark and fix infer bugs * delete benchmark for tests * delete useless code * delete bechmark function in utils * [Fix] Revise TPInferEngine, inference tests and benchmarks (#4642) * [Fix] revise TPInferEngine methods and inference tests * fix llama/bloom infer benchmarks * fix infer tests * trivial fix: benchmakrs * trivial * trivial: rm print * modify utils filename for infer ops test (#4657) * [Infer] Fix TPInferEngine init & inference tests, benchmarks (#4670) * fix engine funcs * TPInferEngine: receive shard config in init * benchmarks: revise TPInferEngine init * benchmarks: remove pytest decorator * trivial fix * use small model for tests * [NFC] use args for infer benchmarks (#4674) * revise infer default (#4683) * [Fix] optimize/shard model in TPInferEngine init (#4684) * remove using orig model in engine * revise inference tests * trivial: rename --------- Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com> Co-authored-by: Xu Kai <xukai16@foxmail.com> Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com> Co-authored-by: yuehuayingxueluo <867460659@qq.com> Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	2023-09-12 01:22:56 +08:00
flybird11111	eedaa3e1ef	[shardformer]fix gpt2 double head (#4663 ) * [shardformer]fix gpt2 test [shardformer]fix gpt2 test [shardformer]fix gpt2 test * fix * [shardformer] add todo * [shardformer] add todo	2023-09-11 18:35:03 +08:00
Hongxin Liu	554aa9592e	[legacy] move communication and nn to legacy and refactor logger (#4671 ) * [legacy] move communication to legacy (#4640) * [legacy] refactor logger and clean up legacy codes (#4654) * [legacy] make logger independent to gpc * [legacy] make optim independent to registry * [legacy] move test engine to legacy * [legacy] move nn to legacy (#4656) * [legacy] move nn to legacy * [checkpointio] fix save hf config * [test] remove useledd rpc pp test * [legacy] fix nn init * [example] skip tutorial hybriad parallel example * [devops] test doc check * [devops] test doc check	2023-09-11 16:24:28 +08:00
Hongxin Liu	536397cc95	[devops] fix concurrency group (#4667 )	2023-09-11 15:32:50 +08:00
flybird11111	7486ed7d3a	[shardformer] update llama2/opt finetune example and fix llama2 policy (#4645 ) * [shardformer] update shardformer readme [shardformer] update shardformer readme [shardformer] update shardformer readme * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] change dataset * [shardformer] change dataset * [shardformer] fix CI * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] fix [example] update opt example [example] resolve comments fix fix	2023-09-09 22:45:36 +08:00
Hongxin Liu	a686f9ddc8	[devops] fix concurrency group and compatibility test (#4665 ) * [devops] fix concurrency group * [devops] fix compatibility test * [devops] fix tensornvme install * [devops] fix tensornvme install * [devops] fix colossalai install	2023-09-08 13:49:40 +08:00
Baizhou Zhang	295b38fecf	[example] update vit example for hybrid parallel plugin (#4641 ) * update vit example for hybrid plugin * reset tp/pp size * fix dataloader iteration bug * update optimizer passing in evaluation/add grad_accum * change criterion * wrap tqdm * change grad_accum to grad_checkpoint * fix pbar	2023-09-07 17:38:45 +08:00
Baizhou Zhang	660eed9124	[pipeline] set optimizer to optional in execute_pipeline (#4630 ) * set optimizer to optional in execute_pipeline * arrange device and mixed precision in booster init * fix execute_pipeline in booster.py	2023-09-07 10:42:59 +08:00
eric8607242	c3d5fa3bac	[shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624 ) * Enable policy assignment in HybridPlugin and enable llama policy for llamav2 * Remove Policy from Plugin * revert changes of plugin HybridParallelModule * revert changes in plugin * upgrade transformers * revert transformers version --------- Co-authored-by: flybird11111 <1829166702@qq.com>	2023-09-07 10:15:13 +08:00
Hongxin Liu	9709b8f502	[release] update version (#4623 )	2023-09-06 23:41:04 +08:00

1 2 3 4 5 ...

2783 Commits (946ab56c486480875020252ad65f9a4618fb9a16) All Branches Search

2783 Commits (946ab56c486480875020252ad65f9a4618fb9a16)

All Branches