ColossalAI

Commit Graph

Author	SHA1	Message	Date
Baizhou Zhang	c040d70aa0	[hotfix] fix the bug of repeatedly storing param group (#4951 )	1 year ago
littsk	be82b5d4ca	[hotfix] Fix the bug where process groups were not being properly released. (#4940 ) * Fix the bug where process groups were not being properly released. * test * Revert "test" This reverts commit `479900c139`.	1 year ago
Cuiqing Li (李崔卿)	4f0234f236	[doc]Update doc for colossal-inference (#4989 ) * update doc * Update README.md --------- Co-authored-by: cuiqing.li <lixx336@gmail.com>	1 year ago
Cuiqing Li	459a88c806	[Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965 ) * adding flash-decoding * clean * adding kernel * adding flash-decoding * add integration * add * adding kernel * adding kernel * adding triton 2.1.0 features for inference * update bloom triton kernel * remove useless vllm kernels * clean codes * fix * adding files * fix readme * update llama flash-decoding --------- Co-authored-by: cuiqing.li <lixx336@gmail.com>	1 year ago
Jianghai	cf579ff46d	[Inference] Dynamic Batching Inference, online and offline (#4953 ) * [inference] Dynamic Batching for Single and Multiple GPUs (#4831) * finish batch manager * 1 * first * fix * fix dynamic batching * llama infer * finish test * support different lengths generating * del prints * del prints * fix * fix bug --------- Co-authored-by: CjhHa1 <cjh18671720497outlook.com> * [inference] Async dynamic batching (#4894) * finish input and output logic * add generate * test forward * 1 * [inference]Re push async dynamic batching (#4901) * adapt to ray server * finish async * finish test * del test --------- Co-authored-by: yuehuayingxueluo <867460659@qq.com> * Revert "[inference]Re push async dynamic batching (#4901)" (#4905) This reverts commit `fbf3c09e67`. * Revert "[inference] Async dynamic batching (#4894)" This reverts commit `fced140250`. * Revert "[inference] Async dynamic batching (#4894)" (#4909) This reverts commit `fced140250`. * Add Ray Distributed Environment Init Scripts * support DynamicBatchManager base function * revert _set_tokenizer version * add driver async generate * add async test * fix bugs in test_ray_dist.py * add get_tokenizer.py * fix code style * fix bugs about No module named 'pydantic' in ci test * fix bugs in ci test * fix bugs in ci test * fix bugs in ci test * [infer]Add Ray Distributed Environment Init Scripts (#4911) * Revert "[inference] Async dynamic batching (#4894)" This reverts commit `fced140250`. * Add Ray Distributed Environment Init Scripts * support DynamicBatchManager base function * revert _set_tokenizer version * add driver async generate * add async test * fix bugs in test_ray_dist.py * add get_tokenizer.py * fix code style * fix bugs about No module named 'pydantic' in ci test * fix bugs in ci test * fix bugs in ci test * fix bugs in ci test * support dynamic batch for bloom model and is_running function * [Inference]Test for new Async engine (#4935) * infer engine * infer engine * test engine * test engine * new manager * change step * add * test * fix * fix * finish test * finish test * finish test * finish test * add license --------- Co-authored-by: yuehuayingxueluo <867460659@qq.com> * add assertion for config (#4947) * [Inference] Finish dynamic batching offline test (#4948) * test * fix test * fix quant * add default * fix * fix some bugs * fix some bugs * fix * fix bug * fix bugs * reset param --------- Co-authored-by: yuehuayingxueluo <867460659@qq.com> Co-authored-by: Cuiqing Li <lixx3527@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497outlook.com>	1 year ago
Bin Jia	1db6727678	[Pipeline inference] Combine kvcache with pipeline inference (#4938 ) * merge kvcache with pipeline inference and refactor the code structure * support ppsize > 2 * refactor pipeline code * do pre-commit * modify benchmark * fix bench mark * polish code * add docstring and update readme * refactor the code * fix some logic bug of ppinfer * polish readme * fix typo * skip infer test	1 year ago
Xu Kai	785802e809	[inference] add reference and fix some bugs (#4937 ) * add reference and fix some bugs * update gptq init --------- Co-authored-by: Xu Kai <xukai16@foxamil.com>	1 year ago
Hongxin Liu	b8e770c832	[test] merge old components to test to model zoo (#4945 ) * [test] add custom models in model zoo * [test] update legacy test * [test] update model zoo * [test] update gemini test * [test] remove components to test	1 year ago
Cuiqing Li	3a41e8304e	[Refactor] Integrated some lightllm kernels into token-attention (#4946 ) * add some req for inference * clean codes * add codes * add some lightllm deps * clean codes * hello * delete rms files * add some comments * add comments * add doc * add lightllm deps * add lightllm cahtglm2 kernels * add lightllm cahtglm2 kernels * replace rotary embedding with lightllm kernel * add some commnets * add some comments * add some comments * add * replace fwd kernel att1 * fix a arg * add * add * fix token attention * add some comments * clean codes * modify comments * fix readme * fix bug * fix bug --------- Co-authored-by: cuiqing.li <lixx336@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	1 year ago
digger yu	11009103be	[nfc] fix some typo with colossalai/ docs/ etc. (#4920 )	1 year ago
github-actions[bot]	486d06a2d5	[format] applied code formatting on changed files in pull request 4820 (#4886 ) Co-authored-by: github-actions <github-actions@github.com>	1 year ago
Zhongkai Zhao	c7aa319ba0	[test] add no master test for low level zero plugin (#4934 )	1 year ago
Hongxin Liu	1f5d2e8062	[hotfix] fix torch 2.0 compatibility (#4936 ) * [hotfix] fix launch * [test] fix test gemini optim * [shardformer] fix vit	1 year ago
Baizhou Zhang	21ba89cab6	[gemini] support gradient accumulation (#4869 ) * add test * fix no_sync bug in low level zero plugin * fix test * add argument for grad accum * add grad accum in backward hook for gemini * finish implementation, rewrite tests * fix test * skip stuck model in low level zero test * update doc * optimize communication & fix gradient checkpoint * modify doc * cleaning codes * update cpu adam fp16 case	1 year ago
Hongxin Liu	4f68b3f10c	[kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921 ) * [kernel] support pure fp16 for cpu adam (#4896) * [kernel] fix cpu adam kernel for pure fp16 and update tests (#4919) * [kernel] fix cpu adam * [test] update gemini optim test	1 year ago
Xu Kai	611a5a80ca	[inference] Add smmoothquant for llama (#4904 ) * [inference] add int8 rotary embedding kernel for smoothquant (#4843) * [inference] add smoothquant llama attention (#4850) * add smoothquant llama attention * remove uselss code * remove useless code * fix import error * rename file name * [inference] add silu linear fusion for smoothquant llama mlp (#4853) * add silu linear * update skip condition * catch smoothquant cuda lib exception * prcocess exception for tests * [inference] add llama mlp for smoothquant (#4854) * add llama mlp for smoothquant * fix down out scale * remove duplicate lines * add llama mlp check * delete useless code * [inference] add smoothquant llama (#4861) * add smoothquant llama * fix attention accuracy * fix accuracy * add kv cache and save pretrained * refactor example * delete smooth * refactor code * [inference] add smooth function and delete useless code for smoothquant (#4895) * add smooth function and delete useless code * update datasets * remove duplicate import * delete useless file * refactor codes (#4902) * rafactor code * add license * add torch-int and smoothquant license	1 year ago
Zhongkai Zhao	a0684e7bd6	[feature] support no master weights option for low level zero plugin (#4816 ) * [feature] support no master weights for low level zero plugin * [feature] support no master weights for low level zero plugin, remove data copy when no master weights * remove data copy and typecasting when no master weights * not load weights to cpu when using no master weights * fix grad: use fp16 grad when no master weights * only do not update working param when no master weights * fix: only do not update working param when no master weights * fix: passing params in dict format in hybrid plugin * fix: remove extra params (tp_process_group) in hybrid_parallel_plugin	1 year ago
Xu Kai	77a9328304	[inference] add llama2 support (#4898 ) * add llama2 support * fix multi group bug	1 year ago
Baizhou Zhang	39f2582e98	[hotfix] fix lr scheduler bug in torch 2.0 (#4864 )	1 year ago
littsk	83b52c56cd	[feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837 ) * Add clip_grad_norm for hibrid_parallel_plugin * polish code * add unittests * Move tp to a higher-level optimizer interface. * bug fix * polish code	1 year ago
Hongxin Liu	df63564184	[gemini] support amp o3 for gemini (#4872 ) * [gemini] support no reuse fp16 chunk * [gemini] support no master weight for optim * [gemini] support no master weight for gemini ddp * [test] update gemini tests * [test] update gemini tests * [plugin] update gemini plugin * [test] fix gemini checkpointio test * [test] fix gemini checkpoint io	1 year ago
ppt0011	1dcaf249bd	[doc] add reminder for issue encountered with hybrid adam	1 year ago
Bin Jia	08a9f76b2f	[Pipeline Inference] Sync pipeline inference branch to main (#4820 ) * [pipeline inference] pipeline inference (#4492) * add pp stage manager as circle stage * fix a bug when create process group * add ppinfer basic framework * add micro batch manager and support kvcache-pp gpt2 fwd * add generate schedule * use mb size to control mb number * support generate with kv cache * add output, remove unused code * add test * reuse shardformer to build model * refactor some code and use the same attribute name of hf * fix review and add test for generation * remove unused file * fix CI * add cache clear * fix code error * fix typo * [Pipeline inference] Modify to tieweight (#4599) * add pp stage manager as circle stage * fix a bug when create process group * add ppinfer basic framework * add micro batch manager and support kvcache-pp gpt2 fwd * add generate schedule * use mb size to control mb number * support generate with kv cache * add output, remove unused code * add test * reuse shardformer to build model * refactor some code and use the same attribute name of hf * fix review and add test for generation * remove unused file * modify the way of saving newtokens * modify to tieweight * modify test * remove unused file * solve review * add docstring * [Pipeline inference] support llama pipeline inference (#4647) * support llama pipeline inference * remove tie weight operation * [pipeline inference] Fix the blocking of communication when ppsize is 2 (#4708) * add benchmark verbose * fix export tokens * fix benchmark verbose * add P2POp style to do p2p communication * modify schedule as p2p type when ppsize is 2 * remove unused code and add docstring * [Pipeline inference] Refactor code, add docsting, fix bug (#4790) * add benchmark script * update argparse * fix fp16 load * refactor code style * add docstring * polish code * fix test bug * [Pipeline inference] Add pipeline inference docs (#4817) * add readme doc * add a ico * Add performance * update table of contents * refactor code (#4873)	1 year ago
Camille Zhong	cd6a962e66	[NFC] polish code style (#4799 )	1 year ago
Michelle	07ed155e86	[NFC] polish colossalai/inference/quant/gptq/cai_gptq/__init__.py code style (#4792 )	1 year ago
littsk	eef96e0877	polish code for gptq (#4793 )	1 year ago
Hongxin Liu	cb3a25a062	[checkpointio] hotfix torch 2.0 compatibility (#4824 )	1 year ago
shaoyuw	c97a3523db	fix: typo in comment of low_level_zero plugin	1 year ago
Xu Kai	d1fcc0fa4d	[infer] fix test bug (#4838 ) * fix test bug * delete useless code * fix typo	1 year ago
Jianghai	013a4bedf0	[inference]fix import bug and delete down useless init (#4830 ) * fix import bug and release useless init * fix * fix * fix	1 year ago
Xu Kai	c3bef20478	add autotune (#4822 )	1 year ago
binmakeswell	822051d888	[doc] update slack link (#4823 )	1 year ago
littsk	11f1e426fe	[hotfix] Correct several erroneous code comments (#4794 )	1 year ago
littsk	54b3ad8924	[hotfix] fix norm type error in zero optimizer (#4795 )	1 year ago
Hongxin Liu	da15fdb9ca	[doc] add lazy init docs (#4808 )	1 year ago
Yan haixu	a22706337a	[misc] add last_epoch in CosineAnnealingWarmupLR (#4778 )	1 year ago
Hongxin Liu	4965c0dabd	[lazy] support from_pretrained (#4801 ) * [lazy] patch from pretrained * [lazy] fix from pretrained and add tests * [devops] update ci	1 year ago
Baizhou Zhang	64a08b2dc3	[checkpointio] support unsharded checkpointIO for hybrid parallel (#4774 ) * support unsharded saving/loading for model * support optimizer unsharded saving * update doc * support unsharded loading for optimizer * small fix	1 year ago
Baizhou Zhang	a2db75546d	[doc] polish shardformer doc (#4779 ) * fix example format in docstring * polish shardformer doc	1 year ago
Jianghai	ce7ade3882	[inference] chatglm2 infer demo (#4724 ) * add chatglm2 * add * gather needed kernels * fix some bugs * finish context forward * finish context stage * fix * add * pause * add * fix bugs * finish chatglm * fix bug * change some logic * fix bugs * change some logics * add * add * add * fix * fix tests * fix	1 year ago
Xu Kai	946ab56c48	[feature] add gptq for inference (#4754 ) * [gptq] add gptq kernel (#4416) * add gptq * refactor code * fix tests * replace auto-gptq * rname inferance/quant * refactor test * add auto-gptq as an option * reset requirements * change assert and check auto-gptq * add import warnings * change test flash attn version * remove example * change requirements of flash_attn * modify tests * [skip ci] change requirements-test * [gptq] faster gptq cuda kernel (#4494) * [skip ci] add cuda kernels * add license * [skip ci] fix max_input_len * format files & change test size * [skip ci] * [gptq] add gptq tensor parallel (#4538) * add gptq tensor parallel * add gptq tp * delete print * add test gptq check * add test auto gptq check * [gptq] combine gptq and kv cache manager (#4706) * combine gptq and kv cache manager * add init bits * delete useless code * add model path * delete usless print and update test * delete usless import * move option gptq to shard config * change replace linear to shardformer * update bloom policy * delete useless code * fix import bug and delete uselss code * change colossalai/gptq to colossalai/quant/gptq * update import linear for tests * delete useless code and mv gptq_kernel to kernel directory * fix triton kernel * add triton import	1 year ago
littsk	1e0e080837	[bug] Fix the version check bug in colossalai run when generating the cmd. (#4713 ) * Fix the version check bug in colossalai run when generating the cmd. * polish code	1 year ago
Hongxin Liu	3e05c07bb8	[lazy] support torch 2.0 (#4763 ) * [lazy] support _like methods and clamp * [lazy] pass transformers models * [lazy] fix device move and requires grad * [lazy] fix requires grad and refactor api * [lazy] fix requires grad	1 year ago
Baizhou Zhang	df66741f77	[bug] fix get_default_parser in examples (#4764 )	1 year ago
Baizhou Zhang	c0a033700c	[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758 ) * fix master param sync for hybrid plugin * rewrite unwrap for ddp/fsdp * rewrite unwrap for zero/gemini * rewrite unwrap for hybrid plugin * fix geemini unwrap * fix bugs	1 year ago
Hongxin Liu	079bf3cb26	[misc] update pre-commit and run all files (#4752 ) * [misc] update pre-commit * [misc] run pre-commit * [misc] remove useless configuration files * [misc] ignore cuda for clang-format	1 year ago
Hongxin Liu	b5f9e37c70	[legacy] clean up legacy code (#4743 ) * [legacy] remove outdated codes of pipeline (#4692) * [legacy] remove cli of benchmark and update optim (#4690) * [legacy] remove cli of benchmark and update optim * [doc] fix cli doc test * [legacy] fix engine clip grad norm * [legacy] remove outdated colo tensor (#4694) * [legacy] remove outdated colo tensor * [test] fix test import * [legacy] move outdated zero to legacy (#4696) * [legacy] clean up utils (#4700) * [legacy] clean up utils * [example] update examples * [legacy] clean up amp * [legacy] fix amp module * [legacy] clean up gpc (#4742) * [legacy] clean up context * [legacy] clean core, constants and global vars * [legacy] refactor initialize * [example] fix examples ci * [example] fix examples ci * [legacy] fix tests * [example] fix gpt example * [example] fix examples ci * [devops] fix ci installation * [example] fix examples ci	1 year ago
Xuanlei Zhao	32e7f99416	[kernel] update triton init #4740 (#4740 )	1 year ago
flybird11111	4c4482f3ad	[example] llama2 add fine-tune example (#4673 ) * [shardformer] update shardformer readme [shardformer] update shardformer readme [shardformer] update shardformer readme * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] change dataset * [shardformer] change dataset * [shardformer] fix CI * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] fix [example] update opt example [example] resolve comments fix fix * [example] llama2 add finetune example * [example] llama2 add finetune example * [example] llama2 add finetune example * [example] llama2 add finetune example * fix * update llama2 example * update llama2 example * fix * update llama2 example * update llama2 example * update llama2 example * update llama2 example * update llama2 example * update llama2 example * Update requirements.txt * update llama2 example * update llama2 example * update llama2 example	1 year ago
Xuanlei Zhao	ac2797996b	[shardformer] add custom policy in hybrid parallel plugin (#4718 ) * add custom policy * update assert	1 year ago
Baizhou Zhang	f911d5b09d	[doc] Add user document for Shardformer (#4702 ) * create shardformer doc files * add docstring for seq-parallel * update ShardConfig docstring * add links to llama example * add outdated massage * finish introduction & supporting information * finish 'how shardformer works' * finish shardformer.md English doc * fix doctest fail * add Chinese document	1 year ago
flybird11111	20190b49a5	[shardformer] to fix whisper test failed due to significant accuracy differences. (#4710 ) * [shardformer] fix whisper test failed * [shardformer] fix whisper test failed * [shardformer] fix whisper test failed * [shardformer] fix whisper test failed	1 year ago
Yuanheng Zhao	e2c0e7f92a	[hotfix] Fix import error: colossal.kernel without triton installed (#4722 ) * [hotfix] remove triton kernels from kernel init * revise bloom/llama kernel imports for infer	1 year ago
flybird11111	c7d6975d29	[shardformer] fix GPT2DoubleHeadsModel (#4703 )	1 year ago
digger yu	9c2feb2f0b	fix some typo with colossalai/device colossalai/tensor/ etc. (#4171 ) Co-authored-by: flybird11111 <1829166702@qq.com>	1 year ago
Baizhou Zhang	d8ceeac14e	[hotfix] fix typo in hybrid parallel io (#4697 )	1 year ago
flybird11111	8844691f4b	[shardformer] update shardformer readme (#4689 ) * [shardformer] update shardformer readme * [shardformer] update shardformer readme * [shardformer] update shardformer readme * [shardformer] update shardformer readme * [shardformer] update shardformer readme	1 year ago
Baizhou Zhang	1d454733c4	[doc] Update booster user documents. (#4669 ) * update booster_api.md * update booster_checkpoint.md * update booster_plugins.md * move transformers importing inside function * fix Dict typing * fix autodoc bug * small fix	1 year ago
Cuiqing Li	bce0f16702	[Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577 ) * [infer] Infer/llama demo (#4503) * add * add infer example * finish * finish * stash * fix * [Kernels] add inference token attention kernel (#4505) * add token forward * fix tests * fix comments * add try import triton * add adapted license * add tests check * [Kernels] add necessary kernels (llama & bloom) for attention forward and kv-cache manager (#4485) * added _vllm_rms_norm * change place * added tests * added tests * modify * adding kernels * added tests: * adding kernels * modify * added * updating kernels * adding tests * added tests * kernel change * submit * modify * added * edit comments * change name * change commnets and fix import * add * added * combine codes (#4509) * [feature] add KV cache manager for llama & bloom inference (#4495) * add kv cache memory manager * add stateinfo during inference * format * format * rename file * add kv cache test * revise on BatchInferState * file dir change * [Bug FIx] import llama context ops fix (#4524) * added _vllm_rms_norm * change place * added tests * added tests * modify * adding kernels * added tests: * adding kernels * modify * added * updating kernels * adding tests * added tests * kernel change * submit * modify * added * edit comments * change name * change commnets and fix import * add * added * fix * add ops into init.py * add * [Infer] Add TPInferEngine and fix file path (#4532) * add engine for TP inference * move file path * update path * fix TPInferEngine * remove unused file * add engine test demo * revise TPInferEngine * fix TPInferEngine, add test * fix * Add Inference test for llama (#4508) * add kv cache memory manager * add stateinfo during inference * add * add infer example * finish * finish * format * format * rename file * add kv cache test * revise on BatchInferState * add inference test for llama * fix conflict * feature: add some new features for llama engine * adapt colossalai triton interface * Change the parent class of llama policy * add nvtx * move llama inference code to tensor_parallel * fix __init__.py * rm tensor_parallel * fix: fix bugs in auto_policy.py * fix:rm some unused codes * mv colossalai/tpinference to colossalai/inference/tensor_parallel * change __init__.py * save change * fix engine * Bug fix: Fix hang * remove llama_infer_engine.py --------- Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497@outlook.com> * [infer] Add Bloom inference policy and replaced methods (#4512) * add bloom inference methods and policy * enable pass BatchInferState from model forward * revise bloom infer layers/policies * add engine for inference (draft) * add test for bloom infer * fix bloom infer policy and flow * revise bloom test * fix bloom file path * remove unused codes * fix bloom modeling * fix dir typo * fix trivial * fix policy * clean pr * trivial fix * Revert "[infer] Add Bloom inference policy and replaced methods (#4512)" (#4552) This reverts commit `17cfa57140`. * [Doc] Add colossal inference doc (#4549) * create readme * add readme.md * fix typos * [infer] Add Bloom inference policy and replaced methods (#4553) * add bloom inference methods and policy * enable pass BatchInferState from model forward * revise bloom infer layers/policies * add engine for inference (draft) * add test for bloom infer * fix bloom infer policy and flow * revise bloom test * fix bloom file path * remove unused codes * fix bloom modeling * fix dir typo * fix trivial * fix policy * clean pr * trivial fix * trivial * Fix Bugs In Llama Model Forward (#4550) * add kv cache memory manager * add stateinfo during inference * add * add infer example * finish * finish * format * format * rename file * add kv cache test * revise on BatchInferState * add inference test for llama * fix conflict * feature: add some new features for llama engine * adapt colossalai triton interface * Change the parent class of llama policy * add nvtx * move llama inference code to tensor_parallel * fix __init__.py * rm tensor_parallel * fix: fix bugs in auto_policy.py * fix:rm some unused codes * mv colossalai/tpinference to colossalai/inference/tensor_parallel * change __init__.py * save change * fix engine * Bug fix: Fix hang * remove llama_infer_engine.py * bug fix: fix bugs about infer_state.is_context_stage * remove pollcies * fix: delete unused code * fix: delete unused code * remove unused coda * fix conflict --------- Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497@outlook.com> * [doc] add colossal inference fig (#4554) * create readme * add readme.md * fix typos * upload fig * [NFC] fix docstring for colossal inference (#4555) Fix docstring and comments in kv cache manager and bloom modeling * fix docstring in llama modeling (#4557) * [Infer] check import vllm (#4559) * change import vllm * import apply_rotary_pos_emb * change import location * [DOC] add installation req (#4561) * add installation req * fix * slight change * remove empty * [Feature] rms-norm transfer into inference llama.py (#4563) * add installation req * fix * slight change * remove empty * add rmsnorm polciy * add * clean codes * [infer] Fix tp inference engine (#4564) * fix engine prepare data * add engine test * use bloom for testing * revise on test * revise on test * reset shardformer llama (#4569) * [infer] Fix engine - tensors on different devices (#4570) * fix diff device in engine * [codefactor] Feature/colossal inference (#4579) * code factors * remove * change coding (#4581) * [doc] complete README of colossal inference (#4585) * complete fig * Update README.md * [doc]update readme (#4586) * update readme * Update README.md * bug fix: fix bus in llama and bloom (#4588) * [BUG FIX]Fix test engine in CI and non-vllm kernels llama forward (#4592) * fix tests * clean * clean * fix bugs * add * fix llama non-vllm kernels bug * modify * clean codes * [Kernel]Rmsnorm fix (#4598) * fix tests * clean * clean * fix bugs * add * fix llama non-vllm kernels bug * modify * clean codes * add triton rmsnorm * delete vllm kernel flag * [Bug Fix]Fix bugs in llama (#4601) * fix tests * clean * clean * fix bugs * add * fix llama non-vllm kernels bug * modify * clean codes * bug fix: remove rotary_positions_ids --------- Co-authored-by: cuiqing.li <lixx3527@gmail.com> * [kernel] Add triton layer norm & replace norm for bloom (#4609) * add layernorm for inference * add test for layernorm kernel * add bloom layernorm replacement policy * trivial: path * [Infer] Bug fix rotary embedding in llama (#4608) * fix rotary embedding * delete print * fix init seq len bug * rename pytest * add benchmark for llama * refactor codes * delete useless code * [bench] Add bloom inference benchmark (#4621) * add bloom benchmark * readme - update benchmark res * trivial - uncomment for testing (#4622) * [Infer] add check triton and cuda version for tests (#4627) * fix rotary embedding * delete print * fix init seq len bug * rename pytest * add benchmark for llama * refactor codes * delete useless code * add check triton and cuda * Update sharder.py (#4629) * [Inference] Hot fix some bugs and typos (#4632) * fix * fix test * fix conflicts * [typo]Comments fix (#4633) * fallback * fix commnets * bug fix: fix some bugs in test_llama and test_bloom (#4635) * [Infer] delete benchmark in tests and fix bug for llama and bloom (#4636) * fix rotary embedding * delete print * fix init seq len bug * rename pytest * add benchmark for llama * refactor codes * delete useless code * add check triton and cuda * delete benchmark and fix infer bugs * delete benchmark for tests * delete useless code * delete bechmark function in utils * [Fix] Revise TPInferEngine, inference tests and benchmarks (#4642) * [Fix] revise TPInferEngine methods and inference tests * fix llama/bloom infer benchmarks * fix infer tests * trivial fix: benchmakrs * trivial * trivial: rm print * modify utils filename for infer ops test (#4657) * [Infer] Fix TPInferEngine init & inference tests, benchmarks (#4670) * fix engine funcs * TPInferEngine: receive shard config in init * benchmarks: revise TPInferEngine init * benchmarks: remove pytest decorator * trivial fix * use small model for tests * [NFC] use args for infer benchmarks (#4674) * revise infer default (#4683) * [Fix] optimize/shard model in TPInferEngine init (#4684) * remove using orig model in engine * revise inference tests * trivial: rename --------- Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com> Co-authored-by: Xu Kai <xukai16@foxmail.com> Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com> Co-authored-by: yuehuayingxueluo <867460659@qq.com> Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com> Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>	1 year ago
flybird11111	eedaa3e1ef	[shardformer]fix gpt2 double head (#4663 ) * [shardformer]fix gpt2 test [shardformer]fix gpt2 test [shardformer]fix gpt2 test * fix * [shardformer] add todo * [shardformer] add todo	1 year ago
Hongxin Liu	554aa9592e	[legacy] move communication and nn to legacy and refactor logger (#4671 ) * [legacy] move communication to legacy (#4640) * [legacy] refactor logger and clean up legacy codes (#4654) * [legacy] make logger independent to gpc * [legacy] make optim independent to registry * [legacy] move test engine to legacy * [legacy] move nn to legacy (#4656) * [legacy] move nn to legacy * [checkpointio] fix save hf config * [test] remove useledd rpc pp test * [legacy] fix nn init * [example] skip tutorial hybriad parallel example * [devops] test doc check * [devops] test doc check	1 year ago
flybird11111	7486ed7d3a	[shardformer] update llama2/opt finetune example and fix llama2 policy (#4645 ) * [shardformer] update shardformer readme [shardformer] update shardformer readme [shardformer] update shardformer readme * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] update llama2/opt finetune example and shardformer update to llama2 * [shardformer] change dataset * [shardformer] change dataset * [shardformer] fix CI * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] fix * [shardformer] fix [example] update opt example [example] resolve comments fix fix	1 year ago
Baizhou Zhang	295b38fecf	[example] update vit example for hybrid parallel plugin (#4641 ) * update vit example for hybrid plugin * reset tp/pp size * fix dataloader iteration bug * update optimizer passing in evaluation/add grad_accum * change criterion * wrap tqdm * change grad_accum to grad_checkpoint * fix pbar	1 year ago
Baizhou Zhang	660eed9124	[pipeline] set optimizer to optional in execute_pipeline (#4630 ) * set optimizer to optional in execute_pipeline * arrange device and mixed precision in booster init * fix execute_pipeline in booster.py	1 year ago
eric8607242	c3d5fa3bac	[shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624 ) * Enable policy assignment in HybridPlugin and enable llama policy for llamav2 * Remove Policy from Plugin * revert changes of plugin HybridParallelModule * revert changes in plugin * upgrade transformers * revert transformers version --------- Co-authored-by: flybird11111 <1829166702@qq.com>	1 year ago
Hongxin Liu	fae6c92ead	Merge branch 'main' into feature/shardformer	1 year ago
Hongxin Liu	ac178ca5c1	[legacy] move builder and registry to legacy (#4603 )	1 year ago
Hongxin Liu	8accecd55b	[legacy] move engine to legacy (#4560 ) * [legacy] move engine to legacy * [example] fix seq parallel example * [example] fix seq parallel example * [test] test gemini pluging hang * [test] test gemini pluging hang * [test] test gemini pluging hang * [test] test gemini pluging hang * [test] test gemini pluging hang * [example] update seq parallel requirements	1 year ago
Hongxin Liu	89fe027787	[legacy] move trainer to legacy (#4545 ) * [legacy] move trainer to legacy * [doc] update docs related to trainer * [test] ignore legacy test	1 year ago
Hongxin Liu	807e01a4ba	[zero] hotfix master param sync (#4618 ) * [zero] add method to update master params * [zero] update zero plugin * [plugin] update low level zero plugin	1 year ago
flybird11111	ec0866804c	[shardformer] update shardformer readme (#4617 ) [shardformer] update shardformer readme [shardformer] update shardformer readme	1 year ago
Bin Jia	86d22581e4	[shardformer] Add overlap optional for HybridParallelPlugin (#4615 ) * add optional overlap for plugin * remove fixed todo	1 year ago
Hongxin Liu	a39a5c66fe	Merge branch 'main' into feature/shardformer	1 year ago
Baizhou Zhang	e79b1e80e2	[checkpointio] support huggingface from_pretrained for all plugins (#4606 )	1 year ago
flybird11111	0a94fcd351	[shardformer] update bert finetune example with HybridParallelPlugin (#4584 ) * [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] fix epoch change * [shardformer] broadcast add pp group * [shardformer] fix opt test hanging * fix * test * test * [shardformer] zero1+pp and the corresponding tests (#4517) * pause * finish pp+zero1 * Update test_shard_vit.py * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516) * fix overlap bug and support bert, add overlap as an option in shardconfig * support overlap for chatglm and bloom * [shardformer] fix emerged bugs after updating transformers (#4526) * test * fix test * fix test * remove print * add fix * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] Add overlap support for gpt2 (#4535) * add overlap support for gpt2 * remove unused code * remove unused code * [shardformer] support pp+tp+zero1 tests (#4531) * [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix * [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] fix submodule replacement bug when enabling pp (#4544) * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) * implement sharded optimizer saving * add more param info * finish implementation of sharded optimizer saving * fix bugs in optimizer sharded saving * add pp+zero test * param group loading * greedy loading of optimizer * fix bug when loading * implement optimizer sharded saving * add optimizer test & arrange checkpointIO utils * fix gemini sharding state_dict * add verbose option * add loading of master params * fix typehint * fix master/working mapping in fp16 amp * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] add bert finetune example * [shardformer] fix epoch change * [shardformer] broadcast add pp group * rebase feature/shardformer * update pipeline * [shardformer] fix * [shardformer] fix * [shardformer] bert finetune fix * [shardformer] add all_reduce operation to loss add all_reduce operation to loss * [shardformer] make compatible with pytree. make compatible with pytree. * [shardformer] disable tp disable tp * [shardformer] add 3d plugin to ci test * [shardformer] update num_microbatches to None * [shardformer] update microbatchsize * [shardformer] update assert * update scheduler * update scheduler --------- Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com> Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>	1 year ago
Jianghai	24c0768795	[shardformer] Pytree fix (#4533 ) * pytree test * test bert * test bert * test bert * revise * add register * add register	1 year ago
Hongxin Liu	63ecafb1fb	[checkpointio] optimize zero optim checkpoint io (#4591 ) * [zero] update checkpoint io to save memory * [checkpointio] add device map to save memory	1 year ago
Hongxin Liu	508ca36fe3	[pipeline] 1f1b schedule receive microbatch size (#4589 )	1 year ago
LuGY	cbac782254	[zero]fix zero ckptIO with offload (#4529 ) * fix zero ckptio with offload * fix load device * saved tensors in ckpt should be on CPU * fix unit test * fix unit test * add clear cache * save memory for CI	1 year ago
Baizhou Zhang	38ccb8b1a3	[shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575 ) * hybrid plugin support huggingface from_pretrained * add huggingface compatibility tests * add folder cleaning * fix bugs	1 year ago
Baizhou Zhang	c9625dbb63	[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540 ) * implement sharded optimizer saving * add more param info * finish implementation of sharded optimizer saving * fix bugs in optimizer sharded saving * add pp+zero test * param group loading * greedy loading of optimizer * fix bug when loading * implement optimizer sharded saving * add optimizer test & arrange checkpointIO utils * fix gemini sharding state_dict * add verbose option * add loading of master params * fix typehint * fix master/working mapping in fp16 amp	1 year ago
Baizhou Zhang	2c787d7f47	[shardformer] fix submodule replacement bug when enabling pp (#4544 )	1 year ago
flybird11111	ec18fc7340	[shardformer] support pp+tp+zero1 tests (#4531 ) * [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix * [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1 * [shardformer] pp+tp+zero1	1 year ago
Lufang Chen	12c95a9fed	fix runtime prepare pass (#4502 ) Co-authored-by: lufang.chen <lufang.chen@nio.com>	1 year ago
flybird11111	d367b88785	[shardformer] fix opt test hanging (#4521 ) * [shardformer] fix opt test hanging * fix * test * test * test * fix test * fix test * remove print * add fix	1 year ago
Bin Jia	e241b74f24	[shardformer] Add overlap support for gpt2 (#4535 ) * add overlap support for gpt2 * remove unused code * remove unused code	1 year ago
Baizhou Zhang	0387a47e63	[shardformer] fix emerged bugs after updating transformers (#4526 )	1 year ago
Hongxin Liu	0b00def881	[example] add llama2 example (#4527 ) * [example] transfer llama-1 example * [example] fit llama-2 * [example] refactor scripts folder * [example] fit new gemini plugin * [cli] fix multinode runner * [example] fit gemini optim checkpoint * [example] refactor scripts * [example] update requirements * [example] update requirements * [example] rename llama to llama2 * [example] update readme and pretrain script * [example] refactor scripts	1 year ago
Bin Jia	c554b7f559	[shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516 ) * fix overlap bug and support bert, add overlap as an option in shardconfig * support overlap for chatglm and bloom	1 year ago
Jianghai	376533a564	[shardformer] zero1+pp and the corresponding tests (#4517 ) * pause * finish pp+zero1 * Update test_shard_vit.py	1 year ago
Baizhou Zhang	44eab2b27f	[shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506 ) * add APIs * implement save_sharded_model * add test for hybrid checkpointio * implement naive loading for sharded model * implement efficient sharded model loading * open a new file for hybrid checkpoint_io * small fix * fix circular importing * fix docstring * arrange arguments and apis * small fix	1 year ago
flybird11111	de8a65babc	[shardformer] opt fix. (#4514 ) * [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel * fix fix fix fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * activate checks * [Test] test ci * test ci * test ci * test ci * test ci * test ci * test ci * fix	1 year ago
LuGY	839847b7d7	[zero]support zero2 with gradient accumulation (#4511 ) * support gradient accumulation with zero2 * fix type	1 year ago
flybird11111	3353e55c80	[shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498 ) * [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel * fix fix fix fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * [shardformer] jit fused fix * activate checks	1 year ago
Hongxin Liu	27061426f7	[gemini] improve compatibility and add static placement policy (#4479 ) * [gemini] remove distributed-related part from colotensor (#4379) * [gemini] remove process group dependency * [gemini] remove tp part from colo tensor * [gemini] patch inplace op * [gemini] fix param op hook and update tests * [test] remove useless tests * [test] remove useless tests * [misc] fix requirements * [test] fix model zoo * [test] fix model zoo * [test] fix model zoo * [test] fix model zoo * [test] fix model zoo * [misc] update requirements * [gemini] refactor gemini optimizer and gemini ddp (#4398) * [gemini] update optimizer interface * [gemini] renaming gemini optimizer * [gemini] refactor gemini ddp class * [example] update gemini related example * [example] update gemini related example * [plugin] fix gemini plugin args * [test] update gemini ckpt tests * [gemini] fix checkpoint io * [example] fix opt example requirements * [example] fix opt example * [example] fix opt example * [example] fix opt example * [gemini] add static placement policy (#4443) * [gemini] add static placement policy * [gemini] fix param offload * [test] update gemini tests * [plugin] update gemini plugin * [plugin] update gemini plugin docstr * [misc] fix flash attn requirement * [test] fix gemini checkpoint io test * [example] update resnet example result (#4457) * [example] update bert example result (#4458) * [doc] update gemini doc (#4468) * [example] update gemini related examples (#4473) * [example] update gpt example * [example] update dreambooth example * [example] update vit * [example] update opt * [example] update palm * [example] update vit and opt benchmark * [hotfix] fix bert in model zoo (#4480) * [hotfix] fix bert in model zoo * [test] remove chatglm gemini test * [test] remove sam gemini test * [test] remove vit gemini test * [hotfix] fix opt tutorial example (#4497) * [hotfix] fix opt tutorial example * [hotfix] fix opt tutorial example	1 year ago
flybird11111	59e252ecdb	[shardformer] chatglm support sequence parallel (#4482 ) * [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel [shardformer] chatglm support sequence parallel * fix fix fix fix	1 year ago
Bin Jia	351351a36e	[shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488 )	1 year ago
Jianghai	5545114fd8	rename chatglm to chatglm2 (#4484 )	1 year ago
Baizhou Zhang	1c7df566e2	[shardformer] support tp+zero for shardformer (#4472 ) * support tp+zero/input type cast for hybridplugin * add tp+zero tests * fix bucket arguments	1 year ago
Jianghai	8739aa7fa0	[shardformer] Pipeline/whisper (#4456 ) * add some base tests and policies * finish whisper base model * add conditional generation * finish basic tests * whisper * finish whisper * finish whisper * del useless whisper test * fix * add argmin to replace * finish revision	1 year ago

1 2 3 4 5 ...

1718 Commits (bce9499ed33f7a8359bbb568c7ee18d72e8aa731)