1813 Commits (e57812c6727e325971cb0d8769c0789c088f62ae)

Author SHA1 Message Date
Hongxin Liu 4f68b3f10c [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) 1 year ago
Xu Kai 611a5a80ca [inference] Add smmoothquant for llama (#4904) 1 year ago
Zhongkai Zhao a0684e7bd6 [feature] support no master weights option for low level zero plugin (#4816) 1 year ago
Xu Kai 77a9328304 [inference] add llama2 support (#4898) 1 year ago
Baizhou Zhang 39f2582e98 [hotfix] fix lr scheduler bug in torch 2.0 (#4864) 1 year ago
littsk 83b52c56cd [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) 1 year ago
Hongxin Liu df63564184 [gemini] support amp o3 for gemini (#4872) 1 year ago
ppt0011 1dcaf249bd [doc] add reminder for issue encountered with hybrid adam 1 year ago
Bin Jia 08a9f76b2f [Pipeline Inference] Sync pipeline inference branch to main (#4820) 1 year ago
Camille Zhong cd6a962e66 [NFC] polish code style (#4799) 1 year ago
Michelle 07ed155e86 [NFC] polish colossalai/inference/quant/gptq/cai_gptq/__init__.py code style (#4792) 1 year ago
littsk eef96e0877 polish code for gptq (#4793) 1 year ago
Hongxin Liu cb3a25a062 [checkpointio] hotfix torch 2.0 compatibility (#4824) 1 year ago
shaoyuw c97a3523db fix: typo in comment of low_level_zero plugin 1 year ago
Xu Kai d1fcc0fa4d [infer] fix test bug (#4838) 1 year ago
Jianghai 013a4bedf0 [inference]fix import bug and delete down useless init (#4830) 1 year ago
Xu Kai c3bef20478 add autotune (#4822) 1 year ago
binmakeswell 822051d888 [doc] update slack link (#4823) 1 year ago
littsk 11f1e426fe [hotfix] Correct several erroneous code comments (#4794) 1 year ago
littsk 54b3ad8924 [hotfix] fix norm type error in zero optimizer (#4795) 1 year ago
Hongxin Liu da15fdb9ca [doc] add lazy init docs (#4808) 1 year ago
Yan haixu a22706337a [misc] add last_epoch in CosineAnnealingWarmupLR (#4778) 1 year ago
Hongxin Liu 4965c0dabd [lazy] support from_pretrained (#4801) 1 year ago
Baizhou Zhang 64a08b2dc3 [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774) 1 year ago
Baizhou Zhang a2db75546d [doc] polish shardformer doc (#4779) 1 year ago
Jianghai ce7ade3882 [inference] chatglm2 infer demo (#4724) 1 year ago
Xu Kai 946ab56c48 [feature] add gptq for inference (#4754) 1 year ago
littsk 1e0e080837 [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) 1 year ago
Hongxin Liu 3e05c07bb8 [lazy] support torch 2.0 (#4763) 1 year ago
Baizhou Zhang df66741f77 [bug] fix get_default_parser in examples (#4764) 1 year ago
Baizhou Zhang c0a033700c [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) 1 year ago
Hongxin Liu 079bf3cb26 [misc] update pre-commit and run all files (#4752) 1 year ago
Hongxin Liu b5f9e37c70 [legacy] clean up legacy code (#4743) 1 year ago
Xuanlei Zhao 32e7f99416 [kernel] update triton init #4740 (#4740) 1 year ago
flybird11111 4c4482f3ad [example] llama2 add fine-tune example (#4673) 1 year ago
Xuanlei Zhao ac2797996b [shardformer] add custom policy in hybrid parallel plugin (#4718) 1 year ago
Baizhou Zhang f911d5b09d [doc] Add user document for Shardformer (#4702) 1 year ago
flybird11111 20190b49a5 [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710) 1 year ago
Yuanheng Zhao e2c0e7f92a [hotfix] Fix import error: colossal.kernel without triton installed (#4722) 1 year ago
flybird11111 c7d6975d29 [shardformer] fix GPT2DoubleHeadsModel (#4703) 1 year ago
digger yu 9c2feb2f0b fix some typo with colossalai/device colossalai/tensor/ etc. (#4171) 1 year ago
Baizhou Zhang d8ceeac14e [hotfix] fix typo in hybrid parallel io (#4697) 1 year ago
flybird11111 8844691f4b [shardformer] update shardformer readme (#4689) 1 year ago
Baizhou Zhang 1d454733c4 [doc] Update booster user documents. (#4669) 1 year ago
Cuiqing Li bce0f16702 [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) 1 year ago
flybird11111 eedaa3e1ef [shardformer]fix gpt2 double head (#4663) 1 year ago
Hongxin Liu 554aa9592e [legacy] move communication and nn to legacy and refactor logger (#4671) 1 year ago
flybird11111 7486ed7d3a [shardformer] update llama2/opt finetune example and fix llama2 policy (#4645) 1 year ago
Baizhou Zhang 295b38fecf [example] update vit example for hybrid parallel plugin (#4641) 1 year ago
Baizhou Zhang 660eed9124 [pipeline] set optimizer to optional in execute_pipeline (#4630) 1 year ago