Camille Zhong
3043d5d676
Update modelscope link in README.md
...
add modelscope link
1 year ago
flybird11111
6a21f96a87
[doc] update advanced tutorials, training gpt with hybrid parallelism ( #4866 )
...
* [doc]update advanced tutorials, training gpt with hybrid parallelism
* [doc]update advanced tutorials, training gpt with hybrid parallelism
* update vit tutorials
* update vit tutorials
* update vit tutorials
* update vit tutorials
* update en/train_vit_with_hybrid_parallel.py
* fix
* resolve comments
* fix
1 year ago
Blagoy Simandoff
8aed02b957
[nfc] fix minor typo in README ( #4846 )
1 year ago
Camille Zhong
cd6a962e66
[NFC] polish code style ( #4799 )
1 year ago
Michelle
07ed155e86
[NFC] polish colossalai/inference/quant/gptq/cai_gptq/__init__.py code style ( #4792 )
1 year ago
littsk
eef96e0877
polish code for gptq ( #4793 )
1 year ago
Hongxin Liu
cb3a25a062
[checkpointio] hotfix torch 2.0 compatibility ( #4824 )
1 year ago
ppt0011
ad23460cf8
Merge pull request #4856 from KKZ20/test/model_support_for_low_level_zero
...
[test] remove the redundant code of model output transformation in torchrec
1 year ago
ppt0011
81ee91f2ca
Merge pull request #4858 from Shawlleyw/main
...
[doc]: typo in document of booster low_level_zero plugin
1 year ago
shaoyuw
c97a3523db
fix: typo in comment of low_level_zero plugin
1 year ago
Zhongkai Zhao
db40e086c8
[test] modify model supporting part of low_level_zero plugin (including corresponding docs)
1 year ago
Xu Kai
d1fcc0fa4d
[infer] fix test bug ( #4838 )
...
* fix test bug
* delete useless code
* fix typo
1 year ago
Jianghai
013a4bedf0
[inference]fix import bug and delete down useless init ( #4830 )
...
* fix import bug and release useless init
* fix
* fix
* fix
1 year ago
Yuanheng Zhao
573f270537
[Infer] Serving example w/ ray-serve (multiple GPU case) ( #4841 )
...
* fix imports
* add ray-serve with Colossal-Infer tp
* trivial: send requests script
* add README
* fix worker port
* fix readme
* use app builder and autoscaling
* trivial: input args
* clean code; revise readme
* testci (skip example test)
* use auto model/tokenizer
* revert imports fix (fixed in other PRs)
1 year ago
Yuanheng Zhao
3a74eb4b3a
[Infer] Colossal-Inference serving example w/ TorchServe (single GPU case) ( #4771 )
...
* add Colossal-Inference serving example w/ TorchServe
* add dockerfile
* fix dockerfile
* fix dockerfile: fix commit hash, install curl
* refactor file structure
* revise readme
* trivial
* trivial: dockerfile format
* clean dir; revise readme
* fix comments: fix imports and configs
* fix formats
* remove unused requirements
1 year ago
Tong Li
ed06731e00
update Colossal ( #4832 )
1 year ago
Xu Kai
c3bef20478
add autotune ( #4822 )
1 year ago
binmakeswell
822051d888
[doc] update slack link ( #4823 )
1 year ago
Yuanchen
1fa8c5e09f
Update Qwen-7B results ( #4821 )
...
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
1 year ago
flybird11111
be400a0936
[chat] fix gemini strategy ( #4698 )
...
* [chat] fix gemini strategy
* [chat] fix gemini strategy
* [chat] fix gemini strategy
* [chat] fix gemini strategy
* g# This is a combination of 2 commits.
[chat] fix gemini strategy
fix
* [chat] fix gemini strategy
update llama2 example
[chat] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* [fix] fix gemini strategy
* fix
* fix
* fix
* fix
* fix
* Update train_prompts.py
1 year ago
Tong Li
bbbcac26e8
fix format ( #4815 )
1 year ago
github-actions[bot]
fb46d05cdf
[format] applied code formatting on changed files in pull request 4595 ( #4602 )
...
Co-authored-by: github-actions <github-actions@github.com>
1 year ago
littsk
11f1e426fe
[hotfix] Correct several erroneous code comments ( #4794 )
1 year ago
littsk
54b3ad8924
[hotfix] fix norm type error in zero optimizer ( #4795 )
1 year ago
Hongxin Liu
da15fdb9ca
[doc] add lazy init docs ( #4808 )
1 year ago
Yan haixu
a22706337a
[misc] add last_epoch in CosineAnnealingWarmupLR ( #4778 )
1 year ago
Chandler-Bing
b6cf0aca55
[hotfix] change llama2 Colossal-LLaMA-2 script filename ( #4800 )
...
change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py. wrong writing
1 year ago
Desperado-Jia
62b6af1025
Merge pull request #4805 from TongLi3701/docs/fix
...
[doc] Update TODO in README of Colossal-LLaMA-2
1 year ago
Tong Li
8cbce6184d
update
1 year ago
Hongxin Liu
4965c0dabd
[lazy] support from_pretrained ( #4801 )
...
* [lazy] patch from pretrained
* [lazy] fix from pretrained and add tests
* [devops] update ci
1 year ago
Tong Li
bd014673b0
update readme
1 year ago
Baizhou Zhang
64a08b2dc3
[checkpointio] support unsharded checkpointIO for hybrid parallel ( #4774 )
...
* support unsharded saving/loading for model
* support optimizer unsharded saving
* update doc
* support unsharded loading for optimizer
* small fix
1 year ago
Baizhou Zhang
a2db75546d
[doc] polish shardformer doc ( #4779 )
...
* fix example format in docstring
* polish shardformer doc
1 year ago
flybird11111
26cd6d850c
[fix] fix weekly running example ( #4787 )
...
* [fix] fix weekly running example
* [fix] fix weekly running example
1 year ago
binmakeswell
d512a4d38d
[doc] add llama2 domain-specific solution news ( #4789 )
...
* [doc] add llama2 domain-specific solution news
1 year ago
Yuanchen
ce777853ae
[feature] ColossalEval: Evaluation Pipeline for LLMs ( #4786 )
...
* Add ColossalEval
* Delete evaluate in Chat
---------
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
1 year ago
Tong Li
74aa7d964a
initial commit: add colossal llama 2 ( #4784 )
1 year ago
Hongxin Liu
4146f1c0ce
[release] update version ( #4775 )
...
* [release] update version
* [doc] revert versions
1 year ago
Jianghai
ce7ade3882
[inference] chatglm2 infer demo ( #4724 )
...
* add chatglm2
* add
* gather needed kernels
* fix some bugs
* finish context forward
* finish context stage
* fix
* add
* pause
* add
* fix bugs
* finish chatglm
* fix bug
* change some logic
* fix bugs
* change some logics
* add
* add
* add
* fix
* fix tests
* fix
1 year ago
Xu Kai
946ab56c48
[feature] add gptq for inference ( #4754 )
...
* [gptq] add gptq kernel (#4416 )
* add gptq
* refactor code
* fix tests
* replace auto-gptq
* rename inference/quant
* refactor test
* add auto-gptq as an option
* reset requirements
* change assert and check auto-gptq
* add import warnings
* change test flash attn version
* remove example
* change requirements of flash_attn
* modify tests
* [skip ci] change requirements-test
* [gptq] faster gptq cuda kernel (#4494 )
* [skip ci] add cuda kernels
* add license
* [skip ci] fix max_input_len
* format files & change test size
* [skip ci]
* [gptq] add gptq tensor parallel (#4538 )
* add gptq tensor parallel
* add gptq tp
* delete print
* add test gptq check
* add test auto gptq check
* [gptq] combine gptq and kv cache manager (#4706 )
* combine gptq and kv cache manager
* add init bits
* delete useless code
* add model path
* delete useless print and update test
* delete useless import
* move option gptq to shard config
* change replace linear to shardformer
* update bloom policy
* delete useless code
* fix import bug and delete useless code
* change colossalai/gptq to colossalai/quant/gptq
* update import linear for tests
* delete useless code and mv gptq_kernel to kernel directory
* fix triton kernel
* add triton import
1 year ago
littsk
1e0e080837
[bug] Fix the version check bug in colossalai run when generating the cmd. ( #4713 )
...
* Fix the version check bug in colossalai run when generating the cmd.
* polish code
1 year ago
Hongxin Liu
3e05c07bb8
[lazy] support torch 2.0 ( #4763 )
...
* [lazy] support _like methods and clamp
* [lazy] pass transformers models
* [lazy] fix device move and requires grad
* [lazy] fix requires grad and refactor api
* [lazy] fix requires grad
1 year ago
Wenhao Chen
901ab1eedd
[chat]: add lora merge weights config ( #4766 )
...
* feat: modify lora merge weights fn
* feat: add lora merge weights config
1 year ago
Baizhou Zhang
493a5efeab
[doc] add shardformer doc to sidebar ( #4768 )
1 year ago
Hongxin Liu
66f3926019
[doc] clean up outdated docs ( #4765 )
...
* [doc] clean up outdated docs
* [doc] fix linking
* [doc] fix linking
1 year ago
Baizhou Zhang
df66741f77
[bug] fix get_default_parser in examples ( #4764 )
1 year ago
Baizhou Zhang
c0a033700c
[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic ( #4758 )
...
* fix master param sync for hybrid plugin
* rewrite unwrap for ddp/fsdp
* rewrite unwrap for zero/gemini
* rewrite unwrap for hybrid plugin
* fix gemini unwrap
* fix bugs
1 year ago
Wenhao Chen
7b9b86441f
[chat]: update rm, add wandb and fix bugs ( #4471 )
...
* feat: modify forward fn of critic and reward model
* feat: modify calc_action_log_probs
* to: add wandb in sft and rm trainer
* feat: update train_sft
* feat: update train_rm
* style: modify type annotation and add warning
* feat: pass tokenizer to ppo trainer
* to: modify trainer base and maker base
* feat: add wandb in ppo trainer
* feat: pass tokenizer to generate
* test: update generate fn tests
* test: update train tests
* fix: remove action_mask
* feat: remove unused code
* fix: fix wrong ignore_index
* fix: fix mock tokenizer
* chore: update requirements
* revert: modify make_experience
* fix: fix inference
* fix: add padding side
* style: modify _on_learn_batch_end
* test: use mock tokenizer
* fix: use bf16 to avoid overflow
* fix: fix workflow
* [chat] fix gemini strategy
* [chat] fix
* sync: update colossalai strategy
* fix: fix args and model dtype
* fix: fix checkpoint test
* fix: fix requirements
* fix: fix missing import and wrong arg
* fix: temporarily skip gemini test in stage 3
* style: apply pre-commit
* fix: temporarily skip gemini test in stage 1&2
---------
Co-authored-by: Mingyan Jiang <1829166702@qq.com>
1 year ago
ppt0011
07c2e3d09c
Merge pull request #4757 from ppt0011/main
...
[doc] explain suitable use case for each plugin
1 year ago
Pengtai Xu
4d7537ba25
[doc] put native colossalai plugins first in description section
1 year ago