littsk
54b3ad8924
[hotfix] fix norm type error in zero optimizer ( #4795 )
1 year ago
Hongxin Liu
da15fdb9ca
[doc] add lazy init docs ( #4808 )
1 year ago
Yan haixu
a22706337a
[misc] add last_epoch in CosineAnnealingWarmupLR ( #4778 )
1 year ago
Chandler-Bing
b6cf0aca55
[hotfix] change llama2 Colossal-LLaMA-2 script filename ( #4800 )
...
change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py. wrong writing
1 year ago
Desperado-Jia
62b6af1025
Merge pull request #4805 from TongLi3701/docs/fix
...
[doc] Update TODO in README of Colossal-LLaMA-2
1 year ago
Tong Li
8cbce6184d
update
1 year ago
Hongxin Liu
4965c0dabd
[lazy] support from_pretrained ( #4801 )
...
* [lazy] patch from pretrained
* [lazy] fix from pretrained and add tests
* [devops] update ci
1 year ago
Tong Li
bd014673b0
update readme
1 year ago
Baizhou Zhang
64a08b2dc3
[checkpointio] support unsharded checkpointIO for hybrid parallel ( #4774 )
...
* support unsharded saving/loading for model
* support optimizer unsharded saving
* update doc
* support unsharded loading for optimizer
* small fix
1 year ago
Baizhou Zhang
a2db75546d
[doc] polish shardformer doc ( #4779 )
...
* fix example format in docstring
* polish shardformer doc
1 year ago
flybird11111
26cd6d850c
[fix] fix weekly running example ( #4787 )
...
* [fix] fix weekly running example
* [fix] fix weekly running example
1 year ago
binmakeswell
d512a4d38d
[doc] add llama2 domain-specific solution news ( #4789 )
...
* [doc] add llama2 domain-specific solution news
1 year ago
Yuanchen
ce777853ae
[feature] ColossalEval: Evaluation Pipeline for LLMs ( #4786 )
...
* Add ColossalEval
* Delete evaluate in Chat
---------
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
1 year ago
Tong Li
74aa7d964a
initial commit: add colossal llama 2 ( #4784 )
1 year ago
Hongxin Liu
4146f1c0ce
[release] update version ( #4775 )
...
* [release] update version
* [doc] revert versions
1 year ago
Jianghai
ce7ade3882
[inference] chatglm2 infer demo ( #4724 )
...
* add chatglm2
* add
* gather needed kernels
* fix some bugs
* finish context forward
* finish context stage
* fix
* add
* pause
* add
* fix bugs
* finish chatglm
* fix bug
* change some logic
* fix bugs
* change some logics
* add
* add
* add
* fix
* fix tests
* fix
1 year ago
Xu Kai
946ab56c48
[feature] add gptq for inference ( #4754 )
...
* [gptq] add gptq kernel (#4416 )
* add gptq
* refactor code
* fix tests
* replace auto-gptq
* rename inference/quant
* refactor test
* add auto-gptq as an option
* reset requirements
* change assert and check auto-gptq
* add import warnings
* change test flash attn version
* remove example
* change requirements of flash_attn
* modify tests
* [skip ci] change requirements-test
* [gptq] faster gptq cuda kernel (#4494 )
* [skip ci] add cuda kernels
* add license
* [skip ci] fix max_input_len
* format files & change test size
* [skip ci]
* [gptq] add gptq tensor parallel (#4538 )
* add gptq tensor parallel
* add gptq tp
* delete print
* add test gptq check
* add test auto gptq check
* [gptq] combine gptq and kv cache manager (#4706 )
* combine gptq and kv cache manager
* add init bits
* delete useless code
* add model path
* delete useless print and update test
* delete useless import
* move option gptq to shard config
* change replace linear to shardformer
* update bloom policy
* delete useless code
* fix import bug and delete useless code
* change colossalai/gptq to colossalai/quant/gptq
* update import linear for tests
* delete useless code and mv gptq_kernel to kernel directory
* fix triton kernel
* add triton import
1 year ago
littsk
1e0e080837
[bug] Fix the version check bug in colossalai run when generating the cmd. ( #4713 )
...
* Fix the version check bug in colossalai run when generating the cmd.
* polish code
1 year ago
Hongxin Liu
3e05c07bb8
[lazy] support torch 2.0 ( #4763 )
...
* [lazy] support _like methods and clamp
* [lazy] pass transformers models
* [lazy] fix device move and requires grad
* [lazy] fix requires grad and refactor api
* [lazy] fix requires grad
1 year ago
Wenhao Chen
901ab1eedd
[chat]: add lora merge weights config ( #4766 )
...
* feat: modify lora merge weights fn
* feat: add lora merge weights config
1 year ago
Baizhou Zhang
493a5efeab
[doc] add shardformer doc to sidebar ( #4768 )
1 year ago
Hongxin Liu
66f3926019
[doc] clean up outdated docs ( #4765 )
...
* [doc] clean up outdated docs
* [doc] fix linking
* [doc] fix linking
1 year ago
Baizhou Zhang
df66741f77
[bug] fix get_default_parser in examples ( #4764 )
1 year ago
Baizhou Zhang
c0a033700c
[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic ( #4758 )
...
* fix master param sync for hybrid plugin
* rewrite unwrap for ddp/fsdp
* rewrite unwrap for zero/gemini
* rewrite unwrap for hybrid plugin
* fix gemini unwrap
* fix bugs
1 year ago
Wenhao Chen
7b9b86441f
[chat]: update rm, add wandb and fix bugs ( #4471 )
...
* feat: modify forward fn of critic and reward model
* feat: modify calc_action_log_probs
* to: add wandb in sft and rm trainer
* feat: update train_sft
* feat: update train_rm
* style: modify type annotation and add warning
* feat: pass tokenizer to ppo trainer
* to: modify trainer base and maker base
* feat: add wandb in ppo trainer
* feat: pass tokenizer to generate
* test: update generate fn tests
* test: update train tests
* fix: remove action_mask
* feat: remove unused code
* fix: fix wrong ignore_index
* fix: fix mock tokenizer
* chore: update requirements
* revert: modify make_experience
* fix: fix inference
* fix: add padding side
* style: modify _on_learn_batch_end
* test: use mock tokenizer
* fix: use bf16 to avoid overflow
* fix: fix workflow
* [chat] fix gemini strategy
* [chat] fix
* sync: update colossalai strategy
* fix: fix args and model dtype
* fix: fix checkpoint test
* fix: fix requirements
* fix: fix missing import and wrong arg
* fix: temporarily skip gemini test in stage 3
* style: apply pre-commit
* fix: temporarily skip gemini test in stage 1&2
---------
Co-authored-by: Mingyan Jiang <1829166702@qq.com>
1 year ago
ppt0011
07c2e3d09c
Merge pull request #4757 from ppt0011/main
...
[doc] explain suitable use case for each plugin
1 year ago
Pengtai Xu
4d7537ba25
[doc] put native colossalai plugins first in description section
1 year ago
Pengtai Xu
e10d9f087e
[doc] add model examples for each plugin
1 year ago
Pengtai Xu
a04337bfc3
[doc] put individual plugin explanation in front
1 year ago
Pengtai Xu
10513f203c
[doc] explain suitable use case for each plugin
1 year ago
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ( #4752 )
...
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
1 year ago
github-actions[bot]
3c6b831c26
[format] applied code formatting on changed files in pull request 4743 ( #4750 )
...
Co-authored-by: github-actions <github-actions@github.com>
1 year ago
Hongxin Liu
b5f9e37c70
[legacy] clean up legacy code ( #4743 )
...
* [legacy] remove outdated codes of pipeline (#4692 )
* [legacy] remove cli of benchmark and update optim (#4690 )
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694 )
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696 )
* [legacy] clean up utils (#4700 )
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742 )
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci
1 year ago
Xuanlei Zhao
32e7f99416
[kernel] update triton init #4740 ( #4740 )
1 year ago
Baizhou Zhang
d151dcab74
[doc] explanation of loading large pretrained models ( #4741 )
1 year ago
flybird11111
4c4482f3ad
[example] llama2 add fine-tune example ( #4673 )
...
* [shardformer] update shardformer readme
[shardformer] update shardformer readme
[shardformer] update shardformer readme
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] update llama2/opt finetune example and shardformer update to llama2
* [shardformer] change dataset
* [shardformer] change dataset
* [shardformer] fix CI
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
* [shardformer] fix
[example] update opt example
[example] resolve comments
fix
fix
* [example] llama2 add finetune example
* [example] llama2 add finetune example
* [example] llama2 add finetune example
* [example] llama2 add finetune example
* fix
* update llama2 example
* update llama2 example
* fix
* update llama2 example
* update llama2 example
* update llama2 example
* update llama2 example
* update llama2 example
* update llama2 example
* Update requirements.txt
* update llama2 example
* update llama2 example
* update llama2 example
1 year ago
Xuanlei Zhao
ac2797996b
[shardformer] add custom policy in hybrid parallel plugin ( #4718 )
...
* add custom policy
* update assert
1 year ago
Baizhou Zhang
451c3465fb
[doc] polish shardformer doc ( #4735 )
...
* arrange position of chapters
* fix typos in seq parallel doc
1 year ago
ppt0011
73eb3e8862
Merge pull request #4738 from ppt0011/main
...
[legacy] remove deterministic data loader test
1 year ago
Bin Jia
608cffaed3
[example] add gpt2 HybridParallelPlugin example ( #4653 )
...
* add gpt2 HybridParallelPlugin example
* update readme and testci
* update test ci
* fix test_ci bug
* update requirements
* add requirements
* update requirements
* add requirement
* rename file
1 year ago
Bin Jia
6a03c933a0
[shardformer] update seq parallel document ( #4730 )
...
* update doc of seq parallel
* fix typo
1 year ago
Pengtai Xu
cd4e61d149
[legacy] remove deterministic data loader test
1 year ago
flybird11111
46162632e5
[shardformer] update pipeline parallel document ( #4725 )
...
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
* [shardformer] update pipeline parallel document
1 year ago
digger yu
e4fc57c3de
Optimized some syntax errors in the documentation and code under applications/ ( #4127 )
...
Co-authored-by: flybird11111 <1829166702@qq.com>
1 year ago
Baizhou Zhang
50e5602c2d
[doc] add shardformer support matrix/update tensor parallel documents ( #4728 )
...
* add compatibility matrix for shardformer doc
* update tp doc
1 year ago
github-actions[bot]
8c2dda7410
[format] applied code formatting on changed files in pull request 4726 ( #4727 )
...
Co-authored-by: github-actions <github-actions@github.com>
1 year ago
Baizhou Zhang
f911d5b09d
[doc] Add user document for Shardformer ( #4702 )
...
* create shardformer doc files
* add docstring for seq-parallel
* update ShardConfig docstring
* add links to llama example
* add outdated message
* finish introduction & supporting information
* finish 'how shardformer works'
* finish shardformer.md English doc
* fix doctest fail
* add Chinese document
1 year ago
binmakeswell
ce97790ed7
[doc] fix llama2 code link ( #4726 )
...
* [doc] fix llama2 code link
* [doc] fix llama2 code link
* [doc] fix llama2 code link
1 year ago
flybird11111
20190b49a5
[shardformer] fix whisper test failure caused by significant accuracy differences ( #4710 )
...
* [shardformer] fix whisper test failed
* [shardformer] fix whisper test failed
* [shardformer] fix whisper test failed
* [shardformer] fix whisper test failed
1 year ago
Yuanheng Zhao
e2c0e7f92a
[hotfix] Fix import error: colossal.kernel without triton installed ( #4722 )
...
* [hotfix] remove triton kernels from kernel init
* revise bloom/llama kernel imports for infer
1 year ago