Wenhao Chen
da4f7b855f
[chat] fix bugs and add unit tests ( #4213 )
* style: rename replay buffer
Experience replay is typically used by off-policy algorithms.
Using this name in PPO may be misleading.
* fix: fix wrong zero2 default arg
* test: update experience tests
* style: rename zero_pad fn
* fix: defer init in CycledDataLoader
* test: add benchmark test
* style: rename internal fn of generation
* style: rename internal fn of lora
* fix: remove unused loss fn
* fix: remove unused utils fn
* refactor: remove generate_with_actor fn
* fix: fix type annotation
* test: add models tests
* fix: skip llama due to long execution time
* style: modify dataset
* style: apply formatter
* perf: update reward dataset
* fix: fix wrong IGNORE_INDEX in sft dataset
* fix: remove DataCollatorForSupervisedDataset
* test: add dataset tests
* style: apply formatter
* style: rename test_ci to test_train
* feat: add llama in inference
* test: add inference tests
* test: change test scripts directory
* fix: update ci
* fix: fix typo
* fix: skip llama due to oom
* fix: fix file mod
* style: apply formatter
* refactor: remove duplicated llama_gptq
* style: apply formatter
* to: update rm test
* feat: add tokenizer arg
* feat: add download model script
* test: update train tests
* fix: modify gemini load and save pretrained
* test: update checkpoint io test
* to: modify nproc_per_node
* fix: do not remove existing dir
* fix: modify save path
* test: add random choice
* fix: fix sft path
* fix: enlarge nproc_per_node to avoid oom
* fix: add num_retry
* fix: make lora config of rm and critic consistent
* fix: add warning about lora weights
* fix: skip some gpt2 tests
* fix: remove grad ckpt in rm and critic due to errors
* refactor: directly use Actor in train_sft
* test: add more arguments
* fix: disable grad ckpt when using lora
* fix: fix save_pretrained and related tests
* test: enable zero2 tests
* revert: remove useless fn
* style: polish code
* test: modify test args
2023-08-02 10:17:36 +08:00
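The IGNORE_INDEX fix in #4213 concerns how supervised fine-tuning (SFT) labels mask out tokens that should not contribute to the loss. Below is a minimal sketch. It assumes the common convention of -100, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, and assumes the prompt occupies the first `prompt_len` tokens. `build_sft_labels` is a hypothetical helper, not coati's dataset code.

```python
from typing import List, Optional

# Assumed value: -100 is the default ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def build_sft_labels(input_ids: List[int], prompt_len: int,
                     pad_token_id: Optional[int] = None) -> List[int]:
    """Hypothetical helper: copy input_ids into labels, masking prompt and padding."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must lie within the sequence")
    labels = list(input_ids)
    for i, token in enumerate(labels):
        # Prompt tokens and padding are excluded from the loss.
        if i < prompt_len or (pad_token_id is not None and token == pad_token_id):
            labels[i] = IGNORE_INDEX
    return labels


ids = [101, 7592, 2088, 102, 2023, 2003, 102, 0, 0]
print(build_sft_labels(ids, prompt_len=4, pad_token_id=0))
# [-100, -100, -100, -100, 2023, 2003, 102, -100, -100]
```

Only the response tokens keep their ids. The loss is therefore computed on what the model should generate, not on the prompt it was given.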
Zirui Zhu
9e512938f6
[NFC] polish applications/Chat/coati/trainer/strategies/base.py code style ( #4278 )
2023-07-26 14:12:57 +08:00
アマデウス
caa4433072
[NFC] fix format of application/Chat/coati/trainer/utils.py ( #4273 )
2023-07-26 14:12:57 +08:00
shenggan
798cb72907
[NFC] polish applications/Chat/coati/trainer/base.py code style ( #4260 )
2023-07-26 14:12:57 +08:00
Frank Lee
f447ca1811
[chat] removed cache file ( #4155 )
2023-07-04 16:05:01 +08:00
wukong1992
c1c672d0f0
[shardformer] shardformer support t5 model ( #3994 )
test t5
2023-07-04 16:05:01 +08:00
Wenhao Chen
edd75a59ea
[chat] remove naive strategy and split colossalai strategy ( #4094 )
* feat: remove on_learn_epoch fn as not used
* revert: add _on_learn_epoch fn
* to: remove the use of NaiveStrategy
* test: remove NaiveStrategy tests
* feat: remove NaiveStrategy
* style: modify comments and params
* feat: split ColossalAIStrategy into LowLevelZeroStrategy and GeminiStrategy
* fix: remove naive
* fix: align with modified colossal strategy
* fix: fix ddp _try_init_dist arg
2023-06-29 18:11:00 +08:00
Wenhao Chen
b03d64d010
[chat] refactor trainer class ( #4080 )
* to: add SLTrainer
* refactor: refactor RMTrainer and SFTTrainer
* fix: fix init file
* feat: remove on_learn_epoch fn as not used
* fix: align with modified gemini arguments
* to: add OnPolicyTrainer
* revert: add _on_learn_epoch fn
* refactor: refactor PPOTrainer
* style: rename PPOTrainer argument
* fix: align with modified PPO arguments
* test: align with modified train_prompts arguments
* chore: modify train_prompts
* docs: align with modified arguments
* fix: remove unnecessary output
* fix: move dataloader to fit fn of SLTrainer
* fix: move dataloader to fit fn of OnPolicyTrainer
* fix: modify usage of prompt and pretrain dataloader
2023-06-29 10:48:09 +08:00
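The OnPolicyTrainer split rests on the fact that PPO learns only from experience produced by the current policy. The replay-buffer rename in #4213 rests on the same fact. Below is a minimal sketch of that loop shape, with the dataloader passed to `fit()` as in "move dataloader to fit fn of OnPolicyTrainer". `OnPolicyLoop`, `make_experience` and `learn` are hypothetical names, not coati's API.

```python
import itertools
from typing import Callable, Iterable, List


class OnPolicyLoop:
    """Hypothetical skeleton of an on-policy (PPO-style) training loop.

    The names are illustrative and are not coati's OnPolicyTrainer API.
    """

    def __init__(self, make_experience: Callable[[object], object],
                 learn: Callable[[List[object]], None]) -> None:
        self.make_experience = make_experience
        self.learn = learn
        self.buffer: List[object] = []

    def fit(self, prompt_dataloader: Iterable, num_episodes: int,
            num_collect_steps: int) -> None:
        # The dataloader is an argument of fit(), not of __init__.
        # itertools.cycle replays the first pass; a real loader would be
        # re-iterated each epoch so that it can reshuffle.
        prompts = itertools.cycle(prompt_dataloader)
        for _ in range(num_episodes):
            # 1. Collect fresh experience with the current policy.
            for _ in range(num_collect_steps):
                self.buffer.append(self.make_experience(next(prompts)))
            # 2. Update the models using only that experience.
            self.learn(self.buffer)
            # 3. Discard it: on-policy methods do not replay stale samples.
            self.buffer.clear()


batches = []
loop = OnPolicyLoop(lambda p: p * 10, lambda buf: batches.append(list(buf)))
loop.fit([1, 2, 3], num_episodes=2, num_collect_steps=2)
print(batches)  # [[10, 20], [30, 10]]
```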
Baizhou Zhang
4da324cd60
[hotfix]fix argument naming in docs and examples ( #4083 )
2023-06-26 23:50:04 +08:00
Wenhao Chen
153b957a1b
[chat] refactor strategy class with booster api ( #3987 )
* refactor: adapt boost API in base and naive strategies
* fix: initialize plugin after setup_distributed
* fix: fix save_pretrained fn
* refactor: adapt boost API in DDPStrategy
* to: add _post_init check
* to: fix ddp backward, modify ddp dataloader and unwrap
* feat: adapt boost API in ColossalAIStrategy
* fix: call setup_distributed before use get_current_device
* fix: fix save_model and save_optimizer
* test: remove save_sharded_optimizer test
* style: apply formatter
* fix: fix stage check and add comments
* feat: allow dict type arg in strategy.prepare
* to: temporarily remove lr_scheduler for testing
* style: simplify init of ColossalAIStrategy
* fix: fix lr_scheduler in sft and rm
* style: modify comments
* test: add train_prompts tests
* fix: fix inference only case and use in train_prompts
* test: skip failed tests in ci
* style: fix CodeFactor check
* fix: do not use model.to('cpu') with GeminiPlugin
* test: enable colossalai_gemini tests
* test: set CUDA_VISIBLE_DEVICES in ci
* docs: add note
2023-06-25 17:36:21 +08:00
digger yu
d4fb7bfda7
fix typo applications/Chat/coati/ ( #3947 )
2023-06-15 10:43:11 +08:00
Wenhao Chen
9d02590c9a
[chat] refactor actor class ( #3968 )
* refactor: separate log_probs fn from Actor forward fn
* refactor: separate generate fn from Actor class
* feat: update unwrap_model and get_base_model
* unwrap_model returns model not wrapped by Strategy
* get_base_model returns HF model for Actor, Critic and RewardModel
* feat: simplify Strategy.prepare
* style: remove get_base_model method of Actor
* perf: tokenize text in batches
* refactor: move calc_action_log_probs to utils of model
* test: update test with new forward fn
* style: rename forward fn args
* fix: do not unwrap model in save_model fn of naive strategy
* test: add gemini test for train_prompts
* fix: fix _set_default_generate_kwargs
2023-06-13 13:31:56 +08:00
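#3968 moves log-probability computation out of `Actor.forward` into `calc_action_log_probs`. Below is a pure-Python sketch of what such a helper computes. It assumes a causal LM whose logits at position t predict token t+1, and assumes the generated actions are the final `num_actions` tokens. It is not coati's implementation, which works on batched tensors, and the function names are hypothetical.

```python
import math
from typing import List


def log_softmax(row: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def action_log_probs(logits: List[List[float]], sequence: List[int],
                     num_actions: int) -> List[float]:
    """Log-probabilities of the last `num_actions` tokens of `sequence`.

    logits[t] is the distribution predicted at position t for token t + 1,
    so the logits are shifted by one position against the sequence.
    """
    if not 0 <= num_actions < len(sequence):
        raise ValueError("num_actions must be smaller than the sequence length")
    per_token = [log_softmax(logits[t])[sequence[t + 1]]
                 for t in range(len(sequence) - 1)]
    # Keep only the generated (action) tokens at the end of the sequence.
    return per_token[len(per_token) - num_actions:]
```

For example, with two-token vocab logits `[0.0, log 3]`, the probability of token 1 is 3/4, so `action_log_probs([[0.0, math.log(3.0)]], [0, 1], 1)` returns a single value equal to `log(0.75)`.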
Hongxin Liu
b5f0566363
[chat] add distributed PPO trainer ( #3740 )
* Detached ppo (#9 )
* run the base
* working on dist ppo
* sync
* detached trainer
* update detached trainer. no maker update function
* facing init problem
* 1 maker 1 trainer detached run. but no model update
* facing cuda problem
* fix save functions
* verified maker update
* nothing
* add ignore
* analyze loss issue
* remove some debug codes
* facing 2m1t stuck issue
* 2m1t verified
* do not use torchrun
* working on 2m2t
* working on 2m2t
* initialize strategy in ray actor env
* facing actor's init order issue
* facing ddp model update issue (need unwrap ddp)
* unwrap ddp actor
* checking 1m2t stuck problem
* nothing
* set timeout for trainer choosing. It solves the stuck problem!
* delete some debug output
* rename to sync with upstream
* rename to sync with upstream
* coati rename
* nothing
* I am going to detach the replay buffer from the trainer and make it a Ray Actor. Two benefits: 1. support for a TP trainer; 2. asynchronous buffer operations
* experience_maker_holder performs target-revolving _send_experience() instead of length comparison.
* move code to ray subfolder
* working on pipeline inference
* apply comments
* working on pipeline strategy. in progress.
* remove pipeline code. clean this branch
* update remote parameters by state_dict. no test
* nothing
* state_dict sharding transfer
* merge debug branch
* gemini _unwrap_model fix
* simplify code
* simplify code & fix LoRALinear AttributeError
* critic unwrapped state_dict
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] add performance evaluator and fix bugs (#10 )
* [chat] add performance evaluator for ray
* [chat] refactor debug arg
* [chat] support hf config
* [chat] fix generation
* [chat] add 1mmt dummy example
* [chat] fix gemini ckpt
* split experience to send (#11 )
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] refactor trainer and maker (#12 )
* [chat] refactor experience maker holder
* [chat] refactor model init
* [chat] refactor trainer args
* [chat] refactor model init
* [chat] refactor trainer
* [chat] refactor experience sending logic and training loop args (#13 )
* [chat] refactor experience send logic
* [chat] refactor trainer
* [chat] refactor trainer
* [chat] refactor experience maker
* [chat] refactor pbar
* [chat] refactor example folder (#14 )
* [chat] support quant (#15 )
* [chat] add quant
* [chat] add quant example
* prompt example (#16 )
* prompt example
* prompt load csv data
* remove legacy try
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] add mmmt dummy example and refactor experience sending (#17 )
* [chat] add mmmt dummy example
* [chat] refactor naive strategy
* [chat] fix stuck problem
* [chat] fix naive strategy
* [chat] optimize experience maker sending logic
* [chat] refactor sending assignment
* [chat] refactor performance evaluator (#18 )
* Prompt Example & requires_grad state_dict & sharding state_dict (#19 )
* prompt example
* prompt load csv data
* remove legacy try
* maker models require_grad set to False
* working on zero redundancy update
* mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
* remove legacy examples
* remove legacy examples
* remove replay buffer tp state. bad design
---------
Co-authored-by: csric <richcsr256@gmail.com>
* state_dict sending adapts to new unwrap function (#20 )
* prompt example
* prompt load csv data
* remove legacy try
* maker models require_grad set to False
* working on zero redundancy update
* mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
* remove legacy examples
* remove legacy examples
* remove replay buffer tp state. bad design
* opt benchmark
* better script
* nothing
* [chat] strategy refactor unwrap model
* [chat] strategy refactor save model
* [chat] add docstr
* [chat] refactor trainer save model
* [chat] fix strategy typing
* [chat] refactor trainer save model
* [chat] update readme
* [chat] fix unit test
* working on lora reconstruction
* state_dict sending adapts to new unwrap function
* remove comments
---------
Co-authored-by: csric <richcsr256@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
* [chat-ray] add readme (#21 )
* add readme
* transparent graph
* add note background
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] get images from url (#22 )
* Refactor/chat ray (#23 )
* [chat] lora add todo
* [chat] remove unused pipeline strategy
* [chat] refactor example structure
* [chat] setup ci for ray
* [chat-ray] Support LoRA trainer. LoRA weights reconstruction. (#24 )
* lora support prototype
* lora support
* 1mmt lora & remove useless code
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] fix test ci for ray
* [chat] fix test ci requirements for ray
* [chat] fix ray runtime env
* [chat] fix ray runtime env
* [chat] fix example ci docker args
* [chat] add debug info in trainer
* [chat] add nccl debug info
* [chat] skip ray test
* [doc] fix typo
---------
Co-authored-by: csric <59389055+CsRic@users.noreply.github.com>
Co-authored-by: csric <richcsr256@gmail.com>
2023-06-07 10:41:16 +08:00
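One bullet in the distributed PPO work says `experience_maker_holder` performs "target-revolving `_send_experience()` instead of length comparison". Below is a minimal sketch of rotating the send target, assuming a plain round-robin over trainers. `RoundRobinSender` is hypothetical, and in-process lists stand in for the remote Ray actors. The presumable advantage over comparing buffer lengths is that the maker does not need to query every trainer before each send.

```python
import itertools
from typing import Dict, List, Sequence


class RoundRobinSender:
    """Hypothetical sketch of an experience maker that rotates its send target.

    Plain lists stand in for the remote (Ray actor) trainer buffers.
    """

    def __init__(self, trainer_names: Sequence[str]) -> None:
        if not trainer_names:
            raise ValueError("need at least one trainer")
        self._targets = itertools.cycle(trainer_names)
        self.sent: Dict[str, List[object]] = {name: [] for name in trainer_names}

    def send_experience(self, experience: object) -> str:
        # Pick the next trainer in turn; no trainer state is inspected.
        target = next(self._targets)
        self.sent[target].append(experience)
        return target


sender = RoundRobinSender(["trainer0", "trainer1"])
print([sender.send_experience(i) for i in range(5)])
# ['trainer0', 'trainer1', 'trainer0', 'trainer1', 'trainer0']
```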
tanitna
1a60dc07a8
[chat] typo accimulation_steps -> accumulation_steps ( #3662 )
2023-04-28 15:42:57 +08:00
Hongxin Liu
842768a174
[chat] refactor model save/load logic ( #3654 )
* [chat] strategy refactor unwrap model
* [chat] strategy refactor save model
* [chat] add docstr
* [chat] refactor trainer save model
* [chat] fix strategy typing
* [chat] refactor trainer save model
* [chat] update readme
* [chat] fix unit test
2023-04-27 18:41:49 +08:00
Hongxin Liu
6ef7011462
[chat] remove lm model class ( #3653 )
* [chat] refactor lora
* [chat] remove lm class
* [chat] refactor save model
* [chat] refactor train sft
* [chat] fix ci
* [chat] fix ci
2023-04-27 15:37:38 +08:00
Hongxin Liu
2a951955ad
[chat] refactor trainer ( #3648 )
* [chat] ppo trainer remove useless args
* [chat] update examples
* [chat] update benchmark
* [chat] update examples
* [chat] fix sft training with wandb
* [chat] polish docstr
2023-04-26 18:11:49 +08:00
Hongxin Liu
f8288315d9
[chat] polish performance evaluator ( #3647 )
2023-04-26 17:34:59 +08:00
Hongxin Liu
50793b35f4
[gemini] accelerate inference ( #3641 )
* [gemini] support don't scatter after inference
* [chat] update colossalai strategy
* [chat] fix opt benchmark
* [chat] update opt benchmark
* [gemini] optimize inference
* [test] add gemini inference test
* [chat] fix unit test ci
* [chat] fix ci
* [chat] fix ci
* [chat] skip checkpoint test
2023-04-26 16:32:40 +08:00
ddobokki
df309fc6ab
[Chat] Remove duplicate functions ( #3625 )
2023-04-24 12:23:15 +08:00
digger-yu
d7bf284706
[chat] polish code note typo ( #3612 )
2023-04-20 17:22:15 +08:00
Yuanchen
1ec0d386a9
reconstruct chat trainer and fix training script ( #3588 )
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-04-18 16:44:03 +08:00
tingfeng cao
7788e0b0a5
fix: fix sft ( #3568 )
2023-04-17 16:47:44 +08:00
csric
e355144375
[chatgpt] Detached PPO Training ( #3195 )
* run the base
* working on dist ppo
* sync
* detached trainer
* update detached trainer. no maker update function
* facing init problem
* 1 maker 1 trainer detached run. but no model update
* facing cuda problem
* fix save functions
* verified maker update
* nothing
* add ignore
* analyze loss issue
* remove some debug codes
* facing 2m1t stuck issue
* 2m1t verified
* do not use torchrun
* working on 2m2t
* working on 2m2t
* initialize strategy in ray actor env
* facing actor's init order issue
* facing ddp model update issue (need unwrap ddp)
* unwrap ddp actor
* checking 1m2t stuck problem
* nothing
* set timeout for trainer choosing. It solves the stuck problem!
* delete some debug output
* rename to sync with upstream
* rename to sync with upstream
* coati rename
* nothing
* I am going to detach the replay buffer from the trainer and make it a Ray Actor. Two benefits: 1. support for a TP trainer; 2. asynchronous buffer operations
* experience_maker_holder performs target-revolving _send_experience() instead of length comparison.
* move code to ray subfolder
* working on pipeline inference
* apply comments
---------
Co-authored-by: csric <richcsr256@gmail.com>
2023-04-17 14:46:50 +08:00
zhang-yi-chi
e6a132a449
[chat]: add vf_coef argument for PPOTrainer ( #3318 )
2023-04-11 09:54:59 +08:00
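#3318 adds a `vf_coef` argument to PPOTrainer. Below is a sketch of the conventional single-objective PPO loss, in which a value-function coefficient weights the critic's error against the clipped policy surrogate. This is an assumption about the general technique, not a description of coati's code. When actor and critic are separate models with their own optimizers, the coefficient only scales the critic's loss. The function names are hypothetical.

```python
import math
from typing import List


def policy_loss(log_probs: List[float], old_log_probs: List[float],
                advantages: List[float], clip_eps: float = 0.2) -> float:
    """Clipped PPO surrogate loss, averaged over tokens."""
    losses = []
    for lp, old_lp, adv in zip(log_probs, old_log_probs, advantages):
        ratio = math.exp(lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)


def value_loss(values: List[float], returns: List[float]) -> float:
    """Mean squared error between critic values and returns."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)


def total_loss(log_probs: List[float], old_log_probs: List[float],
               advantages: List[float], values: List[float],
               returns: List[float], vf_coef: float = 0.5) -> float:
    # vf_coef sets how strongly the value error counts relative to the policy term.
    return (policy_loss(log_probs, old_log_probs, advantages)
            + vf_coef * value_loss(values, returns))
```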
YY Lin
62f4e2eb07
[Chat]Add Peft support & fix the ptx bug ( #3433 )
* Update ppo.py
Fix a bug that fetched the wrong batch data
* Add peft model support in SFT and Prompts training
Peft model support is added to stage 1 and stage 3, so the trained artifacts are only small LoRA additions instead of a full set of model files.
* Delete test_prompts.txt
* Delete test_pretrained.txt
* Move the peft stuffs to a community folder.
* Move the demo sft to community
* delete dirty files
* Add instructions to install peft using source
* Remove Chinese comments
* remove the Chinese comments
2023-04-06 11:54:52 +08:00
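The peft commit notes that stage-1 and stage-3 training now produce only small LoRA additions. Below is a pure-Python sketch of the low-rank update from the LoRA paper (Hu et al., 2021), y = W x + (alpha / r) B A x, together with a parameter count that shows why the adapter is small. The function names are hypothetical, and this is not the peft library's implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: List[float],
                 alpha: float, rank: int) -> List[float]:
    """y = W x + (alpha / rank) * B (A x); only A and B are trained."""
    base = matvec(W, x)              # frozen weight, d_out x d_in
    delta = matvec(B, matvec(A, x))  # A: rank x d_in, B: d_out x rank
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


def param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """(full weight size, LoRA adapter size) for one linear layer."""
    return d_out * d_in, rank * (d_in + d_out)


print(param_counts(4096, 4096, 8))  # (16777216, 65536)
```

The LoRA paper initializes B to zero, so the adapted layer starts out identical to the pretrained one. At rank 8, the adapter for a 4096x4096 layer has 65,536 parameters against the layer's 16,777,216.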
Dr-Corgi
73afb63594
[chat]fix save_model( #3377 )
The function save_model should be a part of PPOTrainer.
2023-04-06 11:19:14 +08:00
Yuanchen
b92313903f
fix save_model indent error in ppo trainer ( #3450 )
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-04-05 09:45:42 +08:00
Yuanchen
773955abfa
fix save_model in naive and ddp strategy ( #3436 )
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-04-04 15:30:01 +08:00
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure ( #3424 )
* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import
2023-04-04 13:48:16 +08:00
Fazzie-Maqianli
b0ce5a1032
[Coati] first commit ( #3283 )
2023-03-28 20:25:36 +08:00