Wenhao Chen
da4f7b855f
[chat] fix bugs and add unit tests (#4213)
* style: rename replay buffer
Experience replay is typically used for off-policy algorithms.
Using this name in PPO may be misleading.
* fix: fix wrong zero2 default arg
* test: update experience tests
* style: rename zero_pad fn
* fix: defer init in CycledDataLoader
* test: add benchmark test
* style: rename internal fn of generation
* style: rename internal fn of lora
* fix: remove unused loss fn
* fix: remove unused utils fn
* refactor: remove generate_with_actor fn
* fix: fix type annotation
* test: add models tests
* fix: skip llama due to long execution time
* style: modify dataset
* style: apply formatter
* perf: update reward dataset
* fix: fix wrong IGNORE_INDEX in sft dataset
* fix: remove DataCollatorForSupervisedDataset
* test: add dataset tests
* style: apply formatter
* style: rename test_ci to test_train
* feat: add llama in inference
* test: add inference tests
* test: change test scripts directory
* fix: update ci
* fix: fix typo
* fix: skip llama due to oom
* fix: fix file mod
* style: apply formatter
* refactor: remove duplicated llama_gptq
* style: apply formatter
* to: update rm test
* feat: add tokenizer arg
* feat: add download model script
* test: update train tests
* fix: modify gemini load and save pretrained
* test: update checkpoint io test
* to: modify nproc_per_node
* fix: do not remove existing dir
* fix: modify save path
* test: add random choice
* fix: fix sft path
* fix: enlarge nproc_per_node to avoid oom
* fix: add num_retry
* fix: make lora config of rm and critic consistent
* fix: add warning about lora weights
* fix: skip some gpt2 tests
* fix: remove grad ckpt in rm and critic due to errors
* refactor: directly use Actor in train_sft
* test: add more arguments
* fix: disable grad ckpt when using lora
* fix: fix save_pretrained and related tests
* test: enable zero2 tests
* revert: remove useless fn
* style: polish code
* test: modify test args
2023-08-02 10:17:36 +08:00
Wenhao Chen
75c5389037
[chat] fix compute_approx_kl (#4338)
2023-08-01 10:21:45 +08:00
yuxuan-lou
0991405361
[NFC] polish applications/Chat/coati/models/utils.py code style (#4277)
* [NFC] polish colossalai/context/random/__init__.py code style
* [NFC] polish applications/Chat/coati/models/utils.py code style
2023-07-26 14:12:57 +08:00
Wenhao Chen
9d02590c9a
[chat] refactor actor class (#3968)
* refactor: separate log_probs fn from Actor forward fn
* refactor: separate generate fn from Actor class
* feat: update unwrap_model and get_base_model
  * unwrap_model returns model not wrapped by Strategy
  * get_base_model returns HF model for Actor, Critic and RewardModel
* feat: simplify Strategy.prepare
* style: remove get_base_model method of Actor
* perf: tokenize text in batches
* refactor: move calc_action_log_probs to utils of model
* test: update test with new forward fn
* style: rename forward fn args
* fix: do not unwrap model in save_model fn of naive strategy
* test: add gemini test for train_prompts
* fix: fix _set_default_generate_kwargs
2023-06-13 13:31:56 +08:00
Fazzie-Maqianli
b0ce5a1032
[Coati] first commit (#3283)
2023-03-28 20:25:36 +08:00