* refactor: separate log_probs fn from Actor forward fn
* refactor: separate generate fn from Actor class
* feat: update unwrap_model and get_base_model
* unwrap_model returns model not wrapped by Strategy
* get_base_model returns HF model for Actor, Critic and RewardModel
* feat: simplify Strategy.prepare
* style: remove get_base_model method of Actor
* perf: tokenize text in batches
* refactor: move calc_action_log_probs to utils of model
* test: update test with new forward fn
* style: rename forward fn args
* fix: do not unwrap model in save_model fn of naive strategy
* test: add gemini test for train_prompts
* fix: fix _set_default_generate_kwargs
* [chat] strategy refactor unwrap model
* [chat] strategy refactor save model
* [chat] add docstr
* [chat] refactor trainer save model
* [chat] fix strategy typing
* [chat] refactor trainer save model
* [chat] update readme
* [chat] fix unit test
* Update ppo.py
Fix the bug of fetching wrong batch data
* Add peft model support in SFT and Prompts training
In stage-1 and stage-3, the peft model supports are added. So the trained artifacts will be only a small lora additions instead of the whole bunch of files.
* Delete test_prompts.txt
* Delete test_pretrained.txt
* Move the peft stuffs to a community folder.
* Move the demo sft to community
* delete dirty files
* Add instructions to install peft using source
* Remove Chinese comments
* remove the Chinese comments