* feat: remove on_learn_epoch fn as not used
* revert: add _on_learn_epoch fn
* feat: remove NaiveStrategy
* test: update train_prompts tests
* fix: remove prepare_llama_tokenizer_and_embedding
* test: add lora arg
* feat: remove roberta support in train_prompts due to runtime errs
* feat: remove deberta & roberta in rm as not used
* test: remove deberta and roberta tests
* feat: remove deberta and roberta models as not used
* fix: remove calls to roberta
* fix: remove prepare_llama_tokenizer_and_embedding
* chore: update transformers version
* docs: update transformers version
* fix: fix actor inference
* fix: fix ci
* feat: change llama pad token to unk
* revert: revert ddp setup_distributed
* fix: change llama pad token to unk
* revert: undo unnecessary changes
* fix: use pip to install transformers
* Add RoBERTa for RLHF Stage 2 & 3 (test)
RoBERTa for RLHF Stage 2 & 3 (still in testing)
* Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"
This reverts commit 06741d894d.
* Add RoBERTa for RLHF stage 2 & 3
1. add roberta folder under model folder
2. add roberta option in train_reward_model.py
3. add some test in testci
* add test for reward model training
* Update test_ci.sh
* Revert "Update test_ci.sh"
This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.
* Add RoBERTa for RLHF Stage 2 & 3 (test)
RoBERTa for RLHF Stage 2 & 3 (still in testing)
* Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"
This reverts commit 06741d894d.
* Add RoBERTa for RLHF stage 2 & 3
1. add roberta folder under model folder
2. add roberta option in train_reward_model.py
3. add some test in testci
* Update test_ci.sh
* Revert "Update test_ci.sh"
This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.
* update roberta with coati