* add normalize function to value_head in bloom rm
* add normalization to value_function in gpt_rm
* add normalization to value_head of opt_rm
* add Anthropic/hh-rlhf dataset
* Update __init__.py
* Add LogExpLoss in RM training
* Update __init__.py
* update rm trainer to use accuracy (acc) as target
* update example/train_rm
* Update train_rm.sh
* code style
* Update README.md
* Update README.md
* add rm test to CI
* fix tokenizer
* fix typo
* change batch size to avoid OOM in CI
* Update test_ci.sh