mirror of https://github.com/hpcaitech/ColossalAI
parent `bbac6760e5`, commit `9bc702ab48`
@@ -22,10 +22,10 @@ torchrun --standalone --nproc_per_node=2 train_reward_model.py --pretrain "faceb
- We append a special token to the end of each sequence to get better results.
- We use a cosine-decay learning-rate scheduler for reward model (RM) training.
- We implement `value_head` as a single linear layer and initialize its weights from the distribution N(0, 1/(d_model + 1)).
- We train a Bloom-560m reward model for 1 epoch and find that its test accuracy reaches the performance reported in [Anthropic's paper](https://arxiv.org/abs/2204.05862).
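
The implementation details listed above can be sketched in plain Python. This is a minimal, framework-free illustration under stated assumptions. It is not the code in `train_reward_model.py`, and every function name here is hypothetical. The notation `N(0, 1/(d_model + 1))` does not say whether the second parameter is a variance or a standard deviation, so the sketch assumes it is the standard deviation. The reward is assumed to be read from the hidden state at the appended end token. "Test accuracy" is assumed to mean the fraction of preference pairs in which the chosen response receives the higher reward.

```python
import math
import random


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-decay schedule: base_lr at step 0, falling to min_lr at total_steps (requires total_steps > 0)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def init_value_head(d_model: int, seed: int = 0) -> tuple[list[float], float]:
    """Hypothetical value head: one linear layer mapping d_model features to 1 score.

    Assumption: weights ~ N(0, std) with std = 1 / (d_model + 1); the bias starts at 0.
    """
    rng = random.Random(seed)
    std = 1.0 / (d_model + 1)
    weights = [rng.gauss(0.0, std) for _ in range(d_model)]
    return weights, 0.0


def append_end_token(token_ids: list[int], end_token_id: int) -> list[int]:
    """Append the special end token unless the sequence already ends with it."""
    if token_ids and token_ids[-1] == end_token_id:
        return list(token_ids)
    return token_ids + [end_token_id]


def reward(hidden_states: list[list[float]], weights: list[float], bias: float) -> float:
    """Assumption: apply the value head to the hidden state of the last (end-token) position."""
    last = hidden_states[-1]
    return sum(w * h for w, h in zip(weights, last)) + bias


def pairwise_accuracy(chosen: list[float], rejected: list[float]) -> float:
    """Fraction of preference pairs where the chosen response outscores the rejected one."""
    wins = sum(1 for c, r in zip(chosen, rejected) if c > r)
    return wins / len(chosen)


if __name__ == "__main__":
    # Learning rate at the start, middle and end of a 100-step run.
    for step in (0, 50, 100):
        print(step, cosine_lr(step, 100, base_lr=1e-5))

    weights, bias = init_value_head(d_model=1024)
    print(len(weights), bias)  # 1024 0.0
    print(append_end_token([101, 2023, 2003], end_token_id=2))  # [101, 2023, 2003, 2]
    print(pairwise_accuracy([0.9, 0.2, 0.7], [0.1, 0.5, 0.3]))  # 2 of 3 pairs ranked correctly
```

Running the script prints learning rates of 1e-05, 5e-06 and 0.0 for steps 0, 50 and 100. It then prints the token list with the end token appended and an accuracy of 2/3 for the three toy pairs. In a PyTorch training loop, the same schedule shape is provided by `torch.optim.lr_scheduler.CosineAnnealingLR`.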
### Experiment results
Model performance in [Anthropic's paper](https://arxiv.org/abs/2204.05862):
<div align=center> <img width="512" alt="image" src="https://user-images.githubusercontent.com/70618399/225263321-8d64c3a8-6877-4cc8-9b61-0e1c52d3d94f.png">