[doc] update chatgpt doc paper link (#3229)

#issue 3189
pull/3230/head
Camille Zhong 2 years ago committed by GitHub
parent bbac6760e5
commit 9bc702ab48

@@ -22,10 +22,10 @@ torchrun --standalone --nproc_per_node=2 train_reward_model.py --pretrain "faceb
- We add special token to the end of the sequence to get better result.
- We use cosine-reducing lr-scheduler for RM training.
- We set value_head as 1 liner layer and initialize the weight of value_head using N(0, 1/(d_model + 1)) distribution.
-- We train a Bloom-560m reward model for 1 epoch and find the test acc of the model achieve the performance mentions in [Anthropics paper](https://arxiv.org/abs/2112.00861).
+- We train a Bloom-560m reward model for 1 epoch and find the test acc of the model achieve the performance mentions in [Anthropics paper](https://arxiv.org/abs/2204.05862).
### Experiment result
-Model performance in [Anthropics paper](https://arxiv.org/abs/2112.00861):
+Model performance in [Anthropics paper](https://arxiv.org/abs/2204.05862):
<div align=center> <img width="512" alt="image" src="https://user-images.githubusercontent.com/70618399/225263321-8d64c3a8-6877-4cc8-9b61-0e1c52d3d94f.png">
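
For readers of this hunk, the two training notes in the context lines (cosine-reducing learning rate, and a single linear value head initialized from N(0, 1/(d_model + 1))) can be illustrated with a minimal pure-Python sketch. This is not the repository's code: the function names `cosine_lr`, `init_value_head`, and `value_head` are hypothetical, and treating 1/(d_model + 1) as the standard deviation (rather than the variance) is an assumption, since the README text does not say which is meant.

```python
import math
import random
from typing import List


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at `step`, for 0 <= step <= total_steps."""
    progress = step / total_steps
    # Decays smoothly from base_lr (step 0) to min_lr (step == total_steps).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def init_value_head(d_model: int, seed: int = 0) -> List[float]:
    """Weights of one linear layer mapping d_model features to a scalar.

    Assumption: 1 / (d_model + 1) is used as the standard deviation
    of the zero-mean normal distribution.
    """
    rng = random.Random(seed)
    std = 1.0 / (d_model + 1)
    return [rng.gauss(0.0, std) for _ in range(d_model)]


def value_head(hidden: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Scalar score: dot product of a hidden-state vector with the head weights."""
    return sum(h * w for h, w in zip(hidden, weights)) + bias


if __name__ == "__main__":
    weights = init_value_head(d_model=1024)
    print(len(weights), cosine_lr(0, 100, 1e-3), cosine_lr(100, 100, 1e-3))
```

With a hidden size of 1024 the initial weights are on the order of 1e-3, so the initial score is close to zero for any reasonably scaled hidden state.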
