diff --git a/applications/ChatGPT/examples/README.md b/applications/ChatGPT/examples/README.md
index ce73a5407..60e6d68bd 100644
--- a/applications/ChatGPT/examples/README.md
+++ b/applications/ChatGPT/examples/README.md
@@ -22,10 +22,10 @@ torchrun --standalone --nproc_per_node=2 train_reward_model.py --pretrain "faceb
 - We add special token to the end of the sequence to get better result.
 - We use cosine-reducing lr-scheduler for RM training.
 - We set value_head as 1 liner layer and initialize the weight of value_head using N(0,1/(d_model + 1)) distribution.
-- We train a Bloom-560m reward model for 1 epoch and find the test acc of the model achieve the performance mentions in [Anthropics paper](https://arxiv.org/abs/2112.00861).
+- We train a Bloom-560m reward model for 1 epoch and find the test acc of the model achieve the performance mentions in [Anthropics paper](https://arxiv.org/abs/2204.05862).
 
 ### Experiment result
-Model performance in [Anthropics paper](https://arxiv.org/abs/2112.00861):
+Model performance in [Anthropics paper](https://arxiv.org/abs/2204.05862):
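For reference, a minimal sketch of the value_head described in the README bullets above: a single linear layer whose weights are drawn from N(0, 1/(d_model + 1)). This assumes PyTorch, reads 1/(d_model + 1) as the variance (so std = 1/sqrt(d_model + 1)), and uses an illustrative `ValueHead` class name rather than the repository's actual module.

```python
import torch
import torch.nn as nn


class ValueHead(nn.Module):
    """Illustrative reward-model head: one linear layer producing a scalar value."""

    def __init__(self, d_model: int):
        super().__init__()
        self.value_head = nn.Linear(d_model, 1)
        # Assumption: 1/(d_model + 1) in N(0, 1/(d_model + 1)) is the variance,
        # so the standard deviation is 1/sqrt(d_model + 1).
        nn.init.normal_(self.value_head.weight, mean=0.0, std=(d_model + 1) ** -0.5)
        nn.init.zeros_(self.value_head.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, d_model) -> per-token values (batch, seq_len)
        values = self.value_head(hidden_states).squeeze(-1)
        # Score the final position, matching the bullet about appending a special
        # token to the end of the sequence.
        return values[:, -1]
```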