From c1a355940ea4c5ec203bc9295e057fd3c8ca5efb Mon Sep 17 00:00:00 2001
From: Tong Li
Date: Fri, 28 Apr 2023 11:56:35 +0800
Subject: [PATCH] update readme

---
 applications/Chat/evaluate/README.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/applications/Chat/evaluate/README.md b/applications/Chat/evaluate/README.md
index d776a3e1f..7ace4bfe6 100644
--- a/applications/Chat/evaluate/README.md
+++ b/applications/Chat/evaluate/README.md
@@ -7,13 +7,13 @@ In this directory, we introduce how you can evaluate your model with GPT-4.
 The whole evaluation process undergoes the following three steps:
 1. Prepare the questions following the internal data structure in the data format section (described below).
 2. Generate answers from different models:
- * Generate answers using GPT-3.5: [generate_gpt35_answers.py](generate_gpt35_answers.py).
- * Generate answers using your own models: [generate_answers.py](generate_answers.py).
-3. Evaluate models using GPT-4: [evaluate.py](evaluate.py).
+ * Generate answers using GPT-3.5: [`generate_gpt35_answers.py`](generate_gpt35_answers.py).
+ * Generate answers using your own models: [`generate_answers.py`](generate_answers.py).
+3. Evaluate models using GPT-4: [`evaluate.py`](evaluate.py).
 
 ### Generate Answers
 #### Generate Answers Using GPT-3.5
-You can provide your own OpenAI key to generate answers from GPT-3.5 using [generate_gpt35_answers.py](./generate_gpt35_answers.py).
+You can provide your own OpenAI key to generate answers from GPT-3.5 using [`generate_gpt35_answers.py`](./generate_gpt35_answers.py).
 An example script is provided as follows:
 ```shell
 python generate_gpt35_answers.py \
@@ -27,8 +27,8 @@ python generate_gpt35_answers.py \
 
 #### Generate Answers Using our Own Model
 You can also generate answers using your own models. The generation process is divided into two stages:
-1. Generate answers using multiple GPUs (optional) with batch processing: [generate_answers.py](./generate_answers.py).
-2. Merge multiple shards and output a single file: [merge.py](./merge.py).
+1. Generate answers using multiple GPUs (optional) with batch processing: [`generate_answers.py`](./generate_answers.py).
+2. Merge multiple shards and output a single file: [`merge.py`](./merge.py).
 
 An example script is given as follows:
 
@@ -63,7 +63,7 @@ done
 
 ### Evaluate Answers
 
-In [evaluate.py](./evaluate.py), GPT-4 helps to review and score answers of two different models. Here `Model 1` refers to the first model you specify in the `--answer_file_list` and `Model 2` refers to the second model. The script shows several metrics and output the corresponding JSON files.
+In [`evaluate.py`](./evaluate.py), GPT-4 helps to review and score answers of two different models. Here `Model 1` refers to the first model you specify in the `--answer_file_list` and `Model 2` refers to the second model. The script shows several metrics and outputs the corresponding JSON files.
 
 The metrics include:
 
@@ -105,7 +105,7 @@ We would like to mention that the evaluation of model answers using the GPT-3.5
 
 ## Data Format
 ### Questions
-The file [questions.json](./sample/questions.json) shows the example questions used to evaluate the performance of the model. Each question record has the following field:
+The file [`questions.json`](./sample/questions.json) shows the example questions used to evaluate the performance of the model. Each question record has the following fields:
 * `id` (id, compulsory): The ID of the instruction / question.
 * `instruction` (str, compulsory): The instruction / question for the LLM.
 * `input` (str, optional): The additional context of the instruction / question.
@@ -163,11 +163,11 @@ A record has the following field:
 
 ### Prompts
 
-The data format is the same with [FastChat's](https://github.com/lm-sys/FastChat/blob/main/fastchat/eval/table/prompt.jsonl) prompts.
+The data format is the same as [FastChat's](https://github.com/lm-sys/FastChat/blob/main/fastchat/eval/table/prompt.jsonl) prompts.
 
 ### Reviewer
 
-The data format is the same with [FastChat's](https://github.com/lm-sys/FastChat/blob/main/fastchat/eval/table/reviewer.jsonl) reviewers.
+The data format is the same as [FastChat's](https://github.com/lm-sys/FastChat/blob/main/fastchat/eval/table/reviewer.jsonl) reviewers.
 
 ## Citations
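
The question format the patched README documents (one record with `id`, `instruction`, and `input` fields) can be sketched with a small sample file. This is a minimal illustration only: the file name `sample_questions.json` and the example values are hypothetical, not part of the repository.

```shell
# Write a minimal questions file using the fields the README names:
# id (compulsory), instruction (compulsory), input (optional).
# File name and values here are illustrative only.
cat > sample_questions.json <<'EOF'
[
    {
        "id": 1,
        "instruction": "What is the capital of France?",
        "input": ""
    }
]
EOF

# Sanity check: both compulsory fields are present.
grep -q '"id"' sample_questions.json && \
grep -q '"instruction"' sample_questions.json && \
echo "sample questions file written"
```

A file shaped like this would then be the input to the answer-generation scripts described above, with each model's answers compared pairwise by the GPT-4 evaluation step.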