From bd014673b07fdc561be8c84fe78e085f9af1897c Mon Sep 17 00:00:00 2001
From: Tong Li
Date: Tue, 26 Sep 2023 10:58:05 +0800
Subject: [PATCH 1/2] update readme

---
 applications/Colossal-LLaMA-2/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/applications/Colossal-LLaMA-2/README.md b/applications/Colossal-LLaMA-2/README.md
index f0a027d83..3470e8494 100644
--- a/applications/Colossal-LLaMA-2/README.md
+++ b/applications/Colossal-LLaMA-2/README.md
@@ -73,7 +73,7 @@ The generation config for all dataset is greedy search.
 >
 > For other models and other dataset, we calculate logits over "A", "B", "C" and "D".
 
-❗️ More details of the evaluation methods and reproduction of the results, please refer to [TODO: ColossalEval]().
+❗️ More details of the evaluation methods and reproduction of the results, please refer to [ColossalEval](https://github.com/hpcaitech/ColossalAI/tree/main/applications/ColossalEval).
 
 ### Examples
 | Question Type | Question | Colossal-LLaMA-2-7b-base |

From 8cbce6184d831a6d58761ad4a46e6c28137b8047 Mon Sep 17 00:00:00 2001
From: Tong Li
Date: Tue, 26 Sep 2023 11:36:53 +0800
Subject: [PATCH 2/2] update

---
 applications/Colossal-LLaMA-2/README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/applications/Colossal-LLaMA-2/README.md b/applications/Colossal-LLaMA-2/README.md
index 3470e8494..71d1c7bcd 100644
--- a/applications/Colossal-LLaMA-2/README.md
+++ b/applications/Colossal-LLaMA-2/README.md
@@ -32,6 +32,10 @@ The [Colossal-AI](https://github.com/hpcaitech/ColossalAI) team has introduced t
 Colossal-LLaMA-2-7B-base is designed to accommodate both the Chinese and English languages, featuring an expansive context window spanning 4096 tokens. Remarkably, it has exhibited exceptional performance when benchmarked against models of equivalent scale in standard Chinese and English evaluation metrics, including C-Eval and MMLU, among others.
 
+❗️**Important notice**:
+* All training data used for this project is collected from well-known public dataset.
+* We do not use any testing data from the evaluation benchmarks for training.
+
 ### Performance Evaluation
 
 We conducted comprehensive evaluation on 4 dataset and compare our Colossal-Llama-2-7b-base model with various models.
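As background for the evaluation note in the first hunk ("we calculate logits over \"A\", \"B\", \"C\" and \"D\""), the sketch below shows one common way such multiple-choice scoring is done: compare the next-token logits the model assigns to each option letter and pick the highest. This is a minimal illustration only, not the ColossalEval implementation; the Hugging Face checkpoint id and the example question are assumptions.

```python
# Illustrative sketch: pick a multiple-choice answer by comparing the logits
# the model assigns to the option letters at the position after the prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint id; any causal LM with a compatible tokenizer works.
model_name = "hpcai-tech/Colossal-LLaMA-2-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def pick_choice(prompt: str, options=("A", "B", "C", "D")) -> str:
    """Return the option letter whose token receives the highest next-token logit."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    # Use the first sub-token id of each option letter as its score index.
    option_ids = [tokenizer(o, add_special_tokens=False).input_ids[0] for o in options]
    scores = logits[option_ids]
    return options[int(torch.argmax(scores))]

# Made-up example question for demonstration purposes.
question = (
    "Question: Which city is the capital of France?\n"
    "A. Berlin\nB. Paris\nC. Madrid\nD. Rome\n"
    "Answer:"
)
print(pick_choice(question))
```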