There is a detailed explanation of the final answer you should extract:
1. You should extract the final answer option like 'A', 'B', 'C', 'D' ... from the given output sentences.
2. The question is a single choice question, so the final answer option should be one of the options, not a combination of options.
"""  # noqa

MATH_NAVIE_PROMPT_TEMPLATE = """
This is a detailed explanation of the final answer you should extract:
1. The question type is a math question, so the final answer should be a number, set, vector, matrix, interval, expression, function, equation, or inequality and any combination of them.
2. If the final answer includes additional symbols, such as units, you should exclude them and only extract the pure final answer.
### Step 2: Add Naive Model Postprocessor to the configuration file
Taking GSM8K as an example, add the following lines to the configuration file and replace `api_url` with the address of your API server.
```python
...
from opencompass.utils.model_postprocessors import navie_model_postprocess
from opencompass.utils.postprocessors.naive import MATH_NAVIE_PROMPT_TEMPLATE
```
The extraction prompt can also be customized through the `custom_instruction` parameter. Two default templates are currently supported: `MATH_NAVIE_PROMPT_TEMPLATE` for extracting answers to math problems such as GSM8K and MATH, and `OPTION_NAVIE_PROMPT_TEMPLATE` for extracting answers to multiple-choice problems such as MMLU. You can also write your own prompt template, for example:
```python
OPTION_NAVIE_PROMPT_TEMPLATE = """
There is a detailed explanation of the final answer you should extract:
1. You should extract the final answer option like 'A', 'B', 'C', 'D' ... from the given output sentences.
2. The question is a single choice question, so the final answer option should be one of the options, not a combination of options.
"""
```
Your prompt should start with `There is a detailed explanation of the final answer you should extract:`, followed by your customized instructions.
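As a minimal sketch of that rule, the helper below builds a custom template from a list of instructions and always emits the required first line. `build_custom_instruction`, `REQUIRED_PREFIX`, and `YES_NO_PROMPT_TEMPLATE` are hypothetical names used for illustration; they are not part of OpenCompass.

```python
# Hypothetical helper, not part of OpenCompass: builds a custom extraction prompt.
REQUIRED_PREFIX = (
    'There is a detailed explanation of the final answer you should extract:'
)


def build_custom_instruction(rules):
    """Return a prompt that starts with the required prefix,
    followed by one numbered line per rule."""
    lines = [REQUIRED_PREFIX]
    for index, rule in enumerate(rules, start=1):
        lines.append(f'{index}. {rule}')
    # Mirror the layout of the default templates: leading and trailing newline.
    return '\n' + '\n'.join(lines) + '\n'


# Example: a made-up template for yes/no questions.
YES_NO_PROMPT_TEMPLATE = build_custom_instruction([
    "You should extract the final answer as 'Yes' or 'No'.",
    'The final answer should be exactly one of the two, not both.',
])
```

The resulting string can then be passed as the `custom_instruction` parameter in the same way as the default templates.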
### Step 3: Run the Evaluation as Usual
You can now run the evaluation as usual with the modified configuration file. The evaluation uses the custom model as the post-process model to extract the final answers, and the final score is reported as `model_postprocess_accuracy` in the evaluation results, for example:
```Markdown
dataset version metric mode llama-3-8b-instruct-turbomind
```
We tested the model post-processing method with different post-process models (Qwen2-72B-Chat, Llama3-8b-Chat) on the GSM8K and MMLU datasets, evaluating `Meta-Llama-3-8B-Instruct` with the settings above. The results are as follows:
```Markdown
| Dataset | Type | Config ID | Regex Postprocess Score | Model Postprocess Score (Llama3-8b-Instruct) | Model Postprocess Score (Qwen2-72B-Chat) |
```
# Naive model extractor for OpenCompass, modified from xFinder: https://github.com/IAAR-Shanghai/xFinder # noqa
import json
import time
from logging import getLogger

from openai import OpenAI

Meta_Instruction="""I will provide you with a question, output sentences along with an answer range. The output sentences are the response of the question provided. The answer range could either describe the type of answer expected or list all possible valid answers. Using the information provided, you must accurately and precisely determine and extract the intended key answer from the output sentences. Please don't have your subjective thoughts about the question.
First, you need to determine whether the content of the output sentences is relevant to the given question. If the entire output sentences are unrelated to the question (meaning the output sentences are not addressing the question), then output [No valid answer].
Otherwise, ignore the parts of the output sentences that have no relevance to the question and then extract the key answer that matches the answer range.
Below are some special cases you need to be aware of:
(1) If the output sentences present multiple different answers, carefully determine if the later provided answer is a correction or modification of a previous one. If so, extract this corrected or modified answer as the final response. Conversely, if the output sentences fluctuate between multiple answers without a clear final answer, you should output [No valid answer].
(2) If the answer range is a list and the key answer in the output sentences is not explicitly listed among the candidate options in the answer range, also output [No valid answer].
(3) You should only return the precise answer you extract, without processing the answer. Please return only the answer and do not add any additional content.
SYSTEM='You are a help assistant tasked with extracting the precise key answer from given output sentences. You must only provide the extracted key answer without including any additional text.', # noqa
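To make the instructions in `Meta_Instruction` concrete, here is a minimal sketch of how an extraction request could be assembled and how the reply could be interpreted. The function names, the layout of the user message, and the convention of returning `None` are assumptions made for illustration; the extractor in this PR may format its input differently.

```python
# Illustrative sketch only; names and message layout are assumptions.
NO_VALID_ANSWER = '[No valid answer]'


def build_messages(system_prompt, instruction, question, output_sentences,
                   answer_range):
    """Assemble chat messages for an OpenAI-compatible chat endpoint."""
    user_content = (
        f'{instruction}\n'
        f'Question: {question}\n'
        f'Output sentences: {output_sentences}\n'
        f'Answer range: {answer_range}\n'
        'Key extracted answer: '
    )
    return [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_content},
    ]


def parse_extraction(reply):
    """Return the extracted answer, or None when the extractor model
    reports that no valid answer exists (or returns nothing)."""
    answer = reply.strip()
    if not answer or NO_VALID_ANSWER.lower() in answer.lower():
        return None
    return answer
```

Mapping `[No valid answer]` to `None` keeps "the model declined to extract" distinct from any real answer string, so downstream scoring can count such cases as incorrect without string comparisons.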