
Comparing changes

base repository: open-compass/opencompass
base: 0.1.9
head repository: open-compass/opencompass
compare: 0.2.0

Commits on Nov 28, 2023

  1. Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
    Copy the full SHA
    e9e75fb View commit details

Commits on Nov 29, 2023

  1. fix hellaswag_ppl_47bff9 (#648)

    Fengzhe Zhou authored Nov 29, 2023

    Verified

    5933c04

Commits on Nov 30, 2023

  1. [Feature] Support chat style inferencer. (#643)

    * [Feature] Support chat style inferencer.
    
    * [Fix] use new prompt
    
    * [Fix] use new prompt
    
    ---------
    
    Co-authored-by: yingfhu <yingfhu@gmail.com>
    mzr1996 and yingfhu authored Nov 30, 2023

    Verified

    6aaf3b9
  2. [Feature] Add Chinese version: commonsenseqa, crowspairs and nq (#144)

    * add Chinese version: csqa crowspairs nq
    
    * Update cn_data
    
    * Update cn_data
    
    * update format
    
    ---------
    
    Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
    Co-authored-by: Leymore <zfz-960727@163.com>
    3 people authored Nov 30, 2023

    Verified

    e019c83

Commits on Dec 1, 2023

  1. [Feature] Add wikibench dataset (#655)

    * Add WikiBench
    
    * Add WikiBench
    
    * format
    
    ---------
    
    Co-authored-by: Leymore <zfz-960727@163.com>
    liushz and Leymore authored Dec 1, 2023

    Verified

    a331c9a
  2. [Feat] update gsm8k and math agent config (#652)

    * [Feat] update gsm8k and math agent config
    
    * minor fix
    yingfhu authored Dec 1, 2023

    Verified

    9eb5cad
  3. [Feature] Update MathBench CodeInterpreter & fix MathBench Bug (#657)

    * Update MathBench CodeInterpreter & fix MathBench Bug
    
    * Fix errors
    
    * update
    
    ---------
    
    Co-authored-by: liuhongwei <liuhongwei@pjlab.org.cn>
    Co-authored-by: Fengzhe Zhou <zfz-960727@163.com>
    3 people authored Dec 1, 2023

    Verified

    f4bbff6
  4. added rolebench dataset. (#633)

    * added rolebench
    
    * Changed poorly chosen variable names
    
    * Changed the variable names in the comments
    rolellm authored Dec 1, 2023

    Verified

    e10f1c9

Commits on Dec 6, 2023

  1. New subjective judgement (#660)

    * TabMWP
    
    * TabMWP
    
    * fixed
    
    * fixed
    
    * fixed
    
    * done
    
    * done
    
    * done
    
    * add new subjective judgement
    
    * add new subjective judgement
    
    * add new subjective judgement
    
    * add new subjective judgement
    
    * add new subjective judgement
    
    * modified to a more general way
    
    * modified to a more general way
    
    * final
    
    * final
    
    * add summarizer
    
    * add new summarize
    
    * fixed
    
    * fixed
    
    * fixed
    
    ---------
    
    Co-authored-by: caomaosong <caomaosong@pjlab.org.cn>
    bittersweet1999 and caomaosong authored Dec 6, 2023

    Verified

    1c95790

Commits on Dec 7, 2023

  1. add qwen and deepseek configs (#672)

    Fengzhe Zhou authored Dec 7, 2023

    Verified

    3a354bd

Commits on Dec 8, 2023

  1. [Feature] Add Data Contamination Analysis (#639)

    * add contamination analysis to ceval
    
    * fix bugs
    
    * add contamination docs
    
    * to pass CI check
    
    * update
    
    ---------
    
    Co-authored-by: zhangyifan1 <zhangyifan1@pjlab.org.cn>
    Co-authored-by: Leymore <zfz-960727@163.com>
    3 people authored Dec 8, 2023

    Verified

    05bbce8
  2. Verified

    7cb53a9

Commits on Dec 9, 2023

  1. [Feature] Add medbench (#678)

    * update medbench
    
    * medbench update
    
    * format medbench
    
    * format
    
    ---------
    
    Co-authored-by: 施晓明 <PJLAB\shixiaoming@pjnl104220118l.pjlab.org>
    Co-authored-by: Leymore <zfz-960727@163.com>
    3 people authored Dec 9, 2023

    Verified

    1bf8594

Commits on Dec 10, 2023

  1. [Enhancement] Update API Interface and Mixtral (#681)

    * [Enhancement] Update API interface
    
    * [Enhancement] Update API interface
    
    * Update mixtral
    
    * Update readme
    tonysy authored Dec 10, 2023

    Verified

    e25c5f9
  2. Verified

    6a928b9

Commits on Dec 11, 2023

  1. [Feat] support pr merge test ci (#669)

    * [Feat] support ci
    
    * [Feat] support ci
    
    * [Feat] support ci
    
    * [Feat] support ci
    
    * init docs
    
    * init docs
    
    * init docs
    yingfhu authored Dec 11, 2023

    Verified

    1029119
  2. [Feature] enhance the ability of humaneval_postprocess (#676)

    * [Feature] enhance the ability of humaneval_postprocess
    
    * refactor
    
    * [Feature] Keep the old version of the function and realize the new function in humaneval_postprocess_v2.
    
    * Update opencompass/datasets/humaneval.py
    
    ---------
    
    Co-authored-by: Leymore <zfz-960727@163.com>
    Co-authored-by: Hubert <42952108+yingfhu@users.noreply.github.com>
    3 people authored Dec 11, 2023

    Verified

    dd4318f
  3. [Sync] minor test (#683)

    yingfhu authored Dec 11, 2023

    Verified

    e78857a
  4. [Fix] fix docstring (#684)

    yingfhu authored Dec 11, 2023

    Verified

    4f0b373
  5. [Feature] Add Subjective Evaluation (#680)

    * new version of subject
    
    * fixed draw
    
    * fixed draw
    
    * fixed draw
    
    * done
    
    * done
    
    * done
    
    * done
    
    * fixed lint
    bittersweet1999 authored Dec 11, 2023

    Verified

    465308e

Commits on Dec 12, 2023

  1. Verified

    3e77175
  2. [Sync] format (#690)

    Co-authored-by: Leymore <zfz-960727@163.com>
    yingfhu and Leymore authored Dec 12, 2023

    Verified

    4780b39
Showing with 7,205 additions and 1,310 deletions.
  1. +121 −0 .github/workflows/pr-stage-check.yml
  2. +1 −0 .gitignore
  3. +2 −1 .pre-commit-config-zh-cn.yaml
  4. +2 −1 .pre-commit-config.yaml
  5. +2 −0 README.md
  6. +2 −0 README_zh-CN.md
  7. +7 −0 configs/api_examples/eval_api_360.py
  8. +6 −0 configs/api_examples/eval_api_baichuan.py
  9. +5 −1 configs/api_examples/eval_api_baidu.py
  10. +5 −0 configs/api_examples/eval_api_bytedance.py
  11. +3 −0 configs/api_examples/eval_api_moonshot.py
  12. +1 −1 configs/datasets/CIBench/CIBench_gen.py
  13. +4 −12 configs/datasets/CIBench/{CIBench_gen_eb42f9.py → CIBench_gen_8ab0dc.py}
  14. +128 −0 configs/datasets/MathBench/mathbench_agent_gen_568903.py
  15. +11 −11 configs/datasets/MathBench/{mathbench_gen_ccd638.py → mathbench_arith_gen_ccd638.py}
  16. +2 −2 configs/datasets/MathBench/mathbench_gen_ad37c1.py
  17. +4 −0 configs/datasets/MedBench/medbench_gen.py
  18. +160 −0 configs/datasets/MedBench/medbench_gen_d44f24.py
  19. +107 −0 configs/datasets/ceval/ceval_clean_ppl.py
  20. +4 −0 configs/datasets/commonsenseqa_cn/commonsenseqacn_gen.py
  21. +50 −0 configs/datasets/commonsenseqa_cn/commonsenseqacn_gen_d380d0.py
  22. +4 −0 configs/datasets/commonsenseqa_cn/commonsenseqacn_ppl.py
  23. +52 −0 configs/datasets/commonsenseqa_cn/commonsenseqacn_ppl_971f48.py
  24. +4 −0 configs/datasets/crowspairs_cn/crowspairscn_gen.py
  25. +64 −0 configs/datasets/crowspairs_cn/crowspairscn_gen_556dc9.py
  26. +4 −0 configs/datasets/crowspairs_cn/crowspairscn_ppl.py
  27. +39 −0 configs/datasets/crowspairs_cn/crowspairscn_ppl_f53575.py
  28. +69 −0 configs/datasets/ds1000/ds1000_compl_gen_cbc84f.py
  29. +68 −0 configs/datasets/ds1000/ds1000_compl_service_eval_gen_cbc84f.py
  30. +55 −0 configs/datasets/gsm8k/gsm8k_agent_gen_3ac57d.py
  31. +39 −0 configs/datasets/gsm8k/gsm8k_gen_3309bd.py
  32. +57 −0 configs/datasets/gsm8k_contamination/gsm8k_contamination_ppl_ecdd22.py
  33. +1 −3 configs/datasets/hellaswag/hellaswag_ppl_47bff9.py
  34. +89 −0 configs/datasets/math/math_agent_gen_861b4f.py
  35. +4 −0 configs/datasets/nq_cn/nqcn_gen.py
  36. +34 −0 configs/datasets/nq_cn/nqcn_gen_141737.py
  37. +41 −0 configs/datasets/rolebench/instruction_generalization_eng.py
  38. +41 −0 configs/datasets/rolebench/instruction_generalization_zh.py
  39. +41 −0 configs/datasets/rolebench/role_generalization_eng.py
  40. +2 −2 configs/datasets/subjective_cmp/subjective_cmp.py
  41. +62 −0 configs/datasets/subjective_cmp/subjective_corev2.py
  42. +60 −0 configs/datasets/subjective_cmp/subjective_creation.py
  43. +4 −0 configs/datasets/wikibench/wikibench_gen.py
  44. +56 −0 configs/datasets/wikibench/wikibench_gen_f96ece.py
  45. +1 −1 configs/datasets/winogrande/winogrande_ppl.py
  46. +4 −0 configs/datasets/winogrande/winogrande_ppl_55a66e.py
  47. +33 −0 configs/datasets/winogrande/winogrande_ppl_8be6c3.py
  48. +4 −0 configs/datasets/winogrande/winogrande_ppl_9307fd.py
  49. +64 −0 configs/eval_chat_agent.py
  50. +93 −0 configs/eval_chat_cibench.py
  51. +35 −0 configs/eval_chat_last.py
  52. +12 −0 configs/eval_contamination.py
  53. +8 −0 configs/eval_mixtral_8x7b.py
  54. +0 −148 configs/eval_openai_agent.py
  55. +43 −0 configs/eval_with_model_dataset_combinations.py
  56. +1 −1 configs/models/baichuan/hf_baichuan2_7b_chat.py
  57. +1 −1 configs/models/chatglm/hf_chatglm3_6b.py
  58. +24 −0 configs/models/deepseek/hf_deepseek_67b_base.py
  59. +32 −0 configs/models/deepseek/hf_deepseek_67b_chat.py
  60. +24 −0 configs/models/deepseek/hf_deepseek_7b_base.py
  61. +32 −0 configs/models/deepseek/hf_deepseek_7b_chat.py
  62. +2 −1 configs/models/hf_internlm/hf_internlm_chat_20b.py
  63. +2 −1 configs/models/hf_internlm/hf_internlm_chat_7b.py
  64. +1 −0 configs/models/hf_internlm/hf_internlm_chat_7b_8k.py
  65. +19 −0 configs/models/mixtral/mixtral_8x7b_32k.py
  66. +4 −2 configs/models/qwen/hf_qwen_14b_chat.py
  67. +25 −0 configs/models/qwen/hf_qwen_1_8b.py
  68. +32 −0 configs/models/qwen/hf_qwen_1_8b_chat.py
  69. +25 −0 configs/models/qwen/hf_qwen_72b.py
  70. +32 −0 configs/models/qwen/hf_qwen_72b_chat.py
  71. +4 −2 configs/models/qwen/hf_qwen_7b_chat.py
  72. +0 −49 configs/subjective.py
  73. +97 −0 configs/subjective_compare.py
  74. +96 −0 configs/subjective_score.py
  75. +157 −0 configs/summarizers/contamination.py
  76. +4 −0 configs/summarizers/groups/cibench.py
  77. +75 −0 configs/summarizers/groups/mathbench.py
  78. +28 −0 configs/summarizers/math_agent.py
  79. +56 −0 docs/en/advanced_guides/contamination_eval.md
  80. +54 −103 docs/en/advanced_guides/subjective_evaluation.md
  81. +1 −0 docs/en/index.rst
  82. +50 −0 docs/zh_cn/advanced_guides/contamination_eval.md
  83. +57 −105 docs/zh_cn/advanced_guides/subjective_evaluation.md
  84. +1 −0 docs/zh_cn/index.rst
  85. +1 −1 opencompass/__init__.py
  86. +7 −0 opencompass/datasets/__init__.py
  87. +48 −0 opencompass/datasets/ceval.py
  88. +156 −66 opencompass/datasets/cibench.py
  89. +2 −0 opencompass/datasets/cmnli.py
  90. +30 −0 opencompass/datasets/commonsenseqa_cn.py
  91. +23 −0 opencompass/datasets/crowspairs_cn.py
  92. +11 −0 opencompass/datasets/ds1000.py
  93. +15 −6 opencompass/datasets/gsm8k.py
  94. +47 −0 opencompass/datasets/humaneval.py
  95. +3 −0 opencompass/datasets/medbench/__init__.py
  96. +104 −0 opencompass/datasets/medbench/constructions.py
  97. +338 −0 opencompass/datasets/medbench/dataset_loader.py
  98. +43 −0 opencompass/datasets/medbench/evaluation.py
  99. +161 −0 opencompass/datasets/medbench/math_equivalence.py
  100. +646 −0 opencompass/datasets/medbench/medbench.py
  101. +198 −0 opencompass/datasets/medbench/post_process.py
  102. +43 −0 opencompass/datasets/medbench/utils.py
  103. +54 −0 opencompass/datasets/natural_question_cn.py
  104. +84 −0 opencompass/datasets/rolebench.py
  105. +274 −0 opencompass/datasets/subject_corev2.py
  106. +92 −0 opencompass/datasets/subject_creationv01.py
  107. +21 −204 opencompass/datasets/subjective_cmp.py
  108. +62 −0 opencompass/datasets/wikibench.py
  109. +14 −16 opencompass/datasets/winogrande.py
  110. +28 −7 opencompass/lagent/actions/ipython_interpreter.py
  111. +3 −131 opencompass/lagent/agents/react.py
  112. +1 −0 opencompass/models/__init__.py
  113. +11 −8 opencompass/models/ai360_api.py
  114. +9 −2 opencompass/models/baichuan_api.py
  115. +15 −13 opencompass/models/baidu_api.py
  116. +14 −0 opencompass/models/base.py
  117. +17 −2 opencompass/models/base_api.py
  118. +18 −13 opencompass/models/bytedance_api.py
  119. +135 −8 opencompass/models/huggingface.py
  120. +63 −70 opencompass/models/lagent.py
  121. +5 −3 opencompass/models/llama2.py
  122. +109 −0 opencompass/models/mixtral.py
  123. +9 −6 opencompass/models/moonshot_api.py
  124. +2 −0 opencompass/models/openai_api.py
  125. +1 −0 opencompass/openicl/icl_evaluator/__init__.py
  126. +48 −0 opencompass/openicl/icl_evaluator/icl_hf_evaluator.py
  127. +11 −0 opencompass/openicl/icl_evaluator/icl_misc_evaluator.py
  128. +32 −53 opencompass/openicl/icl_evaluator/lm_evaluator.py
  129. +3 −0 opencompass/openicl/icl_inferencer/__init__.py
  130. +117 −119 opencompass/openicl/icl_inferencer/icl_agent_inferencer.py
  131. +372 −0 opencompass/openicl/icl_inferencer/icl_chat_inferencer.py
  132. +8 −1 opencompass/openicl/icl_inferencer/icl_gen_inferencer.py
  133. +215 −0 opencompass/openicl/icl_inferencer/icl_loglikelihood_inferencer.py
  134. +188 −0 opencompass/openicl/icl_inferencer/icl_ppl_only_inferencer.py
  135. +44 −7 opencompass/partitioners/base.py
  136. +21 −19 opencompass/partitioners/naive.py
  137. +46 −43 opencompass/partitioners/size.py
  138. +47 −13 opencompass/partitioners/sub_naive.py
  139. +17 −10 opencompass/runners/slurm_sequential.py
  140. +5 −5 opencompass/summarizers/__init__.py
  141. +172 −0 opencompass/summarizers/corev2.py
  142. +124 −0 opencompass/summarizers/creationv01.py
  143. +34 −15 opencompass/summarizers/default.py
  144. +9 −2 opencompass/tasks/openicl_eval.py
  145. +8 −15 opencompass/tasks/subjective_eval.py
  146. +8 −0 opencompass/utils/text_postprocessors.py
  147. +7 −0 requirements/agent.txt
  148. +0 −3 requirements/extra.txt
  149. +2 −1 requirements/runtime.txt
121 changes: 121 additions & 0 deletions .github/workflows/pr-stage-check.yml
@@ -0,0 +1,121 @@
+name: pr_stage_test
+
+on:
+  pull_request:
+    paths-ignore:
+      - 'README.md'
+      - 'README_zh-CN.md'
+      - 'docs/**'
+      - 'configs/**'
+      - 'tools/**'
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  build:
+    runs-on: ubuntu-22.04
+    strategy:
+      matrix:
+        python-version: ['3.10']
+        include:
+          - torch: 2.0.0
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Upgrade pip
+        run: python -m pip install --upgrade pip
+      - name: Install PyTorch
+        run: pip install torch==${{matrix.torch}}+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
+      - name: Install system dependencies
+        run: |
+          sudo sed -i '$ a deb http://th.archive.ubuntu.com/ubuntu jammy main' /etc/apt/sources.list
+          sudo apt-get update && sudo apt-get install -y libc6 libffi-dev libncursesw6 wget unzip
+      - name: Upgrade pip
+        run: python -m pip install pip --upgrade
+      - name: Install opencompass dependencies
+        run: |
+          python -m pip install -r requirements.txt
+      - name: Build and install
+        run: python -m pip install -e .
+      - name: Prepare dataset
+        run: |
+          wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
+          unzip OpenCompassData-core-20231110.zip
+      - name: Dry run test
+        run: |
+          python run.py --models hf_opt_125m --datasets siqa_gen winograd_ppl --dry-run
+
+  build_cu117:
+    runs-on: ubuntu-22.04
+    container:
+      image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel
+    strategy:
+      matrix:
+        python-version: ['3.10']
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Fetch GPG keys
+        run: |
+          apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
+          apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub
+      - name: Install Python-dev
+        run: apt-get update && apt-get install -y python${{matrix.python-version}}-dev
+        if: ${{matrix.python-version != 3.10}}
+      - name: Install system dependencies
+        run: |
+          apt-get update
+          apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libxrender-dev libc6 libc6-dev
+          sed -i '$ a deb http://th.archive.ubuntu.com/ubuntu jammy main' /etc/apt/sources.list
+          apt-get update && apt-get install -y libc6 libffi-dev libncursesw6 wget unzip
+      - name: Upgrade pip
+        run: python -m pip install pip --upgrade
+      - name: Install opencompass dependencies
+        run: |
+          python -m pip install -r requirements.txt
+      - name: Build and install
+        run: python -m pip install -e .
+      - name: Prepare dataset
+        run: |
+          wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
+          unzip OpenCompassData-core-20231110.zip
+      - name: Dry run test
+        run: |
+          python run.py --models hf_opt_125m --datasets siqa_gen winograd_ppl --dry-run
+
+  build_windows:
+    runs-on: windows-2022
+    strategy:
+      matrix:
+        python-version: ['3.10']
+        platform: [cpu]
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Upgrade pip
+        run: python -m pip install pip --upgrade
+      - name: Install PyTorch
+        run: pip install torch==2.0.0+${{matrix.platform}} -f https://download.pytorch.org/whl/${{matrix.platform}}/torch_stable.html
+      - name: Install opencompass dependencies
+        run: |
+          pip install -r requirements.txt
+      - name: Build and install
+        run: pip install -e .
+      - name: Prepare dataset
+        run: |
+          Invoke-WebRequest -Uri https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip -OutFile OpenCompassData-core-20231110.zip
+          unzip OpenCompassData-core-20231110.zip
+      - name: Dry run test
+        run: |
+          python run.py --models hf_opt_125m --datasets siqa_gen winograd_ppl --dry-run
1 change: 1 addition & 0 deletions .gitignore
@@ -11,6 +11,7 @@ configs/eval_debug*.py
 configs/viz_*.py
 data
 work_dirs
+models
 configs/internal/
 # Byte-compiled / optimized / DLL files
 __pycache__/
3 changes: 2 additions & 1 deletion .pre-commit-config-zh-cn.yaml
@@ -5,7 +5,8 @@ exclude: |
         opencompass/utils/internal/|
         opencompass/openicl/icl_evaluator/hf_metrics/|
         opencompass/datasets/lawbench/utils|
-        opencompass/datasets/lawbench/evaluation_functions/
+        opencompass/datasets/lawbench/evaluation_functions/|
+        opencompass/datasets/medbench
     )
 repos:
   - repo: https://gitee.com/openmmlab/mirrors-flake8
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -5,7 +5,8 @@ exclude: |
         opencompass/utils/internal/|
         opencompass/openicl/icl_evaluator/hf_metrics/|
         opencompass/datasets/lawbench/utils|
-        opencompass/datasets/lawbench/evaluation_functions/
+        opencompass/datasets/lawbench/evaluation_functions/|
+        opencompass/datasets/medbench/
     )
 repos:
   - repo: https://github.com/PyCQA/flake8
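
The two hunks above extend the pre-commit `exclude` pattern so that files under `opencompass/datasets/medbench` are skipped by the hooks. As a rough illustration of how such a multi-line alternation behaves, here is a minimal sketch in Python. It assumes the pattern opens with a verbose-mode header such as `(?x)^(` (that line is not visible in the hunk) and that paths are matched with `re.search`; the helper name `is_excluded` is hypothetical.

```python
import re

# Assumed shape of the exclude pattern: verbose mode, anchored at the start
# of the path, with one alternative per line. Only the entries visible in
# the hunk are reproduced here.
EXCLUDE = re.compile(r'''(?x)^(
    opencompass/datasets/lawbench/utils|
    opencompass/datasets/lawbench/evaluation_functions/|
    opencompass/datasets/medbench/
)''')


def is_excluded(path):
    """Hypothetical helper: True if the path matches the exclude pattern."""
    return EXCLUDE.search(path) is not None


excluded = is_excluded('opencompass/datasets/medbench/medbench.py')
kept = is_excluded('opencompass/datasets/ceval.py')
```

Note that the `|` added to the `evaluation_functions/` line is what turns the new `medbench` line into a separate alternative; without it, verbose mode would join the two lines into a single longer path.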
2 changes: 2 additions & 0 deletions README.md
@@ -50,6 +50,8 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
 ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

+- **\[2023.12.10\]** We have released [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), a toolkit for evaluating vision-language models (VLMs), which currently supports 20+ VLMs and 7 multi-modal benchmarks (including the MMBench series). 🔥🔥🔥.
+- **\[2023.12.10\]** We now support Mistral AI's MoE LLM: **Mixtral-8x7B-32K**. See [MixtralKit](https://github.com/open-compass/MixtralKit) for more details about inference and evaluation. 🔥🔥🔥.
 - **\[2023.11.22\]** We now support many API-based models, including **Baidu, ByteDance, Huawei, and 360**. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
 - **\[2023.11.20\]** Thanks to [helloyongyang](https://github.com/helloyongyang) for supporting evaluation with [LightLLM](https://github.com/ModelTC/lightllm) as the backend. See [Evaluation With LightLLM](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lightllm.html) for more details. 🔥🔥🔥.
 - **\[2023.11.13\]** We are delighted to announce the release of OpenCompass v0.1.8. This version enables local loading of evaluation benchmarks, thereby eliminating the need for an internet connection. Please note that with this update, **you must re-download all evaluation datasets** to ensure accurate and up-to-date results. 🔥🔥🔥.
2 changes: 2 additions & 0 deletions README_zh-CN.md
@@ -50,6 +50,8 @@
 ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

+- **\[2023.12.10\]** We have open-sourced the multi-modal evaluation framework [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), which currently supports 20+ multi-modal large models and 7 multi-modal benchmarks, including the MMBench series. 🔥🔥🔥.
+- **\[2023.12.10\]** We now support Mistral AI's MoE model **Mixtral-8x7B-32K**. See [MixtralKit](https://github.com/open-compass/MixtralKit) for more details about inference and evaluation. 🔥🔥🔥.
 - **\[2023.11.22\]** We now support many API-based models, including **Baidu, ByteDance, Huawei, and 360**. See the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
 - **\[2023.11.20\]** Thanks to [helloyongyang](https://github.com/helloyongyang) for supporting evaluation with [LightLLM](https://github.com/ModelTC/lightllm) as the backend. See [Evaluation With LightLLM](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lightllm.html) for more details. 🔥🔥🔥.
 - **\[2023.11.13\]** We are pleased to announce the release of OpenCompass v0.1.8. This version supports loading evaluation benchmarks locally, so no internet connection is needed. Please note that with this update, **you need to re-download all evaluation datasets** to ensure accurate and up-to-date results. 🔥🔥🔥.
7 changes: 7 additions & 0 deletions configs/api_examples/eval_api_360.py
@@ -18,6 +18,13 @@
         type=AI360GPT,
         path='360GPT_S2_V9',
         key="xxxxxxxxxxxx",
+        generation_kwargs={
+            'temperature': 0.9,
+            'max_tokens': 2048,
+            'top_p': 0.5,
+            'tok_k': 0,
+            'repetition_penalty': 1.05,
+        },
         query_per_second=1,
         max_out_len=2048,
         max_seq_len=2048,
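
The hunk above adds a `generation_kwargs` dict carrying provider-specific sampling options to the API model config. How the model class consumes it is not shown in this part of the diff, so the following is only a sketch of one plausible approach: merging the options into a request body. The helper `build_payload` and the payload layout are hypothetical and are not taken from the `AI360GPT` implementation.

```python
def build_payload(model, messages, generation_kwargs=None):
    """Hypothetical helper: merge sampling options into a request body."""
    payload = {'model': model, 'messages': messages}
    # Copy the options so the config dict itself is never mutated,
    # then let them extend the base payload.
    payload.update(dict(generation_kwargs or {}))
    return payload


payload = build_payload(
    '360GPT_S2_V9',
    [{'role': 'user', 'content': 'Hello'}],
    generation_kwargs={'temperature': 0.9, 'max_tokens': 2048, 'top_p': 0.5},
)
```

Keeping the options in one dict means each provider config can list only the parameters its endpoint accepts, without the shared model interface having to know about them.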
6 changes: 6 additions & 0 deletions configs/api_examples/eval_api_baichuan.py
@@ -20,6 +20,12 @@
         api_key='xxxxxx',
         secret_key="xxxxx",
         url="xxxxx",
+        generation_kwargs={
+            'temperature': 0.3,
+            'top_p': 0.85,
+            'top_k': 5,
+            'with_search_enhance': False,
+        },
         query_per_second=1,
         max_out_len=2048,
         max_seq_len=2048,
6 changes: 5 additions & 1 deletion configs/api_examples/eval_api_baidu.py
@@ -20,10 +20,14 @@
         key='xxxxxx', # please give you key
         secretkey='xxxxxxxxx', # please give your group_id
         url='xxxxxxxxx',
+        generation_kwargs = {
+            'temperature': 0.8,
+        },
         query_per_second=1,
         max_out_len=2048,
         max_seq_len=2048,
-        batch_size=8),
+        batch_size=8
+    ),
 ]

 infer = dict(
5 changes: 5 additions & 0 deletions configs/api_examples/eval_api_bytedance.py
@@ -21,6 +21,11 @@
         accesskey="xxxxxxx",
         secretkey="xxxxxxx",
         url='xxxxxx',
+        generation_kwargs={
+            'temperature': 0.7,
+            'top_p': 0.9,
+            'top_k': 0,
+        },
         query_per_second=1,
         max_out_len=2048,
         max_seq_len=2048,
3 changes: 3 additions & 0 deletions configs/api_examples/eval_api_moonshot.py
@@ -19,6 +19,9 @@
         path='moonshot-v1-32k',
         key='xxxxxxx',
         url= 'xxxxxxxx',
+        system_prompt= '你是 Kimi,由 Moonshot AI 提供的人工智能助手,你更擅长中文和英文的对话。'
+        '你会为用户提供安全,有帮助,准确的回答。同时,你会拒绝一些涉及恐怖主义,种族歧视,'
+        '黄色暴力等问题的回答。Moonshot AI 为专有名词,不可翻译成其他语言。',
         query_per_second=1,
         max_out_len=2048,
         max_seq_len=2048,
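
The `system_prompt` added above spans three lines of adjacent string literals with no `+` between them. Python joins adjacent literals into a single string at compile time, so the model receives one continuous prompt. A minimal sketch of the same idiom, using placeholder English text rather than the actual prompt:

```python
# Adjacent string literals inside parentheses (or inside a call's argument
# list, as in the config) are concatenated into one string.
system_prompt = ('You are a helpful assistant. '
                 'Answer safely and accurately. '
                 'Keep proper nouns untranslated.')

expected = ('You are a helpful assistant. Answer safely and accurately. '
            'Keep proper nouns untranslated.')
```

Since nothing is inserted between the pieces, any separating space or punctuation has to be written inside the literals themselves.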
2 changes: 1 addition & 1 deletion configs/datasets/CIBench/CIBench_gen.py
@@ -1,4 +1,4 @@
 from mmengine.config import read_base

 with read_base():
-    from .CIBench_gen_eb42f9 import ci_datasets # noqa: F401, F403
+    from .CIBench_gen_8ab0dc import ci_datasets # noqa: F401, F403
16 changes: 4 additions & 12 deletions configs/datasets/CIBench/{CIBench_gen_eb42f9.py → CIBench_gen_8ab0dc.py}
@@ -16,28 +16,20 @@
         template="""{questions}""",
     ),
     retriever=dict(type=ZeroRetriever),
-    inferencer=dict(type=AgentInferencer),
+    inferencer=dict(type=AgentInferencer, infer_mode='every'),
 )


 libs = ['Pandas', 'Matplotlib', 'Opencv', 'SciPy', 'Seaborn', 'PyTorch']
-cibench_eval_cfg = {
-    lib: dict(
-        evaluator=dict(
-            type=CIBenchEvaluator,
-            output_dir=f'output_data/cibench/{lib}'),
-        pred_role="BOT",
-    )
-    for lib in libs
-}
+cibench_eval_cfg = dict(evaluator=dict(type=CIBenchEvaluator), pred_role="BOT")

 cibench_datasets = [
     dict(
-        abbr=f"cibench_{lib}",
+        abbr=f"cibench_generation_{lib}",
         type=CIBenchDataset,
         path=f"./data/cibench/{lib}",
         reader_cfg=cibench_reader_cfg,
         infer_cfg=cibench_infer_cfg,
-        eval_cfg=cibench_eval_cfg[lib],
+        eval_cfg=cibench_eval_cfg,
     ) for lib in libs
 ]
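
The change above replaces the per-library `cibench_eval_cfg` mapping with a single shared dict and renames each dataset to `cibench_generation_{lib}`. The resulting structure can be sketched without OpenCompass installed by substituting strings for the real classes; the string stand-ins are an assumption made for illustration, and `reader_cfg` and `infer_cfg` are omitted.

```python
libs = ['Pandas', 'Matplotlib', 'Opencv', 'SciPy', 'Seaborn', 'PyTorch']

# Strings stand in for CIBenchEvaluator and CIBenchDataset so the sketch
# runs on its own; the real config passes the classes themselves.
cibench_eval_cfg = dict(evaluator=dict(type='CIBenchEvaluator'), pred_role='BOT')

cibench_datasets = [
    dict(
        abbr=f'cibench_generation_{lib}',
        type='CIBenchDataset',
        path=f'./data/cibench/{lib}',
        eval_cfg=cibench_eval_cfg,  # the same dict object for every library
    ) for lib in libs
]

abbrs = [d['abbr'] for d in cibench_datasets]
```

In this sketch every entry references the same `cibench_eval_cfg` object, so an in-place change to that dict would be visible from all six dataset entries.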