This article shows how to evaluate large models with OpenCompass in a Jupyter Notebook.

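Evaluation Targets

OpenCompass mainly evaluates large language models and large multimodal models:

  • Base models: typically obtained by self-supervised training on massive text corpora (for example, OpenAI's GPT-3 and Meta's LLaMA); they are usually strong at text continuation.
  • Chat models: typically obtained from a base model through instruction fine-tuning or human preference alignment (for example, OpenAI's ChatGPT and Shanghai AI Laboratory's InternLM); they can follow human instructions and hold conversations well.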

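Tool Architecture

  • Model layer: the main kinds of models involved in large-model evaluation. OpenCompass focuses on base models and chat models.
  • Capability layer: OpenCompass designs its evaluation dimensions around general capabilities and specialized capabilities. General capabilities are evaluated along dimensions such as language, knowledge, understanding, reasoning, and safety. Specialized capabilities are evaluated along dimensions such as long context, code, tool use, and knowledge augmentation.
  • Method layer: OpenCompass uses both objective and subjective evaluation. Objective evaluation conveniently measures a model on tasks with definite answers (multiple choice, fill-in-the-blank, closed-ended question answering, and so on). Subjective evaluation measures how satisfied users actually are with the model's replies; OpenCompass supports both model-assisted and human-feedback-based subjective evaluation.
  • Tool layer: OpenCompass provides a rich set of features for automated, efficient evaluation of large language models, including distributed evaluation, prompt engineering, integration with evaluation databases, leaderboard publishing, and evaluation report generation.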

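Capability Dimensions

General capabilities cover six dimensions: comprehensive subject examination, knowledge, language, understanding, reasoning, and safety. Together they form a multi-faceted system for assessing model capability.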

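Evaluation Methods

OpenCompass combines objective and subjective evaluation. For capability dimensions and scenarios with definite answers, it builds extensive evaluation sets to assess the model comprehensively. For open-ended or semi-open-ended questions that reflect model capability, and for model safety questions, it uses a combination of subjective and objective evaluation.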

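Objective Evaluation

For objective questions with standard answers, we can use quantitative metrics to compare the model's output against the reference answer and measure performance from the result. Because large language models have a great deal of freedom in what they output, the inputs and outputs also need to be constrained and designed at evaluation time so that noisy output affects the result as little as possible; only then can the model's capability be judged completely and objectively.

To better elicit the model's ability in the domain under test and to guide it to answer in a fixed template, OpenCompass uses prompt engineering and in-context learning for objective evaluation.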
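
As a rough illustration of in-context learning, the sketch below assembles a few-shot prompt for a multiple-choice question. The helper name build_few_shot_prompt, the "Question:/Answer:" template, and the sample questions are all assumptions made for this example; they are not OpenCompass's own prompt templates.

```python
# Hypothetical helper: assemble a few-shot prompt for a multiple-choice question.
def build_few_shot_prompt(examples, question, options):
    """examples: list of (question, options, answer_letter) tuples."""

    def render(q, opts, answer=""):
        lines = [f"Question: {q}"]
        lines += [f"{letter}. {text}" for letter, text in zip("ABCD", opts)]
        # With an empty answer this line becomes a bare "Answer:" cue.
        lines.append(f"Answer: {answer}".rstrip())
        return "\n".join(lines)

    shots = [render(q, opts, ans) for q, opts, ans in examples]
    # The final block ends with "Answer:" so the model only has to emit a letter.
    return "\n\n".join(shots + [render(question, options)])


# Made-up demonstration and target question.
demo = [("2 + 2 = ?", ["3", "4", "5", "6"], "B")]
prompt = build_few_shot_prompt(demo, "3 + 3 = ?", ["5", "6", "7", "8"])
print(prompt)
```

The solved demonstrations show the model the expected answer format, and the trailing "Answer:" cue nudges it to reply with a single option letter.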

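In practice, objective evaluation usually scores model output in one of two ways:

  • Discriminative evaluation: the question is combined with each candidate answer, the model's perplexity is computed on every combination, and the answer with the lowest perplexity is taken as the model's final output. For example, if the model's perplexity is 0.1 on "question? answer 1" and 0.2 on "question? answer 2", answer 1 is chosen as the model's output.
  • Generative evaluation: used mainly for generation tasks such as translation, program synthesis, and logical analysis questions. In practice, the question is given to the model as the raw input and the answer region is left blank for the model to complete. The output usually needs post-processing as well, so that it meets the dataset's requirements.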
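
The two approaches above can be sketched in a few lines. This is a minimal illustration, not OpenCompass's implementation: the function names are hypothetical, the log-probabilities are made up, and perplexity is computed with the common definition (the exponential of the mean negative log-likelihood per token).

```python
import math
import re


def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_by_perplexity(candidate_logprobs):
    """Discriminative evaluation: choose the candidate with the lowest perplexity.

    candidate_logprobs maps each candidate answer to the per-token
    log-probabilities the model assigned to "question + answer".
    """
    scores = {ans: perplexity(lp) for ans, lp in candidate_logprobs.items()}
    return min(scores, key=scores.get), scores


def first_option_letter(generated_text, options="ABCD"):
    """Generative evaluation: post-process free-form text to one option letter."""
    match = re.search(rf"\b([{options}])\b", generated_text)
    return match.group(1) if match else ""


# Made-up log-probabilities, for illustration only.
logprobs = {
    "answer 1": [-0.2, -0.1, -0.3],
    "answer 2": [-0.9, -1.2, -0.7],
}
best, scores = pick_by_perplexity(logprobs)
print(best, scores)
print(first_option_letter("The answer is B, because ..."))
```

The post-processing step matters because a chat model rarely replies with a bare letter; extracting a normalized answer first lets it be compared with the reference by exact match.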

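Subjective Evaluation

Language is vivid and varied, and many scenarios and capabilities cannot be assessed with objective metrics alone. For evaluations such as model safety and language ability, evaluation based mainly on human subjective judgment better reflects the model's real capability and better matches how large models are actually used.

OpenCompass's subjective evaluation relies on the subjective judgment of human raters to assess chat-capable large language models. In practice, a set of subjective test questions is built in advance from the model's capability dimensions, the replies of different models to the same question are shown to raters, and the raters' subjective scores are collected. Because subjective testing is expensive, the scheme also uses a high-performing large language model to imitate human raters and assign subjective scores. In actual evaluations, subjective scoring by human experts is combined with model-based subjective scoring.

When running subjective evaluation, OpenCompass works in two ways: satisfaction statistics on a single model's replies, and satisfaction comparison across multiple models.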
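
To make the two modes concrete, here is a small sketch of how rater output could be aggregated. The 1-5 rating scale, the threshold of 4, the function names, and the sample data are assumptions for illustration; the source does not specify how OpenCompass aggregates these scores.

```python
from collections import Counter
from statistics import mean


def satisfaction_rate(scores, threshold=4):
    """Single-model view: share of replies rated at or above `threshold`."""
    return sum(s >= threshold for s in scores) / len(scores)


def win_rates(judgments):
    """Multi-model view: `judgments` holds the winner's name per question, or "tie"."""
    counts = Counter(judgments)
    return {name: n / len(judgments) for name, n in counts.items()}


ratings = [5, 4, 2, 3, 5]  # made-up 1-5 ratings for one model's replies
pairwise = ["model_a", "model_b", "model_a", "tie"]  # made-up verdicts

print(mean(ratings), satisfaction_rate(ratings))
print(win_rates(pairwise))
```

The same aggregation applies whether the ratings come from human experts or from a judge model, which is what makes the two sources easy to combine.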

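Evaluation Workflow

Evaluating a model in OpenCompass usually involves the following stages: configuration -> inference -> evaluation -> visualization.

  • Configuration: the starting point of the whole workflow. You configure the entire evaluation here, choosing the models and datasets to evaluate. You can also choose the evaluation strategy and compute backend, and define how results are displayed.
  • Inference and evaluation: in this stage, OpenCompass runs inference and evaluation over the models and datasets in parallel. Inference has the model produce outputs from the datasets; evaluation measures how well those outputs match the reference answers. Both steps are split into multiple "tasks" that run concurrently for efficiency. Note that if compute resources are limited, this strategy may actually make the evaluation slower.
  • Visualization: once evaluation finishes, OpenCompass organizes the results into readable tables and saves them as CSV and TXT files. You can also enable Lark (Feishu) status reporting to receive timely evaluation status reports in the Lark client.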
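
The idea of splitting work into concurrently running tasks can be illustrated with a toy example. This is only a sketch of the concept under stated assumptions: the partition and run_task functions are hypothetical, the dataset sizes are made up, and OpenCompass's own partitioners and runners are more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(pairs, max_task_size):
    """Greedily pack (name, size) work items into tasks of bounded total size."""
    tasks, current, used = [], [], 0
    for name, size in pairs:
        # Start a new task when adding this item would exceed the budget.
        if current and used + size > max_task_size:
            tasks.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        tasks.append(current)
    return tasks


def run_task(task):
    # Stand-in for "run inference, then score the outputs" on each dataset.
    return {name: f"done:{name}" for name in task}


# Made-up dataset sizes (number of questions).
work = [("ceval-law", 24), ("ceval-logic", 22), ("ceval-marxism", 19)]
tasks = partition(work, max_task_size=50)
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_task, tasks))
print(tasks)
print(results)
```

With a single GPU, parallel tasks compete for the same device, which is one way the splitting can slow things down when resources are limited.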

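Next, we walk through the basic usage of OpenCompass by evaluating InternLM on the C-Eval benchmark. The corresponding configuration files can be found in configs/eval_demo.py.

All commands are run in a Jupyter notebook unless noted otherwise.

Environment Setup

Run the following commands in a terminal: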

conda create --name opencompass --clone=/root/share/conda_envs/internlm-base
conda activate opencompass
pip install ipykernel
python -m ipykernel install --user --name opencompass --display-name opencompass

Once the base environment above is ready, open Jupyter, create a notebook, and select the opencompass kernel to start writing code.

# Jupyter notebook environment setup:
# point the notebook's shell commands at the opencompass conda environment
import os, sys

PATH = os.environ['PATH']
basedir = os.path.dirname(os.path.dirname(sys.exec_prefix))

# $PATH below could also be written as {os.environ['PATH']}; the $variable form is used only to show that it works too
%env CONDA_EXE={os.path.join(basedir, 'bin/conda')}
%env CONDA_PREFIX={sys.exec_prefix}
%env CONDA_PYTHON_EXE={os.path.join(basedir, 'bin/python')}
%env PATH={os.path.join(sys.exec_prefix, 'bin')}:$PATH
env: CONDA_EXE=/root/.conda/bin/conda
env: CONDA_PREFIX=/root/.conda/envs/opencompass
env: CONDA_PYTHON_EXE=/root/.conda/bin/python
env: PATH=/root/.conda/envs/opencompass/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
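
Before installing anything, it can help to confirm that the kernel really runs inside the intended environment. The check below is a minimal sketch using only the standard library; the function name kernel_env_report is hypothetical, and the bin subdirectory assumes a Linux-style conda layout like the one shown above.

```python
import os
import shutil
import sys


def kernel_env_report():
    """Collect the paths that show which environment the kernel is using."""
    env_bin = os.path.join(sys.exec_prefix, "bin")
    path_entries = os.environ.get("PATH", "").split(os.pathsep)
    return {
        "interpreter": sys.executable,
        "env_bin": env_bin,
        "env_bin_first_on_path": bool(path_entries) and path_entries[0] == env_bin,
        "python_on_path": shutil.which("python"),
    }


for key, value in kernel_env_report().items():
    print(f"{key}: {value}")
```

If the interpreter path does not point into the opencompass environment, the notebook is probably using a different kernel than the one registered earlier.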
%pip install -q pickleshare
# Install OpenCompass
%cd ~
!git clone https://github.com/open-compass/opencompass
%cd opencompass
%pip install -q -e .

Data Preparation

%cp /share/temp/datasets/OpenCompassData-core-20231110.zip /root/opencompass/
!unzip -q OpenCompassData-core-20231110.zip
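
Because unzip -q prints nothing, a quick look at the archive's top-level entries shows what was extracted. This is a small standard-library sketch; the helper name top_level_entries is hypothetical, and it assumes the archive sits in the current working directory as in the cells above.

```python
import os
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level names stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})


archive_path = "OpenCompassData-core-20231110.zip"
if os.path.exists(archive_path):
    print(top_level_entries(archive_path))
else:
    print(f"{archive_path} not found in {os.getcwd()}")
```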
# List all configs related to internlm and ceval
!python tools/list_configs.py internlm ceval
+--------------------------+--------------------------------------------------------+
| Model                    | Config Path                                            |
|--------------------------+--------------------------------------------------------|
| hf_internlm2_20b         | configs/models/hf_internlm/hf_internlm2_20b.py         |
| hf_internlm2_7b          | configs/models/hf_internlm/hf_internlm2_7b.py          |
| hf_internlm2_chat_20b    | configs/models/hf_internlm/hf_internlm2_chat_20b.py    |
| hf_internlm2_chat_7b     | configs/models/hf_internlm/hf_internlm2_chat_7b.py     |
| hf_internlm_20b          | configs/models/hf_internlm/hf_internlm_20b.py          |
| hf_internlm_7b           | configs/models/hf_internlm/hf_internlm_7b.py           |
| hf_internlm_chat_20b     | configs/models/hf_internlm/hf_internlm_chat_20b.py     |
| hf_internlm_chat_7b      | configs/models/hf_internlm/hf_internlm_chat_7b.py      |
| hf_internlm_chat_7b_8k   | configs/models/hf_internlm/hf_internlm_chat_7b_8k.py   |
| hf_internlm_chat_7b_v1_1 | configs/models/hf_internlm/hf_internlm_chat_7b_v1_1.py |
| internlm_7b              | configs/models/internlm/internlm_7b.py                 |
| ms_internlm_chat_7b_8k   | configs/models/ms_internlm/ms_internlm_chat_7b_8k.py   |
+--------------------------+--------------------------------------------------------+
+----------------------------+------------------------------------------------------+
| Dataset                    | Config Path                                          |
|----------------------------+------------------------------------------------------|
| ceval_clean_ppl            | configs/datasets/ceval/ceval_clean_ppl.py            |
| ceval_gen                  | configs/datasets/ceval/ceval_gen.py                  |
| ceval_gen_2daf24           | configs/datasets/ceval/ceval_gen_2daf24.py           |
| ceval_gen_5f30c7           | configs/datasets/ceval/ceval_gen_5f30c7.py           |
| ceval_ppl                  | configs/datasets/ceval/ceval_ppl.py                  |
| ceval_ppl_578f8d           | configs/datasets/ceval/ceval_ppl_578f8d.py           |
| ceval_ppl_93e5ce           | configs/datasets/ceval/ceval_ppl_93e5ce.py           |
| ceval_zero_shot_gen_bd40ef | configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py |
+----------------------------+------------------------------------------------------+

Launching the Evaluation

!python run.py --datasets ceval_gen --hf-path /share/temp/model_repos/internlm-chat-7b/ \
--tokenizer-path /share/temp/model_repos/internlm-chat-7b/ \
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
--model-kwargs trust_remote_code=True device_map='auto' \
--max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 \
--debug
01/21 12:54:59 - OpenCompass - INFO - Loading ceval_gen: configs/datasets/ceval/ceval_gen.py
01/21 12:54:59 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
01/21 12:54:59 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
01/21 12:54:59 - OpenCompass - DEBUG - Modules of opencompass's partitioner registry have been automatically imported from opencompass.partitioners
01/21 12:54:59 - OpenCompass - DEBUG - Get class `SizePartitioner` from "partitioner" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `SizePartitioner` instance is built from registry, and its implementation can be found in opencompass.partitioners.size
01/21 12:54:59 - OpenCompass - DEBUG - Key eval.runner.task.judge_cfg not found in config, ignored.
01/21 12:54:59 - OpenCompass - DEBUG - Key eval.runner.task.dump_details not found in config, ignored.
01/21 12:54:59 - OpenCompass - DEBUG - Additional config: {}
01/21 12:54:59 - OpenCompass - DEBUG - Modules of opencompass's load_dataset registry have been automatically imported from opencompass.datasets
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - INFO - Partitioned into 1 tasks.
01/21 12:55:00 - OpenCompass - DEBUG - Task 0: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine,opencompass.models.huggin
gface.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_int
ernlm-chat-7b/ceval-middle_school_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]
01/21 12:55:00 - OpenCompass - DEBUG - Modules of opencompass's runner registry have been automatically imported from opencompass.runners
01/21 12:55:00 - OpenCompass - DEBUG - Get class `LocalRunner` from "runner" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `LocalRunner` instance is built from registry, and its implementation can be found in opencompass.runners.local
01/21 12:55:00 - OpenCompass - DEBUG - Modules of opencompass's task registry have been automatically imported from opencompass.tasks
01/21 12:55:00 - OpenCompass - DEBUG - Get class `OpenICLInferTask` from "task" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `OpenICLInferTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_infer
01/21 12:55:11 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine,opencompass.models.huggingfac
e.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internl
m-chat-7b/ceval-middle_school_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:12<00:00,  1.52s/it]
01/21 12:55:39 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics]
100%|██████████████████████████████████████| 55/55 [00:00<00:00, 1548233.02it/s]
[2024-01-21 12:55:39,531] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 14/14 [00:38<00:00,  2.75s/it]
01/21 12:56:18 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant]
100%|██████████████████████████████████████| 49/49 [00:00<00:00, 1478567.60it/s]
[2024-01-21 12:56:18,207] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 13/13 [00:38<00:00,  2.93s/it]
01/21 12:56:56 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant]
100%|██████████████████████████████████████| 49/49 [00:00<00:00, 1500152.53it/s]
[2024-01-21 12:56:56,496] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 13/13 [00:34<00:00,  2.69s/it]
01/21 12:57:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician]
100%|██████████████████████████████████████| 49/49 [00:00<00:00, 1533738.03it/s]
[2024-01-21 12:57:31,552] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 13/13 [00:22<00:00,  1.71s/it]
01/21 12:57:53 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant]
100%|██████████████████████████████████████| 47/47 [00:00<00:00, 1552222.74it/s]
[2024-01-21 12:57:53,942] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 12/12 [00:42<00:00,  3.50s/it]
01/21 12:58:35 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner]
100%|██████████████████████████████████████| 46/46 [00:00<00:00, 1162277.01it/s]
[2024-01-21 12:58:36,114] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 12/12 [00:25<00:00,  2.17s/it]
01/21 12:59:02 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification]
100%|██████████████████████████████████████| 44/44 [00:00<00:00, 1476395.01it/s]
[2024-01-21 12:59:02,249] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 11/11 [00:20<00:00,  1.89s/it]
01/21 12:59:23 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming]
100%|██████████████████████████████████████| 37/37 [00:00<00:00, 1241513.98it/s]
[2024-01-21 12:59:23,203] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 10/10 [00:24<00:00,  2.46s/it]
01/21 12:59:47 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer]
100%|██████████████████████████████████████| 37/37 [00:00<00:00, 1241513.98it/s]
[2024-01-21 12:59:47,894] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 10/10 [00:24<00:00,  2.45s/it]
01/21 13:00:12 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration]
100%|██████████████████████████████████████| 33/33 [00:00<00:00, 1134524.85it/s]
[2024-01-21 13:00:12,543] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 9/9 [00:18<00:00,  2.08s/it]
01/21 13:00:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies]
100%|██████████████████████████████████████| 33/33 [00:00<00:00, 1163126.32it/s]
[2024-01-21 13:00:31,424] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 9/9 [00:13<00:00,  1.51s/it]
01/21 13:00:45 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer]
100%|██████████████████████████████████████| 31/31 [00:00<00:00, 1150649.77it/s]
[2024-01-21 13:00:45,156] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 8/8 [00:19<00:00,  2.47s/it]
01/21 13:01:04 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer]
100%|██████████████████████████████████████| 31/31 [00:00<00:00, 1015808.00it/s]
[2024-01-21 13:01:05,057] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 8/8 [00:17<00:00,  2.19s/it]
01/21 13:01:22 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science]
100%|██████████████████████████████████████| 29/29 [00:00<00:00, 1030803.53it/s]
[2024-01-21 13:01:22,695] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 8/8 [00:11<00:00,  1.40s/it]
01/21 13:01:33 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide]
100%|██████████████████████████████████████| 29/29 [00:00<00:00, 1039613.81it/s]
[2024-01-21 13:01:34,053] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 8/8 [00:12<00:00,  1.62s/it]
01/21 13:01:47 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry]
100%|███████████████████████████████████████| 24/24 [00:00<00:00, 883011.37it/s]
[2024-01-21 13:01:47,116] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:14<00:00,  2.38s/it]
01/21 13:02:01 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer]
100%|███████████████████████████████████████| 24/24 [00:00<00:00, 915120.87it/s]
[2024-01-21 13:02:01,487] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:14<00:00,  2.36s/it]
01/21 13:02:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought]
100%|███████████████████████████████████████| 24/24 [00:00<00:00, 713924.09it/s]
[2024-01-21 13:02:15,714] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:11<00:00,  1.90s/it]
01/21 13:02:27 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law]
100%|███████████████████████████████████████| 24/24 [00:00<00:00, 689474.63it/s]
[2024-01-21 13:02:27,195] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:17<00:00,  2.85s/it]
01/21 13:02:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine]
100%|███████████████████████████████████████| 23/23 [00:00<00:00, 869090.02it/s]
[2024-01-21 13:02:44,418] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:10<00:00,  1.80s/it]
01/21 13:02:55 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history]
100%|███████████████████████████████████████| 23/23 [00:00<00:00, 846219.23it/s]
[2024-01-21 13:02:55,314] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:12<00:00,  2.07s/it]
01/21 13:03:07 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature]
100%|███████████████████████████████████████| 23/23 [00:00<00:00, 893231.41it/s]
[2024-01-21 13:03:07,826] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:10<00:00,  1.74s/it]
01/21 13:03:18 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional]
100%|███████████████████████████████████████| 23/23 [00:00<00:00, 853707.89it/s]
[2024-01-21 13:03:18,394] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:22<00:00,  3.81s/it]
01/21 13:03:41 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic]
100%|███████████████████████████████████████| 22/22 [00:00<00:00, 862380.26it/s]
[2024-01-21 13:03:41,414] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:29<00:00,  4.84s/it]
01/21 13:04:10 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history]
100%|███████████████████████████████████████| 22/22 [00:00<00:00, 862380.26it/s]
[2024-01-21 13:04:10,543] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:09<00:00,  1.55s/it]
01/21 13:04:19 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection]
100%|███████████████████████████████████████| 22/22 [00:00<00:00, 878806.55it/s]
[2024-01-21 13:04:19,944] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:08<00:00,  1.47s/it]
01/21 13:04:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine]
100%|███████████████████████████████████████| 22/22 [00:00<00:00, 838860.80it/s]
[2024-01-21 13:04:28,856] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:12<00:00,  2.01s/it]
01/21 13:04:40 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture]
100%|███████████████████████████████████████| 21/21 [00:00<00:00, 547083.13it/s]
[2024-01-21 13:04:41,027] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:13<00:00,  2.22s/it]
01/21 13:04:54 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology]
100%|███████████████████████████████████████| 21/21 [00:00<00:00, 765916.38it/s]
[2024-01-21 13:04:54,470] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:09<00:00,  1.53s/it]
01/21 13:05:03 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics]
100%|███████████████████████████████████████| 21/21 [00:00<00:00, 830947.02it/s]
[2024-01-21 13:05:03,749] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:14<00:00,  2.41s/it]
01/21 13:05:18 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry]
100%|███████████████████████████████████████| 20/20 [00:00<00:00, 791378.11it/s]
[2024-01-21 13:05:18,296] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:12<00:00,  2.46s/it]
01/21 13:05:30 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history]
100%|███████████████████████████████████████| 20/20 [00:00<00:00, 798915.05it/s]
[2024-01-21 13:05:30,687] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:11<00:00,  2.25s/it]
01/21 13:05:41 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 731117.21it/s]
[2024-01-21 13:05:42,035] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:08<00:00,  1.73s/it]
01/21 13:05:50 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 724470.69it/s]
[2024-01-21 13:05:50,775] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:07<00:00,  1.44s/it]
01/21 13:05:57 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 737886.81it/s]
[2024-01-21 13:05:58,040] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:14<00:00,  2.90s/it]
01/21 13:06:12 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 766267.08it/s]
[2024-01-21 13:06:12,653] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:22<00:00,  4.41s/it]
01/21 13:06:34 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 717943.93it/s]
[2024-01-21 13:06:34,792] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:10<00:00,  2.02s/it]
01/21 13:06:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 751809.21it/s]
[2024-01-21 13:06:44,964] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:10<00:00,  2.06s/it]
01/21 13:06:55 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 804967.43it/s]
[2024-01-21 13:06:55,327] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:10<00:00,  2.09s/it]
01/21 13:07:05 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 758969.30it/s]
[2024-01-21 13:07:05,839] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:15<00:00,  3.07s/it]
01/21 13:07:21 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_physics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 581691.80it/s]
[2024-01-21 13:07:21,273] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:10<00:00,  2.14s/it]
01/21 13:07:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 744782.95it/s]
[2024-01-21 13:07:32,035] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:08<00:00,  1.65s/it]
01/21 13:07:40 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 681126.29it/s]
[2024-01-21 13:07:40,420] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:18<00:00,  3.65s/it]
01/21 13:07:58 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 781291.92it/s]
[2024-01-21 13:07:58,750] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:08<00:00,  1.64s/it]
01/21 13:08:06 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 737886.81it/s]
[2024-01-21 13:08:07,017] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:06<00:00,  1.21s/it]
01/21 13:08:13 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 766267.08it/s]
[2024-01-21 13:08:13,184] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:24<00:00,  4.92s/it]
01/21 13:08:37 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 758969.30it/s]
[2024-01-21 13:08:37,875] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:07<00:00,  1.42s/it]
01/21 13:08:45 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 737886.81it/s]
[2024-01-21 13:08:45,046] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:06<00:00,  1.38s/it]
01/21 13:08:51 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics]
100%|███████████████████████████████████████| 18/18 [00:00<00:00, 725937.23it/s]
[2024-01-21 13:08:52,026] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:23<00:00,  4.77s/it]
01/21 13:09:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics]
100%|███████████████████████████████████████| 18/18 [00:00<00:00, 740171.29it/s]
[2024-01-21 13:09:15,962] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:16<00:00,  3.30s/it]
01/21 13:09:32 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics]
100%|███████████████████████████████████████| 16/16 [00:00<00:00, 122461.43it/s]
[2024-01-21 13:09:32,566] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 4/4 [00:07<00:00,  1.79s/it]
01/21 13:09:39 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]
100%|███████████████████████████████████████| 12/12 [00:00<00:00, 518882.97it/s]
[2024-01-21 13:09:39,795] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 3/3 [00:05<00:00,  1.73s/it]
01/21 13:09:44 - OpenCompass - INFO - time elapsed: 873.46s
01/21 13:09:49 - OpenCompass - DEBUG - Get class `NaivePartitioner` from "partitioner" registry in "opencompass"
01/21 13:09:49 - OpenCompass - DEBUG - An `NaivePartitioner` instance is built from registry, and its implementation can be found in opencompass.partitioners.naive
01/21 13:09:49 - OpenCompass - DEBUG - Key eval.runner.task.judge_cfg not found in config, ignored.
01/21 13:09:49 - OpenCompass - DEBUG - Key eval.runner.task.dump_details not found in config, ignored.
01/21 13:09:49 - OpenCompass - DEBUG - Additional config: {'eval': {'runner': {'task': {}}}}
01/21 13:09:49 - OpenCompass - INFO - Partitioned into 52 tasks.
01/21 13:09:49 - OpenCompass - DEBUG - Task 0: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network]
01/21 13:09:49 - OpenCompass - DEBUG - Task 1: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system]
01/21 13:09:49 - OpenCompass - DEBUG - Task 2: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture]
01/21 13:09:49 - OpenCompass - DEBUG - Task 3: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming]
01/21 13:09:49 - OpenCompass - DEBUG - Task 4: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 5: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry]
01/21 13:09:49 - OpenCompass - DEBUG - Task 6: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 7: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 8: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 9: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer]
01/21 13:09:49 - OpenCompass - DEBUG - Task 10: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer]
01/21 13:09:49 - OpenCompass - DEBUG - Task 11: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 12: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 13: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry]
01/21 13:09:49 - OpenCompass - DEBUG - Task 14: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology]
01/21 13:09:49 - OpenCompass - DEBUG - Task 15: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 16: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology]
01/21 13:09:49 - OpenCompass - DEBUG - Task 17: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_physics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 18: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry]
01/21 13:09:49 - OpenCompass - DEBUG - Task 19: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine]
01/21 13:09:49 - OpenCompass - DEBUG - Task 20: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 21: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration]
01/21 13:09:49 - OpenCompass - DEBUG - Task 22: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism]
01/21 13:09:49 - OpenCompass - DEBUG - Task 23: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought]
01/21 13:09:49 - OpenCompass - DEBUG - Task 24: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science]
01/21 13:09:49 - OpenCompass - DEBUG - Task 25: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification]
01/21 13:09:49 - OpenCompass - DEBUG - Task 26: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 27: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography]
01/21 13:09:49 - OpenCompass - DEBUG - Task 28: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 29: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]
01/21 13:09:49 - OpenCompass - DEBUG - Task 30: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history]
01/21 13:09:49 - OpenCompass - DEBUG - Task 31: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation]
01/21 13:09:49 - OpenCompass - DEBUG - Task 32: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic]
01/21 13:09:49 - OpenCompass - DEBUG - Task 33: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law]
01/21 13:09:49 - OpenCompass - DEBUG - Task 34: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature]
01/21 13:09:49 - OpenCompass - DEBUG - Task 35: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies]
01/21 13:09:49 - OpenCompass - DEBUG - Task 36: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide]
01/21 13:09:49 - OpenCompass - DEBUG - Task 37: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional]
01/21 13:09:49 - OpenCompass - DEBUG - Task 38: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese]
01/21 13:09:49 - OpenCompass - DEBUG - Task 39: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history]
01/21 13:09:49 - OpenCompass - DEBUG - Task 40: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history]
01/21 13:09:49 - OpenCompass - DEBUG - Task 41: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant]
01/21 13:09:49 - OpenCompass - DEBUG - Task 42: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science]
01/21 13:09:49 - OpenCompass - DEBUG - Task 43: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection]
01/21 13:09:49 - OpenCompass - DEBUG - Task 44: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine]
01/21 13:09:49 - OpenCompass - DEBUG - Task 45: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine]
01/21 13:09:49 - OpenCompass - DEBUG - Task 46: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner]
01/21 13:09:49 - OpenCompass - DEBUG - Task 47: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant]
01/21 13:09:49 - OpenCompass - DEBUG - Task 48: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer]
01/21 13:09:49 - OpenCompass - DEBUG - Task 49: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer]
01/21 13:09:49 - OpenCompass - DEBUG - Task 50: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant]
01/21 13:09:49 - OpenCompass - DEBUG - Task 51: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician]
01/21 13:09:49 - OpenCompass - DEBUG - Get class `LocalRunner` from "runner" registry in "opencompass"
01/21 13:09:49 - OpenCompass - DEBUG - An `LocalRunner` instance is built from registry, and its implementation can be found in opencompass.runners.local
01/21 13:09:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:09:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:11:21 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network]: {'accuracy': 31.57894736842105}
01/21 13:11:21 - OpenCompass - INFO - time elapsed: 60.60s
01/21 13:11:22 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:11:22 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:11:43 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system]: {'accuracy': 36.84210526315789}
01/21 13:11:49 - OpenCompass - INFO - time elapsed: 19.91s
01/21 13:11:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:11:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:12:12 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture]: {'accuracy': 28.57142857142857}
01/21 13:12:17 - OpenCompass - INFO - time elapsed: 20.15s
01/21 13:12:18 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:12:18 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:13:20 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming]: {'accuracy': 32.432432432432435}
01/21 13:13:20 - OpenCompass - INFO - time elapsed: 56.35s
01/21 13:13:21 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:13:21 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:13:33 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics]: {'accuracy': 26.31578947368421}
01/21 13:13:33 - OpenCompass - INFO - time elapsed: 5.75s
01/21 13:13:34 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:13:34 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:13:53 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry]: {'accuracy': 16.666666666666664}
01/21 13:13:54 - OpenCompass - INFO - time elapsed: 14.26s
01/21 13:13:55 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:13:55 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:14:26 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics]: {'accuracy': 21.052631578947366}
01/21 13:14:26 - OpenCompass - INFO - time elapsed: 24.90s
01/21 13:14:27 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:14:27 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:14:47 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics]: {'accuracy': 38.88888888888889}
01/21 13:14:49 - OpenCompass - INFO - time elapsed: 15.55s
01/21 13:14:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:14:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:15:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics]: {'accuracy': 18.75}
01/21 13:16:16 - OpenCompass - INFO - time elapsed: 61.13s
01/21 13:16:16 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:16:16 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:16:52 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer]: {'accuracy': 35.13513513513514}
01/21 13:16:54 - OpenCompass - INFO - time elapsed: 17.86s
01/21 13:16:55 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:16:55 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:17:48 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer]: {'accuracy': 50.0}
01/21 13:17:48 - OpenCompass - INFO - time elapsed: 30.54s
01/21 13:17:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:17:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:18:25 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics]: {'accuracy': 22.22222222222222}
01/21 13:18:25 - OpenCompass - INFO - time elapsed: 24.63s
01/21 13:18:26 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:18:26 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:19:07 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics]: {'accuracy': 31.57894736842105}
01/21 13:19:07 - OpenCompass - INFO - time elapsed: 14.48s
01/21 13:19:08 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:19:08 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:19:43 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry]: {'accuracy': 15.789473684210526}
01/21 13:20:12 - OpenCompass - INFO - time elapsed: 54.42s
01/21 13:20:13 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:20:13 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:20:32 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology]: {'accuracy': 36.84210526315789}
01/21 13:20:32 - OpenCompass - INFO - time elapsed: 10.37s
01/21 13:20:33 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:20:33 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:20:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics]: {'accuracy': 26.31578947368421}
01/21 13:20:50 - OpenCompass - INFO - time elapsed: 7.74s
01/21 13:20:50 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:20:50 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:21:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology]: {'accuracy': 61.904761904761905}
01/21 13:21:05 - OpenCompass - INFO - time elapsed: 7.39s
01/21 13:21:06 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:21:06 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:21:19 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_physics]: {'accuracy': 63.1578947368421}
01/21 13:21:19 - OpenCompass - INFO - time elapsed: 6.52s
01/21 13:21:20 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:21:20 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:21:34 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry]: {'accuracy': 60.0}
01/21 13:21:34 - OpenCompass - INFO - time elapsed: 7.03s
01/21 13:21:35 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:21:35 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:21:48 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine]: {'accuracy': 47.82608695652174}
01/21 13:21:48 - OpenCompass - INFO - time elapsed: 6.74s
01/21 13:21:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:21:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:01 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics]: {'accuracy': 41.81818181818181}
01/21 13:22:01 - OpenCompass - INFO - time elapsed: 5.99s
01/21 13:22:01 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:01 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:14 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration]: {'accuracy': 33.33333333333333}
01/21 13:22:14 - OpenCompass - INFO - time elapsed: 5.85s
01/21 13:22:15 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:15 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:27 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism]: {'accuracy': 68.42105263157895}
01/21 13:22:27 - OpenCompass - INFO - time elapsed: 6.38s
01/21 13:22:28 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:28 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:39 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought]: {'accuracy': 70.83333333333334}
01/21 13:22:39 - OpenCompass - INFO - time elapsed: 5.05s
01/21 13:22:40 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:40 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:51 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science]: {'accuracy': 58.620689655172406}
01/21 13:22:51 - OpenCompass - INFO - time elapsed: 5.12s
01/21 13:22:52 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:52 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:03 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification]: {'accuracy': 70.45454545454545}
01/21 13:23:03 - OpenCompass - INFO - time elapsed: 5.09s
01/21 13:23:03 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:03 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:15 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics]: {'accuracy': 26.31578947368421}
01/21 13:23:15 - OpenCompass - INFO - time elapsed: 5.58s
01/21 13:23:15 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:15 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:27 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography]: {'accuracy': 47.368421052631575}
01/21 13:23:27 - OpenCompass - INFO - time elapsed: 5.31s
01/21 13:23:28 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:28 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:40 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics]: {'accuracy': 52.38095238095239}
01/21 13:23:40 - OpenCompass - INFO - time elapsed: 6.11s
01/21 13:23:41 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:41 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:52 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]: {'accuracy': 58.333333333333336}
01/21 13:23:52 - OpenCompass - INFO - time elapsed: 5.68s
01/21 13:23:53 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:53 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history]: {'accuracy': 73.91304347826086}
01/21 13:24:05 - OpenCompass - INFO - time elapsed: 6.29s
01/21 13:24:05 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:05 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:16 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation]: {'accuracy': 63.1578947368421}
01/21 13:24:16 - OpenCompass - INFO - time elapsed: 5.21s
01/21 13:24:17 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:17 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:29 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic]: {'accuracy': 31.818181818181817}
01/21 13:24:29 - OpenCompass - INFO - time elapsed: 5.91s
01/21 13:24:29 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:29 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:41 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law]: {'accuracy': 25.0}
01/21 13:24:41 - OpenCompass - INFO - time elapsed: 6.18s
01/21 13:24:42 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:42 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:55 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature]: {'accuracy': 30.434782608695656}
01/21 13:24:55 - OpenCompass - INFO - time elapsed: 6.18s
01/21 13:24:56 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:56 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:10 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies]: {'accuracy': 60.60606060606061}
01/21 13:25:10 - OpenCompass - INFO - time elapsed: 6.80s
01/21 13:25:10 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:10 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:21 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide]: {'accuracy': 62.06896551724138}
01/21 13:25:21 - OpenCompass - INFO - time elapsed: 4.76s
01/21 13:25:22 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:22 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:34 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional]: {'accuracy': 39.130434782608695}
01/21 13:25:34 - OpenCompass - INFO - time elapsed: 6.15s
01/21 13:25:34 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:34 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:46 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese]: {'accuracy': 63.1578947368421}
01/21 13:25:46 - OpenCompass - INFO - time elapsed: 5.56s
01/21 13:25:46 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:46 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:56 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history]: {'accuracy': 70.0}
01/21 13:25:56 - OpenCompass - INFO - time elapsed: 4.41s
01/21 13:25:57 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:57 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:08 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history]: {'accuracy': 59.09090909090909}
01/21 13:26:08 - OpenCompass - INFO - time elapsed: 5.36s
01/21 13:26:09 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:09 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:20 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant]: {'accuracy': 53.191489361702125}
01/21 13:26:20 - OpenCompass - INFO - time elapsed: 5.42s
01/21 13:26:21 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:21 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:33 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science]: {'accuracy': 52.63157894736842}
01/21 13:26:33 - OpenCompass - INFO - time elapsed: 6.01s
01/21 13:26:34 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:34 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:45 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection]: {'accuracy': 59.09090909090909}
01/21 13:26:45 - OpenCompass - INFO - time elapsed: 5.62s
01/21 13:26:46 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:46 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:57 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine]: {'accuracy': 47.368421052631575}
01/21 13:26:57 - OpenCompass - INFO - time elapsed: 5.50s
01/21 13:26:58 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:58 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:10 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine]: {'accuracy': 40.909090909090914}
01/21 13:27:10 - OpenCompass - INFO - time elapsed: 5.61s
01/21 13:27:11 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:27:11 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:22 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner]: {'accuracy': 45.65217391304348}
01/21 13:27:22 - OpenCompass - INFO - time elapsed: 4.93s
01/21 13:27:23 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:27:23 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:34 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant]: {'accuracy': 26.53061224489796}
01/21 13:27:34 - OpenCompass - INFO - time elapsed: 5.41s
01/21 13:27:35 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:27:35 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:46 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer]: {'accuracy': 22.58064516129032}
01/21 13:27:46 - OpenCompass - INFO - time elapsed: 5.98s
01/21 13:27:47 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:27:47 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:59 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer]: {'accuracy': 64.51612903225806}
01/21 13:27:59 - OpenCompass - INFO - time elapsed: 6.09s
01/21 13:28:00 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:28:00 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:28:13 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant]: {'accuracy': 34.69387755102041}
01/21 13:28:13 - OpenCompass - INFO - time elapsed: 6.41s
01/21 13:28:14 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:28:14 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:28:27 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician]: {'accuracy': 40.816326530612244}
01/21 13:28:27 - OpenCompass - INFO - time elapsed: 5.91s
01/21 13:28:27 - OpenCompass - DEBUG - An `DefaultSummarizer` instance is built from registry, and its implementation can be found in opencompass.summarizers.default
dataset                                         version    metric         mode      opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b
----------------------------------------------  ---------  -------------  ------  -------------------------------------------------------------------------
ceval-computer_network                          db9ce2     accuracy       gen                                                                         31.58
ceval-operating_system                          1c2571     accuracy       gen                                                                         36.84
ceval-computer_architecture                     a74dad     accuracy       gen                                                                         28.57
ceval-college_programming                       4ca32a     accuracy       gen                                                                         32.43
ceval-college_physics                           963fa8     accuracy       gen                                                                         26.32
ceval-college_chemistry                         e78857     accuracy       gen                                                                         16.67
ceval-advanced_mathematics                      ce03e2     accuracy       gen                                                                         21.05
ceval-probability_and_statistics                65e812     accuracy       gen                                                                         38.89
ceval-discrete_mathematics                      e894ae     accuracy       gen                                                                         18.75
ceval-electrical_engineer                       ae42b9     accuracy       gen                                                                         35.14
ceval-metrology_engineer                        ee34ea     accuracy       gen                                                                         50
ceval-high_school_mathematics                   1dc5bf     accuracy       gen                                                                         22.22
ceval-high_school_physics                       adf25f     accuracy       gen                                                                         31.58
ceval-high_school_chemistry                     2ed27f     accuracy       gen                                                                         15.79
ceval-high_school_biology                       8e2b9a     accuracy       gen                                                                         36.84
ceval-middle_school_mathematics                 bee8d5     accuracy       gen                                                                         26.32
ceval-middle_school_biology                     86817c     accuracy       gen                                                                         61.9
ceval-middle_school_physics                     8accf6     accuracy       gen                                                                         63.16
ceval-middle_school_chemistry                   167a15     accuracy       gen                                                                         60
ceval-veterinary_medicine                       b4e08d     accuracy       gen                                                                         47.83
ceval-college_economics                         f3f4e6     accuracy       gen                                                                         41.82
ceval-business_administration                   c1614e     accuracy       gen                                                                         33.33
ceval-marxism                                   cf874c     accuracy       gen                                                                         68.42
ceval-mao_zedong_thought                        51c7a4     accuracy       gen                                                                         70.83
ceval-education_science                         591fee     accuracy       gen                                                                         58.62
ceval-teacher_qualification                     4e4ced     accuracy       gen                                                                         70.45
ceval-high_school_politics                      5c0de2     accuracy       gen                                                                         26.32
ceval-high_school_geography                     865461     accuracy       gen                                                                         47.37
ceval-middle_school_politics                    5be3e7     accuracy       gen                                                                         52.38
ceval-middle_school_geography                   8a63be     accuracy       gen                                                                         58.33
ceval-modern_chinese_history                    fc01af     accuracy       gen                                                                         73.91
ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen                                                                         63.16
ceval-logic                                     f5b022     accuracy       gen                                                                         31.82
ceval-law                                       a110a1     accuracy       gen                                                                         25
ceval-chinese_language_and_literature           0f8b68     accuracy       gen                                                                         30.43
ceval-art_studies                               2a1300     accuracy       gen                                                                         60.61
ceval-professional_tour_guide                   4e673e     accuracy       gen                                                                         62.07
ceval-legal_professional                        ce8787     accuracy       gen                                                                         39.13
ceval-high_school_chinese                       315705     accuracy       gen                                                                         63.16
ceval-high_school_history                       7eb30a     accuracy       gen                                                                         70
ceval-middle_school_history                     48ab4a     accuracy       gen                                                                         59.09
ceval-civil_servant                             87d061     accuracy       gen                                                                         53.19
ceval-sports_science                            70f27b     accuracy       gen                                                                         52.63
ceval-plant_protection                          8941f9     accuracy       gen                                                                         59.09
ceval-basic_medicine                            c409d6     accuracy       gen                                                                         47.37
ceval-clinical_medicine                         49e82d     accuracy       gen                                                                         40.91
ceval-urban_and_rural_planner                   95b885     accuracy       gen                                                                         45.65
ceval-accountant                                002837     accuracy       gen                                                                         26.53
ceval-fire_engineer                             bc23f5     accuracy       gen                                                                         22.58
ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen                                                                         64.52
ceval-tax_accountant                            3a5e3c     accuracy       gen                                                                         34.69
ceval-physician                                 6e277d     accuracy       gen                                                                         40.82
ceval-stem                                      -          naive_average  gen                                                                         35.09
ceval-social-science                            -          naive_average  gen                                                                         52.79
ceval-humanities                                -          naive_average  gen                                                                         52.58
ceval-other                                     -          naive_average  gen                                                                         44.36
ceval-hard                                      -          naive_average  gen                                                                         23.91
ceval                                           -          naive_average  gen                                                                         44.16
01/21 13:28:27 - OpenCompass - INFO - write summary to /root/opencompass/outputs/default/20240121_125459/summary/summary_20240121_125459.txt
01/21 13:28:27 - OpenCompass - INFO - write csv to /root/opencompass/outputs/default/20240121_125459/summary/summary_20240121_125459.csv
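The log lines above suggest that each run writes its summary under a timestamped directory. The following is a minimal sketch for finding the most recent summary CSV. It assumes the layout `<outputs_root>/<timestamp>/summary/summary_<timestamp>.csv` inferred from those two paths; the helper name `latest_summary_csv` is hypothetical and not part of OpenCompass.

```python
from pathlib import Path
from typing import Optional


def latest_summary_csv(outputs_root: str) -> Optional[Path]:
    """Return the newest summary CSV under outputs_root, or None if none exist.

    Assumed layout (inferred from the log above, not from OpenCompass docs):
    <outputs_root>/<timestamp>/summary/summary_<timestamp>.csv
    """
    candidates = sorted(Path(outputs_root).glob("*/summary/summary_*.csv"))
    # Timestamps such as 20240121_125459 sort chronologically as plain strings,
    # so the last path after sorting belongs to the most recent run.
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    # Path taken from the log output above; adjust to your own environment.
    print(latest_summary_csv("/root/opencompass/outputs/default"))
```

This avoids copying the timestamp by hand each time the evaluation is rerun.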
# %load /root/opencompass/outputs/default/20240121_125459/summary/summary_20240121_125459.txt
20240121_125459
tabulate format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset                                         version    metric         mode    opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b
----------------------------------------------  ---------  -------------  ------  -------------------------------------------------------------------------
ceval-computer_network                          db9ce2     accuracy       gen     31.58
ceval-operating_system                          1c2571     accuracy       gen     36.84
ceval-computer_architecture                     a74dad     accuracy       gen     28.57
ceval-college_programming                       4ca32a     accuracy       gen     32.43
ceval-college_physics                           963fa8     accuracy       gen     26.32
ceval-college_chemistry                         e78857     accuracy       gen     16.67
ceval-advanced_mathematics                      ce03e2     accuracy       gen     21.05
ceval-probability_and_statistics                65e812     accuracy       gen     38.89
ceval-discrete_mathematics                      e894ae     accuracy       gen     18.75
ceval-electrical_engineer                       ae42b9     accuracy       gen     35.14
ceval-metrology_engineer                        ee34ea     accuracy       gen     50
ceval-high_school_mathematics                   1dc5bf     accuracy       gen     22.22
ceval-high_school_physics                       adf25f     accuracy       gen     31.58
ceval-high_school_chemistry                     2ed27f     accuracy       gen     15.79
ceval-high_school_biology                       8e2b9a     accuracy       gen     36.84
ceval-middle_school_mathematics                 bee8d5     accuracy       gen     26.32
ceval-middle_school_biology                     86817c     accuracy       gen     61.9
ceval-middle_school_physics                     8accf6     accuracy       gen     63.16
ceval-middle_school_chemistry                   167a15     accuracy       gen     60
ceval-veterinary_medicine                       b4e08d     accuracy       gen     47.83
ceval-college_economics                         f3f4e6     accuracy       gen     41.82
ceval-business_administration                   c1614e     accuracy       gen     33.33
ceval-marxism                                   cf874c     accuracy       gen     68.42
ceval-mao_zedong_thought                        51c7a4     accuracy       gen     70.83
ceval-education_science                         591fee     accuracy       gen     58.62
ceval-teacher_qualification                     4e4ced     accuracy       gen     70.45
ceval-high_school_politics                      5c0de2     accuracy       gen     26.32
ceval-high_school_geography                     865461     accuracy       gen     47.37
ceval-middle_school_politics                    5be3e7     accuracy       gen     52.38
ceval-middle_school_geography                   8a63be     accuracy       gen     58.33
ceval-modern_chinese_history                    fc01af     accuracy       gen     73.91
ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen     63.16
ceval-logic                                     f5b022     accuracy       gen     31.82
ceval-law                                       a110a1     accuracy       gen     25
ceval-chinese_language_and_literature           0f8b68     accuracy       gen     30.43
ceval-art_studies                               2a1300     accuracy       gen     60.61
ceval-professional_tour_guide                   4e673e     accuracy       gen     62.07
ceval-legal_professional                        ce8787     accuracy       gen     39.13
ceval-high_school_chinese                       315705     accuracy       gen     63.16
ceval-high_school_history                       7eb30a     accuracy       gen     70
ceval-middle_school_history                     48ab4a     accuracy       gen     59.09
ceval-civil_servant                             87d061     accuracy       gen     53.19
ceval-sports_science                            70f27b     accuracy       gen     52.63
ceval-plant_protection                          8941f9     accuracy       gen     59.09
ceval-basic_medicine                            c409d6     accuracy       gen     47.37
ceval-clinical_medicine                         49e82d     accuracy       gen     40.91
ceval-urban_and_rural_planner                   95b885     accuracy       gen     45.65
ceval-accountant                                002837     accuracy       gen     26.53
ceval-fire_engineer                             bc23f5     accuracy       gen     22.58
ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen     64.52
ceval-tax_accountant                            3a5e3c     accuracy       gen     34.69
ceval-physician                                 6e277d     accuracy       gen     40.82
ceval-stem                                      -          naive_average  gen     35.09
ceval-social-science                            -          naive_average  gen     52.79
ceval-humanities                                -          naive_average  gen     52.58
ceval-other                                     -          naive_average  gen     44.36
ceval-hard                                      -          naive_average  gen     23.91
ceval                                           -          naive_average  gen     44.16
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

-------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------

csv format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset,version,metric,mode,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b
ceval-computer_network,db9ce2,accuracy,gen,31.58
ceval-operating_system,1c2571,accuracy,gen,36.84
ceval-computer_architecture,a74dad,accuracy,gen,28.57
ceval-college_programming,4ca32a,accuracy,gen,32.43
ceval-college_physics,963fa8,accuracy,gen,26.32
ceval-college_chemistry,e78857,accuracy,gen,16.67
ceval-advanced_mathematics,ce03e2,accuracy,gen,21.05
ceval-probability_and_statistics,65e812,accuracy,gen,38.89
ceval-discrete_mathematics,e894ae,accuracy,gen,18.75
ceval-electrical_engineer,ae42b9,accuracy,gen,35.14
ceval-metrology_engineer,ee34ea,accuracy,gen,50.00
ceval-high_school_mathematics,1dc5bf,accuracy,gen,22.22
ceval-high_school_physics,adf25f,accuracy,gen,31.58
ceval-high_school_chemistry,2ed27f,accuracy,gen,15.79
ceval-high_school_biology,8e2b9a,accuracy,gen,36.84
ceval-middle_school_mathematics,bee8d5,accuracy,gen,26.32
ceval-middle_school_biology,86817c,accuracy,gen,61.90
ceval-middle_school_physics,8accf6,accuracy,gen,63.16
ceval-middle_school_chemistry,167a15,accuracy,gen,60.00
ceval-veterinary_medicine,b4e08d,accuracy,gen,47.83
ceval-college_economics,f3f4e6,accuracy,gen,41.82
ceval-business_administration,c1614e,accuracy,gen,33.33
ceval-marxism,cf874c,accuracy,gen,68.42
ceval-mao_zedong_thought,51c7a4,accuracy,gen,70.83
ceval-education_science,591fee,accuracy,gen,58.62
ceval-teacher_qualification,4e4ced,accuracy,gen,70.45
ceval-high_school_politics,5c0de2,accuracy,gen,26.32
ceval-high_school_geography,865461,accuracy,gen,47.37
ceval-middle_school_politics,5be3e7,accuracy,gen,52.38
ceval-middle_school_geography,8a63be,accuracy,gen,58.33
ceval-modern_chinese_history,fc01af,accuracy,gen,73.91
ceval-ideological_and_moral_cultivation,a2aa4a,accuracy,gen,63.16
ceval-logic,f5b022,accuracy,gen,31.82
ceval-law,a110a1,accuracy,gen,25.00
ceval-chinese_language_and_literature,0f8b68,accuracy,gen,30.43
ceval-art_studies,2a1300,accuracy,gen,60.61
ceval-professional_tour_guide,4e673e,accuracy,gen,62.07
ceval-legal_professional,ce8787,accuracy,gen,39.13
ceval-high_school_chinese,315705,accuracy,gen,63.16
ceval-high_school_history,7eb30a,accuracy,gen,70.00
ceval-middle_school_history,48ab4a,accuracy,gen,59.09
ceval-civil_servant,87d061,accuracy,gen,53.19
ceval-sports_science,70f27b,accuracy,gen,52.63
ceval-plant_protection,8941f9,accuracy,gen,59.09
ceval-basic_medicine,c409d6,accuracy,gen,47.37
ceval-clinical_medicine,49e82d,accuracy,gen,40.91
ceval-urban_and_rural_planner,95b885,accuracy,gen,45.65
ceval-accountant,002837,accuracy,gen,26.53
ceval-fire_engineer,bc23f5,accuracy,gen,22.58
ceval-environmental_impact_assessment_engineer,c64e2d,accuracy,gen,64.52
ceval-tax_accountant,3a5e3c,accuracy,gen,34.69
ceval-physician,6e277d,accuracy,gen,40.82
ceval-stem,-,naive_average,gen,35.09
ceval-social-science,-,naive_average,gen,52.79
ceval-humanities,-,naive_average,gen,52.58
ceval-other,-,naive_average,gen,44.36
ceval-hard,-,naive_average,gen,23.91
ceval,-,naive_average,gen,44.16
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

-------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------

raw format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-------------------------------
Model: opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b
ceval-computer_network: {'accuracy': 31.57894736842105}
ceval-operating_system: {'accuracy': 36.84210526315789}
ceval-computer_architecture: {'accuracy': 28.57142857142857}
ceval-college_programming: {'accuracy': 32.432432432432435}
ceval-college_physics: {'accuracy': 26.31578947368421}
ceval-college_chemistry: {'accuracy': 16.666666666666664}
ceval-advanced_mathematics: {'accuracy': 21.052631578947366}
ceval-probability_and_statistics: {'accuracy': 38.88888888888889}
ceval-discrete_mathematics: {'accuracy': 18.75}
ceval-electrical_engineer: {'accuracy': 35.13513513513514}
ceval-metrology_engineer: {'accuracy': 50.0}
ceval-high_school_mathematics: {'accuracy': 22.22222222222222}
ceval-high_school_physics: {'accuracy': 31.57894736842105}
ceval-high_school_chemistry: {'accuracy': 15.789473684210526}
ceval-high_school_biology: {'accuracy': 36.84210526315789}
ceval-middle_school_mathematics: {'accuracy': 26.31578947368421}
ceval-middle_school_biology: {'accuracy': 61.904761904761905}
ceval-middle_school_physics: {'accuracy': 63.1578947368421}
ceval-middle_school_chemistry: {'accuracy': 60.0}
ceval-veterinary_medicine: {'accuracy': 47.82608695652174}
ceval-college_economics: {'accuracy': 41.81818181818181}
ceval-business_administration: {'accuracy': 33.33333333333333}
ceval-marxism: {'accuracy': 68.42105263157895}
ceval-mao_zedong_thought: {'accuracy': 70.83333333333334}
ceval-education_science: {'accuracy': 58.620689655172406}
ceval-teacher_qualification: {'accuracy': 70.45454545454545}
ceval-high_school_politics: {'accuracy': 26.31578947368421}
ceval-high_school_geography: {'accuracy': 47.368421052631575}
ceval-middle_school_politics: {'accuracy': 52.38095238095239}
ceval-middle_school_geography: {'accuracy': 58.333333333333336}
ceval-modern_chinese_history: {'accuracy': 73.91304347826086}
ceval-ideological_and_moral_cultivation: {'accuracy': 63.1578947368421}
ceval-logic: {'accuracy': 31.818181818181817}
ceval-law: {'accuracy': 25.0}
ceval-chinese_language_and_literature: {'accuracy': 30.434782608695656}
ceval-art_studies: {'accuracy': 60.60606060606061}
ceval-professional_tour_guide: {'accuracy': 62.06896551724138}
ceval-legal_professional: {'accuracy': 39.130434782608695}
ceval-high_school_chinese: {'accuracy': 63.1578947368421}
ceval-high_school_history: {'accuracy': 70.0}
ceval-middle_school_history: {'accuracy': 59.09090909090909}
ceval-civil_servant: {'accuracy': 53.191489361702125}
ceval-sports_science: {'accuracy': 52.63157894736842}
ceval-plant_protection: {'accuracy': 59.09090909090909}
ceval-basic_medicine: {'accuracy': 47.368421052631575}
ceval-clinical_medicine: {'accuracy': 40.909090909090914}
ceval-urban_and_rural_planner: {'accuracy': 45.65217391304348}
ceval-accountant: {'accuracy': 26.53061224489796}
ceval-fire_engineer: {'accuracy': 22.58064516129032}
ceval-environmental_impact_assessment_engineer: {'accuracy': 64.51612903225806}
ceval-tax_accountant: {'accuracy': 34.69387755102041}
ceval-physician: {'accuracy': 40.816326530612244}
ceval-stem: {'naive_average': 35.09356534942919}
ceval-social-science: {'naive_average': 52.78796324667468}
ceval-humanities: {'naive_average': 52.57983339778566}
ceval-other: {'naive_average': 44.36193216316587}
ceval-hard: {'naive_average': 23.90807748538011}
ceval: {'naive_average': 44.15596847357301}
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
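The raw format section keeps the full-precision scores, while the tabulate and CSV sections round them to two decimals. The following is a minimal sketch for reading those lines back into a dictionary. It assumes each score line has the shape `name: {'metric': value}` with a single metric, as in the dump above; the function name `parse_raw_scores` is hypothetical.

```python
import ast
from typing import Dict


def parse_raw_scores(text: str) -> Dict[str, float]:
    """Parse lines such as "ceval-law: {'accuracy': 25.0}" into {name: score}."""
    scores: Dict[str, float] = {}
    for line in text.splitlines():
        name, sep, payload = line.partition(": ")
        payload = payload.strip()
        if not sep or not payload.startswith("{"):
            continue  # skip headers, dividers, and the "Model: ..." line
        metrics = ast.literal_eval(payload)  # safely evaluate the dict literal
        # Assumption: one metric per line, as in the output shown above.
        scores[name.strip()] = float(next(iter(metrics.values())))
    return scores


# A short excerpt copied from the raw format section above.
sample = """Model: opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b
ceval-law: {'accuracy': 25.0}
ceval-discrete_mathematics: {'accuracy': 18.75}
ceval: {'naive_average': 44.15596847357301}
"""
scores = parse_raw_scores(sample)
print(scores)
```

`ast.literal_eval` is used instead of `eval` so that only plain Python literals are accepted from the file.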

import pandas as pd

# Load the summary CSV written by the evaluation run and display it.
data = pd.read_csv("/root/opencompass/outputs/default/20240121_125459/summary/summary_20240121_125459.csv")
data

    dataset                                         version    metric         mode    opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b
0   ceval-computer_network                          db9ce2     accuracy       gen     31.58
1   ceval-operating_system                          1c2571     accuracy       gen     36.84
2   ceval-computer_architecture                     a74dad     accuracy       gen     28.57
3   ceval-college_programming                       4ca32a     accuracy       gen     32.43
4   ceval-college_physics                           963fa8     accuracy       gen     26.32
5   ceval-college_chemistry                         e78857     accuracy       gen     16.67
6   ceval-advanced_mathematics                      ce03e2     accuracy       gen     21.05
7   ceval-probability_and_statistics                65e812     accuracy       gen     38.89
8   ceval-discrete_mathematics                      e894ae     accuracy       gen     18.75
9   ceval-electrical_engineer                       ae42b9     accuracy       gen     35.14
10  ceval-metrology_engineer                        ee34ea     accuracy       gen     50.00
11  ceval-high_school_mathematics                   1dc5bf     accuracy       gen     22.22
12  ceval-high_school_physics                       adf25f     accuracy       gen     31.58
13  ceval-high_school_chemistry                     2ed27f     accuracy       gen     15.79
14  ceval-high_school_biology                       8e2b9a     accuracy       gen     36.84
15  ceval-middle_school_mathematics                 bee8d5     accuracy       gen     26.32
16  ceval-middle_school_biology                     86817c     accuracy       gen     61.90
17  ceval-middle_school_physics                     8accf6     accuracy       gen     63.16
18  ceval-middle_school_chemistry                   167a15     accuracy       gen     60.00
19  ceval-veterinary_medicine                       b4e08d     accuracy       gen     47.83
20  ceval-college_economics                         f3f4e6     accuracy       gen     41.82
21  ceval-business_administration                   c1614e     accuracy       gen     33.33
22  ceval-marxism                                   cf874c     accuracy       gen     68.42
23  ceval-mao_zedong_thought                        51c7a4     accuracy       gen     70.83
24  ceval-education_science                         591fee     accuracy       gen     58.62
25  ceval-teacher_qualification                     4e4ced     accuracy       gen     70.45
26  ceval-high_school_politics                      5c0de2     accuracy       gen     26.32
27  ceval-high_school_geography                     865461     accuracy       gen     47.37
28  ceval-middle_school_politics                    5be3e7     accuracy       gen     52.38
29  ceval-middle_school_geography                   8a63be     accuracy       gen     58.33
30  ceval-modern_chinese_history                    fc01af     accuracy       gen     73.91
31  ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen     63.16
32  ceval-logic                                     f5b022     accuracy       gen     31.82
33  ceval-law                                       a110a1     accuracy       gen     25.00
34  ceval-chinese_language_and_literature           0f8b68     accuracy       gen     30.43
35  ceval-art_studies                               2a1300     accuracy       gen     60.61
36  ceval-professional_tour_guide                   4e673e     accuracy       gen     62.07
37  ceval-legal_professional                        ce8787     accuracy       gen     39.13
38  ceval-high_school_chinese                       315705     accuracy       gen     63.16
39  ceval-high_school_history                       7eb30a     accuracy       gen     70.00
40  ceval-middle_school_history                     48ab4a     accuracy       gen     59.09
41  ceval-civil_servant                             87d061     accuracy       gen     53.19
42  ceval-sports_science                            70f27b     accuracy       gen     52.63
43  ceval-plant_protection                          8941f9     accuracy       gen     59.09
44  ceval-basic_medicine                            c409d6     accuracy       gen     47.37
45  ceval-clinical_medicine                         49e82d     accuracy       gen     40.91
46  ceval-urban_and_rural_planner                   95b885     accuracy       gen     45.65
47  ceval-accountant                                002837     accuracy       gen     26.53
48  ceval-fire_engineer                             bc23f5     accuracy       gen     22.58
49  ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen     64.52
50  ceval-tax_accountant                            3a5e3c     accuracy       gen     34.69
51  ceval-physician                                 6e277d     accuracy       gen     40.82
52  ceval-stem                                      -          naive_average  gen     35.09
53  ceval-social-science                            -          naive_average  gen     52.79
54  ceval-humanities                                -          naive_average  gen     52.58
55  ceval-other                                     -          naive_average  gen     44.36
56  ceval-hard                                      -          naive_average  gen     23.91
57  ceval                                           -          naive_average  gen     44.16
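The `naive_average` rows can be sanity-checked against the per-subject rows. The sketch below is an illustration, not OpenCompass code. It assumes that `ceval-stem` is the unweighted mean of the first 20 subjects (computer_network through veterinary_medicine, the STEM group of C-Eval), recomputes that mean from the rounded CSV values, and lists the three weakest subjects. It uses only the standard library so it runs without the output file; the long model column name is shortened to `score` for readability.

```python
import csv
import io
from statistics import mean

# Excerpt of the summary CSV above: the 20 STEM subjects plus the reported
# group average. The last header column is renamed to "score" (assumption
# made for this sketch only).
SUMMARY_CSV = """dataset,version,metric,mode,score
ceval-computer_network,db9ce2,accuracy,gen,31.58
ceval-operating_system,1c2571,accuracy,gen,36.84
ceval-computer_architecture,a74dad,accuracy,gen,28.57
ceval-college_programming,4ca32a,accuracy,gen,32.43
ceval-college_physics,963fa8,accuracy,gen,26.32
ceval-college_chemistry,e78857,accuracy,gen,16.67
ceval-advanced_mathematics,ce03e2,accuracy,gen,21.05
ceval-probability_and_statistics,65e812,accuracy,gen,38.89
ceval-discrete_mathematics,e894ae,accuracy,gen,18.75
ceval-electrical_engineer,ae42b9,accuracy,gen,35.14
ceval-metrology_engineer,ee34ea,accuracy,gen,50.00
ceval-high_school_mathematics,1dc5bf,accuracy,gen,22.22
ceval-high_school_physics,adf25f,accuracy,gen,31.58
ceval-high_school_chemistry,2ed27f,accuracy,gen,15.79
ceval-high_school_biology,8e2b9a,accuracy,gen,36.84
ceval-middle_school_mathematics,bee8d5,accuracy,gen,26.32
ceval-middle_school_biology,86817c,accuracy,gen,61.90
ceval-middle_school_physics,8accf6,accuracy,gen,63.16
ceval-middle_school_chemistry,167a15,accuracy,gen,60.00
ceval-veterinary_medicine,b4e08d,accuracy,gen,47.83
ceval-stem,-,naive_average,gen,35.09
"""


def load_rows(csv_text: str) -> list:
    """Read the CSV text into dicts, converting the score column to float."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{**row, "score": float(row["score"])} for row in reader]


rows = load_rows(SUMMARY_CSV)
subjects = [r for r in rows if r["metric"] == "accuracy"]
reported = next(r["score"] for r in rows if r["dataset"] == "ceval-stem")

# Unweighted mean over subjects: every subject counts equally, regardless of
# how many questions it contains.
stem_mean = mean(r["score"] for r in subjects)
weakest = sorted(subjects, key=lambda r: r["score"])[:3]

print(f"recomputed STEM mean: {stem_mean:.2f} (reported: {reported:.2f})")
print("lowest three:", [r["dataset"] for r in weakest])
```

With the values from this run, the recomputed mean agrees with the reported 35.09 to two decimals, which is consistent with an unweighted average over subjects. The three lowest STEM scores are high school chemistry, college chemistry, and discrete mathematics.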
