TheoremQA

python3 run.py --models hf_internlm2_7b --datasets TheoremQA_5shot_gen_6f0af8 --debug
python3 run.py --models hf_internlm2_chat_7b --datasets TheoremQA_5shot_gen_6f0af8 --debug

Base Models

model	TheoremQA
llama-7b-turbomind	10.25
llama-13b-turbomind	11.25
llama-30b-turbomind	14.25
llama-65b-turbomind	15.62
llama-2-7b-turbomind	12.62
llama-2-13b-turbomind	11.88
llama-2-70b-turbomind	15.62
llama-3-8b-turbomind	20.25
llama-3-70b-turbomind	33.62
internlm2-1.8b-turbomind	10.50
internlm2-7b-turbomind	21.88
internlm2-20b-turbomind	26.00
qwen-1.8b-turbomind	9.38
qwen-7b-turbomind	15.00
qwen-14b-turbomind	21.62
qwen-72b-turbomind	27.12
qwen1.5-0.5b-hf	5.88
qwen1.5-1.8b-hf	12.00
qwen1.5-4b-hf	13.75
qwen1.5-7b-hf	4.25
qwen1.5-14b-hf	12.62
qwen1.5-32b-hf	26.62
qwen1.5-72b-hf	26.62
qwen1.5-moe-a2-7b-hf	7.50
mistral-7b-v0.1-hf	17.00
mistral-7b-v0.2-hf	16.25
mixtral-8x7b-v0.1-hf	24.12
mixtral-8x22b-v0.1-hf	36.75
yi-6b-hf	13.88
yi-34b-hf	24.75
deepseek-7b-base-hf	12.38
deepseek-67b-base-hf	21.25

Chat Models

model	TheoremQA
qwen1.5-0.5b-chat-hf	9.00
qwen1.5-1.8b-chat-hf	9.25
qwen1.5-4b-chat-hf	13.88
qwen1.5-7b-chat-hf	12.25
qwen1.5-14b-chat-hf	13.63
qwen1.5-32b-chat-hf	19.25
qwen1.5-72b-chat-hf	22.75
qwen1.5-110b-chat-hf	17.50
internlm2-chat-1.8b-hf	13.63
internlm2-chat-1.8b-sft-hf	12.88
internlm2-chat-7b-hf	18.50
internlm2-chat-7b-sft-hf	18.75
internlm2-chat-20b-hf	23.00
internlm2-chat-20b-sft-hf	25.12
llama-3-8b-instruct-hf	19.38
llama-3-70b-instruct-hf	36.25
llama-3-8b-instruct-lmdeploy	19.62
llama-3-70b-instruct-lmdeploy	34.50
mistral-7b-instruct-v0.1-hf	12.62
mistral-7b-instruct-v0.2-hf	11.38
mixtral-8x7b-instruct-v0.1-hf	26.00
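
The tables above are plain tab-separated rows, so they can be loaded and ranked with a few lines of Python. The following is a minimal sketch, not part of the evaluation tooling: the helper names `parse_scores` and `rank` are hypothetical, and the embedded rows are a small excerpt copied from the chat-model table.

```python
# Excerpt of the chat-model table above (tab-separated: model, score).
TABLE = (
    "model\tTheoremQA\n"
    "internlm2-chat-20b-sft-hf\t25.12\n"
    "llama-3-8b-instruct-hf\t19.38\n"
    "llama-3-70b-instruct-hf\t36.25\n"
    "mixtral-8x7b-instruct-v0.1-hf\t26.00\n"
)


def parse_scores(text):
    """Hypothetical helper: map each model name to its score as a float."""
    lines = text.strip().splitlines()
    body = lines[1:]  # skip the "model<TAB>TheoremQA" header row
    scores = {}
    for line in body:
        name, score = line.split("\t")
        scores[name] = float(score)
    return scores


def rank(scores):
    """Hypothetical helper: list (model, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


scores = parse_scores(TABLE)
ranking = rank(scores)

for name, score in ranking:
    print(f"{name}\t{score:.2f}")
```

Pasting the full table text into `TABLE` ranks every listed model the same way.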
