[Figure: signal to noise by benchmark_id, with legend model_family.
Benchmarks: mmlu, tqa, safim, hellaswag, gsm8k, nq, agi_english, CRUXEval-output, DS1000, CRUXEval-input, arc_challenge, mbpp, lcb_codegen, mbpp+, piqa, siqa, humaneval, humaneval+.
Model families: deepseek-coder, opencodeinterpreter-ds, Qwen1.5, llama, deepseek-llm, llama2, Mixtral-8, Meta-Llama-3, deepseek-base, deepseek-instruct, codellama, wizardcoder, LLama3, DSCoder, meta-llama-Llama-3, Qwen-Qwen1.5, deepseek-ai-deepseek-coder, codegen.]