mbpp: by examples


There are 9 examples that no model solved. If these are well-posed problems, solving some of them is a good signal that your model is indeed better than the leading models.
Mbpp/235, Mbpp/260, Mbpp/306, Mbpp/311, Mbpp/398, Mbpp/430, Mbpp/462, Mbpp/590, Mbpp/603
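
As a rough illustration of how a list like the one above could be derived, here is a minimal sketch. It assumes results are available as a mapping from model name to per-problem pass/fail flags; the function name `unsolved_by_all` and the toy data are hypothetical and not taken from this page.

```python
def unsolved_by_all(results):
    """Return the sorted ids of problems that no model solved.

    results: {model_name: {problem_id: passed_bool}} (assumed layout).
    """
    all_problems = set()
    solved = set()
    for per_problem in results.values():
        for pid, passed in per_problem.items():
            all_problems.add(pid)
            if passed:
                solved.add(pid)
    return sorted(all_problems - solved)


# Toy data, invented for illustration only.
toy_results = {
    "model_a": {"Mbpp/1": True, "Mbpp/2": False, "Mbpp/3": False},
    "model_b": {"Mbpp/1": False, "Mbpp/2": False, "Mbpp/3": True},
}
print(unsolved_by_all(toy_results))
```

In the toy data only `Mbpp/2` is failed by every model, so it is the only id reported.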

example_link	model	min_elo
Mbpp/780	gpt-4-1106-preview	1074.055
Mbpp/310	meta-llama-3-70b-instruct	1061.476
Mbpp/448	bigcode--starcoder2-15b-instruct-v0.1	1046.197
Mbpp/765	mistral-large-latest	1027.328
Mbpp/103	databricks--dbrx-instruct	1007.678
Mbpp/468	microsoft--Phi-3-mini-4k-instruct	1003.010
Mbpp/124	octocoder	979.655
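
The table above can be read as listing, for each problem, the lowest-rated model that solved it. That reading is an assumption based on the column names. Under it, a minimal sketch of the computation might look as follows; `min_elo_to_solve`, the result layout, and the Elo values are hypothetical.

```python
def min_elo_to_solve(results, elo):
    """Map each solved problem to (model, elo) of its lowest-rated solver.

    results: {model_name: {problem_id: passed_bool}} (assumed layout).
    elo: {model_name: elo_rating} (assumed to be precomputed elsewhere).
    """
    best = {}
    for model, per_problem in results.items():
        for pid, passed in per_problem.items():
            if passed and (pid not in best or elo[model] < best[pid][1]):
                best[pid] = (model, elo[model])
    return best


# Toy data, invented for illustration only.
toy_elo = {"model_a": 1100.0, "model_b": 950.0}
toy_results = {
    "model_a": {"P1": True, "P2": True},
    "model_b": {"P1": True, "P2": False},
}
table = min_elo_to_solve(toy_results, toy_elo)

# Hardest problems first: sort by descending minimum Elo.
for pid, (model, rating) in sorted(table.items(), key=lambda kv: -kv[1][1]):
    print(pid, model, rating, sep="\t")
```

Problems that no model solved do not appear in the result at all, which matches treating them as a separate list.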

These are the 10 problems with the lowest correlation with the overall evaluation (i.e. better models tend to do worse on these).
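
One way to compute such a per-problem correlation is sketched below. The page does not say which statistic it uses, so this sketch assumes a plain Pearson correlation between each problem's pass/fail vector across models and each model's overall accuracy. The function names and toy data are hypothetical, and every model is assumed to have a result for every problem.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns None if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def problem_correlations(results):
    """Correlate each problem's pass vector with overall model accuracy.

    results: {model_name: {problem_id: passed_bool}} (assumed layout).
    Note: overall accuracy here includes the problem being scored.
    """
    models = sorted(results)
    problems = sorted(results[models[0]])
    overall = [sum(results[m].values()) / len(problems) for m in models]
    return {
        pid: pearson([float(results[m][pid]) for m in models], overall)
        for pid in problems
    }


# Toy data, invented for illustration only.
toy_results = {
    "strong": {"P1": True, "P2": True, "P3": False},
    "mid": {"P1": True, "P2": False, "P3": False},
    "weak": {"P1": False, "P2": False, "P3": True},
}
corr = problem_correlations(toy_results)
# Lowest correlation first; problems with undefined correlation are skipped.
ranked = sorted((c, pid) for pid, c in corr.items() if c is not None)
print(ranked[0])
```

In the toy data `P3` is solved only by one of the lower-accuracy models, so its correlation is negative and it ranks first.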

Histogram of problems by the accuracy on each problem.

Histogram of problems by the minimum Elo to solve each problem.