
Not solved by any model

Six problems were not solved by any model. If these problems are well-posed, solving some of them is a good signal that your model is genuinely better than the leading models.
HumanEval/129, HumanEval/130, HumanEval/132, HumanEval/145, HumanEval/163, HumanEval/32
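
As a rough illustration of how such a list could be derived, the sketch below assumes a hypothetical `results` mapping from model name to per-problem pass/fail flags. The data layout, the model names, and the helper `unsolved_by_all` are assumptions for illustration, not the page's actual pipeline.

```python
# Hypothetical data: results[model][problem_id] is True if that model solved the problem.
results = {
    "model_a": {"HumanEval/1": True, "HumanEval/2": False, "HumanEval/3": False},
    "model_b": {"HumanEval/1": True, "HumanEval/2": True, "HumanEval/3": False},
}


def unsolved_by_all(results):
    """Return the sorted ids of problems that no model solved."""
    problems = set()
    for per_problem in results.values():
        problems.update(per_problem)
    # A problem missing from a model's record is treated as unsolved by that model.
    return sorted(
        p for p in problems
        if not any(per_problem.get(p, False) for per_problem in results.values())
    )


unsolved = unsolved_by_all(results)
print(unsolved)
```

The ids are sorted as strings, which matches the ordering of the list above (HumanEval/32 comes after HumanEval/163).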

These are the 10 problems with the lowest correlation with the overall evaluation (i.e., better models do not reliably do better on these, and may do worse).
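
The page does not say which correlation measure is used. One plausible reading, sketched below as an assumption, is the Pearson correlation between each problem's pass/fail vector across models and each model's overall accuracy. The names `pearson`, `problem_correlations`, and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined, e.g. a problem every model fails
    return cov / math.sqrt(var_x * var_y)


def problem_correlations(solved, overall):
    """solved[problem] holds 0/1 per model, in the same model order as overall."""
    return {problem: pearson(flags, overall) for problem, flags in solved.items()}


# Hypothetical data: three models with overall accuracy 0.2, 0.5 and 0.9.
overall = [0.2, 0.5, 0.9]
solved = {
    "p_typical": [0, 1, 1],   # solved by the stronger models
    "p_inverted": [1, 1, 0],  # missed by the strongest model
    "p_unsolved": [0, 0, 0],  # no variance, so the correlation is undefined
}

corr = problem_correlations(solved, overall)
# Rank problems from lowest to highest correlation, skipping undefined ones.
lowest = sorted((p for p in corr if corr[p] is not None), key=lambda p: corr[p])
print(lowest)
```

Under this reading, a problem lands in the list when its correlation is small or negative; only a negative value means stronger models actually do worse on it.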

Histogram of problems by the accuracy on each problem.

Histogram of problems by the minimum Elo to solve each problem.
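
The "minimum Elo to solve" is not defined on the page. The sketch below assumes it means the lowest Elo rating among the models that solved a given problem, with problems that no model solved left out. The ratings, the model names, the 100-point bin width, and the helper names are all hypothetical.

```python
from collections import Counter


def min_elo_to_solve(elo, results):
    """Map each solved problem to the lowest Elo among the models that solved it."""
    best = {}
    for model, per_problem in results.items():
        for problem, ok in per_problem.items():
            if ok and elo[model] < best.get(problem, float("inf")):
                best[problem] = elo[model]
    return best


def histogram(values, bin_width):
    """Count values per bin, keyed by the lower edge of each bin."""
    return Counter(int(v // bin_width) * bin_width for v in values)


# Hypothetical ratings and pass/fail records.
elo = {"model_a": 1000, "model_b": 1150}
records = {
    "model_a": {"HumanEval/1": True, "HumanEval/2": False, "HumanEval/3": False},
    "model_b": {"HumanEval/1": True, "HumanEval/2": True, "HumanEval/3": False},
}

min_elo = min_elo_to_solve(elo, records)
bins = histogram(min_elo.values(), 100)
print(min_elo, dict(bins))
```

Problems that no model solved have no minimum Elo under this definition, so they do not appear in the histogram.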