lcb_codegen: by examples


Not solved by any model

There are 97 examples not solved by any model. If these are well-posed problems, solving some of them is a good signal that your model is indeed better than the leading models.
1899_D, 2849, 2879, 2921, 2952, 3017, 3024, 3025, 3046, 3047, 3091, 3171, 3184, 3190, 3192, 3200, 3211, 3212, 3219, 3223, 3224, 3228, 3233, 3240, 3243, 3261, 3265, 3297, 3298, 3299, 3308, 3317, abc301_d, abc301_e, abc301_f, abc302_f, abc303_e, abc304_d, abc305_d, abc305_e, abc306_d, abc306_e, abc307_c, abc307_d, abc307_e, abc308_e, abc309_d, abc309_e, abc310_e, abc310_f, abc311_c, abc311_d, abc312_e, abc312_f, abc314_d, abc314_e, abc314_f, abc315_d, abc315_e, abc315_f, abc318_e, abc319_c, abc320_c, abc321_d, abc321_e, abc322_e, abc323_d, abc323_e, abc324_d, abc324_e, abc325_d, abc325_f, abc326_d, abc326_e, abc327_e, abc329_c, abc329_e, abc329_f, abc330_e, abc331_d, abc331_e, abc333_d, abc333_e, abc334_c, abc336_d, abc337_d, abc337_e, abc338_d, abc338_f, abc340_c, abc340_e, abc341_e, abc341_f, abc342_d, abc342_e, abc343_a, abc343_e
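
A list like the one above can be derived from per-model pass/fail results. The following is a minimal sketch, assuming a hypothetical `results` mapping from model name to per-problem outcomes; the model names and problem ids are placeholders, not data from this page.

```python
# Hypothetical per-model results: model name -> {problem id -> passed?}.
results = {
    "model_a": {"p1": True, "p2": False, "p3": False},
    "model_b": {"p1": False, "p2": False, "p3": True},
}

def unsolved_by_all(results):
    """Return the sorted problem ids that no model solved."""
    all_problems = set()
    solved = set()
    for outcomes in results.values():
        all_problems.update(outcomes)
        solved.update(pid for pid, passed in outcomes.items() if passed)
    return sorted(all_problems - solved)

unsolved = unsolved_by_all(results)
print(unsolved)
```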

Problems solved by 1 model only

example_link  model                   min_elo
abc308_f      GPT-4O-2024-05-13       1667.579
3080          GPT-4O-2024-05-13       1667.579
3032          GPT-4O-2024-05-13       1667.579
abc310_d      GPT-4O-2024-05-13       1667.579
abc320_e      GPT-4O-2024-05-13       1667.579
abc332_c      GPT-4O-2024-05-13       1667.579
3141          GPT-4O-2024-05-13       1667.579
abc312_b      GPT-4O-2024-05-13       1667.579
3209          GPT-4-Turbo-2024-04-09  1466.443
abc342_c      GPT-4-Turbo-2024-04-09  1466.443
abc321_b      GPT-4-Turbo-2024-04-09  1466.443
2757          Gemini-Pro-1.5 (May)    1435.522
1883_C        Gemini-Pro-1.5 (May)    1435.522
3292          GPT-4-0613              1372.862
abc319_e      GPT-4-0613              1372.862
2893          WCoder-33B-V1.1         1276.673
3166          CodeQwen15-7B-Chat      1113.607
3196          CodeQwen15-7B-Chat      1113.607
3244          CodeQwen15-7B-Chat      1113.607
2833          CodeQwen15-7B-Chat      1113.607
3262          CodeQwen15-7B-Chat      1113.607
3033          CodeQwen15-7B-Chat      1113.607
abc336_c      Command-R+              1060.359
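
Rows of this shape can be produced by inverting a model-to-solved-problems mapping and keeping the problems with exactly one solver. The sketch below assumes `min_elo` means the lowest Elo among the models that solved a problem (which, for a single solver, is that model's Elo); the `elo` and `solved` inputs are hypothetical.

```python
# Hypothetical inputs: Elo per model, and the set of problems each model solved.
elo = {"strong": 1600.0, "mid": 1400.0, "weak": 1100.0}
solved = {
    "strong": {"p1", "p2"},
    "mid": {"p1"},
    "weak": {"p3"},
}

def solvers_by_problem(solved):
    """Invert model -> problems into problem -> list of models."""
    by_problem = {}
    for model, problems in solved.items():
        for pid in problems:
            by_problem.setdefault(pid, []).append(model)
    return by_problem

def single_solver_rows(solved, elo):
    """Rows (problem, model, min_elo) for problems with exactly one solver."""
    rows = []
    for pid, models in solvers_by_problem(solved).items():
        if len(models) == 1:
            # Assumed definition: lowest Elo among the solving models.
            rows.append((pid, models[0], min(elo[m] for m in models)))
    # Highest min_elo first, then problem id, matching the table's ordering.
    return sorted(rows, key=lambda r: (-r[2], r[0]))

rows = single_solver_rows(solved, elo)
for row in rows:
    print(*row)
```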

Suspect problems

These are the 10 problems whose results correlate least with the overall evaluation (i.e., better models do not tend to do better on these; where tau is negative, better models tend to do worse).

example_link  acc       tau
abc338_a      0.431  -0.200
2819          0.914  -0.080
3203          0.034  -0.051
2886          0.828  -0.011
abc336_c      0.017  -0.003
3195          0.103  -0.003
abc324_f      0.034   0.000
abc340_b      0.966   0.014
2847          0.776   0.034
abc339_c      0.414   0.043
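
The page does not say how `tau` is computed. As one plausible reading, the sketch below treats it as Kendall's tau-b between each model's overall score and whether that model solved the problem, and treats `acc` as the fraction of models that solved it. Both the interpretation and the sample data are assumptions.

```python
from math import sqrt

def kendall_tau_b(xs, ys):
    """Kendall tau-b between two equal-length sequences; 0.0 if either is constant."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    total = n * (n - 1) // 2
    denom = sqrt((total - ties_x) * (total - ties_y))
    return (concordant - discordant) / denom if denom else 0.0

# Hypothetical data: overall score of four models, and whether each solved one problem.
overall = [0.9, 0.7, 0.5, 0.3]
passed = [0, 0, 1, 1]  # only the two weakest models solved it

acc = sum(passed) / len(passed)
tau = kendall_tau_b(overall, passed)
print(f"acc={acc:.3f} tau={tau:.3f}")
```

In this toy case tau is negative because only the weaker models solve the problem, which is the pattern that would flag a problem as suspect.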

Histogram of accuracies

Histogram of problems by the accuracy on each problem.

Histogram of difficulties

Histogram of problems by the minimum Elo to solve each problem.
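
Both histograms reduce to counting problems per fixed-width bucket. This is a minimal sketch, assuming equal-width bins over a chosen range; the bin counts and ranges are illustrative choices, and the inputs are only the few values listed in the tables above, not the full data behind the plots.

```python
def histogram(values, lo, hi, bins):
    """Count values in `bins` equal-width buckets over [lo, hi]; hi falls in the last bucket."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into the last bucket
        counts[idx] += 1
    return counts

# Accuracies from the suspect-problems table, in four buckets over [0, 1].
accs = [0.431, 0.914, 0.034, 0.828, 0.017, 0.103, 0.034, 0.966, 0.776, 0.414]
acc_hist = histogram(accs, 0.0, 1.0, 4)

# Distinct min_elo values from the single-solver table, in four buckets over [1000, 1800].
elos = [1667.579, 1466.443, 1435.522, 1372.862, 1276.673, 1113.607, 1060.359]
elo_hist = histogram(elos, 1000.0, 1800.0, 4)

print(acc_hist, elo_hist)
```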