There are 37 examples not solved by any model.
If these are good problems, solving some of them is a strong signal that your model is indeed better than the leading models.
atcoder.abc311_c, atcoder.abc315_e, atcoder.abc319_c, atcoder.abc324_f, atcoder.abc333_e, atcoder.abc343_a, atcoder.abc343_e, atcoder.abc350_c, atcoder.abc350_e, atcoder.abc362_c, atcoder.abc363_f, atcoder.abc373_g, atcoder.abc376_f, atcoder.abc389_g, atcoder.abc397_d, atcoder.arc181_c, atcoder.arc181_d, atcoder.arc182_d, atcoder.arc183_b, atcoder.arc183_d, atcoder.arc184_d, atcoder.arc186_a, atcoder.arc186_c, atcoder.arc186_e, atcoder.arc188_c, atcoder.arc189_a, atcoder.arc190_a, atcoder.arc190_c, atcoder.arc191_c, atcoder.arc192_b, atcoder.arc193_b, atcoder.arc194_c, atcoder.arc196_a, atcoder.arc196_c, leetcode.3478, leetcode.3527, leetcode.3763
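
A list like the one above can be derived from a per-problem results matrix. The following is a minimal sketch, assuming the results are available as a mapping from problem id to per-model pass/fail flags. The `results` layout, the function name `unsolved_problems`, and the placeholder ids are hypothetical and not taken from this report.

```python
# Hypothetical toy data: problem id -> {model name: solved?}.
# The ids and model names are placeholders, not results from this report.
results = {
    "problem_a": {"model_x": False, "model_y": False},
    "problem_b": {"model_x": True, "model_y": False},
    "problem_c": {"model_x": False, "model_y": False},
}


def unsolved_problems(results):
    """Return the sorted ids of problems that no model solved."""
    return sorted(
        problem
        for problem, by_model in results.items()
        if not any(by_model.values())
    )


print(unsolved_problems(results))  # ['problem_a', 'problem_c']
```
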
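
The following table lists problems by the minimum Elo (`min_elo`) of a model that solved them, in descending order, together with that model.

| example_link | model | min_elo |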
|---|---|---|
| atcoder.arc184_c | O4-Mini (High) | 1078.512 |
| leetcode.3638 | O3 (High) | 1068.955 |
| atcoder.arc192_e | O3 (High) | 1068.955 |
| atcoder.arc183_c | O3 (High) | 1068.955 |
| atcoder.abc355_e | O3 (High) | 1068.955 |
| leetcode.3701 | O3 (High) | 1068.955 |
| leetcode.3762 | O3 (High) | 1068.955 |
| atcoder.abc392_d | DeepSeek-R1-0528 | 1068.603 |
| atcoder.arc191_a | DeepSeek-R1-0528 | 1068.603 |
| atcoder.arc196_d | DeepSeek-R1-0528 | 1068.603 |
| atcoder.abc338_f | DeepSeek-R1-0528 | 1068.603 |
| atcoder.arc188_d | DeepSeek-R1-0528 | 1068.603 |
| atcoder.abc374_d | DeepSeek-R1-0528 | 1068.603 |
| atcoder.abc315_f | DeepSeek-R1-0528 | 1068.603 |
| atcoder.abc400_g | DeepSeek-R1-0528 | 1068.603 |
| leetcode.3482 | DeepSeek-R1-0528 | 1068.603 |
| atcoder.abc327_e | DeepSeek-R1-0528 | 1068.603 |
| atcoder.abc314_e | DeepSeek-R1-0528 | 1068.603 |
| atcoder.arc191_d | Gemini-2.5-Pro-06-05 | 1067.196 |
| atcoder.arc195_d | Gemini-2.5-Pro-03-25 | 1057.047 |
| atcoder.abc371_f | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc375_b | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc398_g | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc376_g | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc399_e | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc366_g | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc382_g | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc370_g | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc396_e | Grok-3-Mini (High) | 1044.562 |
| atcoder.arc195_c | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc378_g | Grok-3-Mini (High) | 1044.562 |
| atcoder.arc185_c | Grok-3-Mini (High) | 1044.562 |
| atcoder.abc337_e | O3-Mini-2025-01-31 (High) | 1043.181 |
| leetcode.3613 | Gemini-Flash-2.0-Thinking-12-19 | 966.561 |
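
One plausible way to obtain a `min_elo` column like the one above is to take, for each problem, the lowest-rated model among those that solved it. This is a sketch under that assumption. The report does not describe its exact procedure, and the `elo` and `results` structures, model names, and ratings below are hypothetical.

```python
# Hypothetical toy data; names and ratings are placeholders.
elo = {"model_x": 1050.0, "model_y": 980.0, "model_z": 1010.0}
results = {
    "problem_a": {"model_x": True, "model_y": False, "model_z": True},
    "problem_b": {"model_x": True, "model_y": True, "model_z": True},
    "problem_c": {"model_x": False, "model_y": False, "model_z": False},
}


def min_elo_to_solve(results, elo):
    """Return (problem, model, elo) rows for the lowest-rated solver of each problem.

    Problems that no model solved are omitted. Rows are sorted by Elo, highest first.
    """
    rows = []
    for problem, by_model in results.items():
        solvers = [model for model, solved in by_model.items() if solved]
        if solvers:
            weakest = min(solvers, key=lambda model: elo[model])
            rows.append((problem, weakest, elo[weakest]))
    return sorted(rows, key=lambda row: row[2], reverse=True)


for row in min_elo_to_solve(results, elo):
    print(row)
```

A high `min_elo` means only strong models solved the problem, so sorting in descending order puts the hardest solved problems first.
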

These are the 10 problems with the lowest correlation (`tau`) with the overall evaluation (i.e., better models tend to do worse on these).

| example_link | acc | tau |
|---|---|---|
| atcoder.abc344_b | 0.900 | -0.336 |
| atcoder.abc384_f | 0.767 | -0.329 |
| atcoder.abc386_f | 0.800 | -0.320 |
| atcoder.abc384_g | 0.600 | -0.294 |
| leetcode.2816 | 0.967 | -0.258 |
| leetcode.3453 | 0.700 | -0.157 |
| atcoder.abc351_a | 0.967 | -0.134 |
| leetcode.3613 | 0.033 | -0.116 |
| atcoder.abc375_a | 0.967 | -0.116 |
| atcoder.arc193_a | 0.300 | -0.101 |
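
To illustrate how a negative `tau` can arise, here is a minimal sketch that assumes `tau` is a Kendall rank correlation between each model's overall score and whether it solved the problem. The report does not state which correlation or tie handling it uses. The tau-b variant, the function name, and the toy numbers below are assumptions.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length sequences, with tie correction."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    denom = sqrt((total_pairs - ties_x) * (total_pairs - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical toy data: four models ordered by overall score,
# where only the two weakest models solved the problem.
overall_scores = [1000.0, 1100.0, 1200.0, 1300.0]
solved = [1, 1, 0, 0]

tau = kendall_tau_b(overall_scores, solved)
print(round(tau, 3))  # -0.816
```

Because the outcome per model is binary, many pairs are tied, which is why a tie-corrected variant is used in this sketch.
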
Histogram of problems by the accuracy on each problem.
Histogram of problems by the minimum Elo to solve each problem.