There are 1138 examples not solved by any model.
If these are well-posed problems, solving some of them is a good signal that your model is indeed better than the leading models.
nq/10, nq/1000, nq/101, nq/1011, nq/1012, nq/1015, nq/1017, nq/1018, nq/1024, nq/1025, nq/1026, nq/1029, nq/1032, nq/1041, nq/1043, nq/105, nq/1052, nq/1054, nq/106, nq/1060, nq/1062, nq/1063, nq/1064, nq/1068, nq/1071, nq/1084, nq/1085, nq/1089, nq/1090, nq/1094, nq/1095, nq/1097, nq/110, nq/1100, nq/1108, nq/1111, nq/1112, nq/1115, nq/1117, nq/112, nq/1128, nq/1130, nq/1132, nq/1136, nq/1139, nq/1152, nq/1154, nq/1158, nq/1160, nq/1163, nq/1173, nq/1177, nq/118, nq/1181, nq/1191, nq/1192, nq/121, nq/1211, nq/1216, nq/1224, nq/1229, nq/1230, nq/1231, nq/1233, nq/1234, nq/124, nq/1244, nq/1245, nq/1258, nq/1265, nq/1266, nq/127, nq/1273, nq/1274, nq/1279, nq/1281, nq/1283, nq/1288, nq/1299, nq/130, nq/1300, nq/1302, nq/1304, nq/1306, nq/1307, nq/1308, nq/1313, nq/1319, nq/132, nq/1321, nq/1326, nq/1328, nq/1329, nq/133, nq/1336, nq/134, nq/1342, nq/1344, nq/1345, nq/1349, nq/135, nq/1358, nq/1360, nq/1363, nq/1365, nq/1372, nq/1373, nq/1375, nq/1379, nq/1381, nq/1384, nq/1387, nq/1393, nq/1396, nq/1398, nq/1399, nq/1400, nq/1401, nq/1402, nq/1405, nq/1407, nq/1411, nq/1417, nq/1419, nq/1420, nq/1422, nq/1425, nq/1426, nq/1428, nq/1430, nq/1439, nq/1440, nq/1444, nq/1448, nq/1449, nq/1451, nq/1452, nq/1453, nq/146, nq/1463, nq/1464, nq/1467, nq/1473, nq/1477, nq/148, nq/1487, nq/1488, nq/1493, nq/15, nq/1502, nq/1504, nq/1507, nq/1512, nq/1514, nq/1515, nq/1516, nq/1520, nq/1521, nq/1522, nq/1525, nq/1530, nq/1532, nq/1533, nq/1534, nq/1536, nq/154, nq/1542, nq/1545, nq/1546, nq/1548, nq/1549, nq/1550, nq/1553, nq/1554, nq/1556, nq/1560, nq/1563, nq/1564, nq/1565, nq/1566, nq/1568, nq/157, nq/1572, nq/1575, nq/1577, nq/1588, nq/1590, nq/1591, nq/1592, nq/1595, nq/1596, nq/1597, nq/160, nq/1601, nq/1604, nq/1606, nq/1607, nq/1609, nq/1615, nq/1617, nq/162, nq/1621, nq/1626, nq/1627, nq/1631, nq/1633, nq/1636, nq/164, nq/1647, nq/1648, nq/165, nq/1652, nq/1656, nq/1658, nq/1659, nq/166, nq/1660, nq/1661, nq/1662, nq/1663, nq/1664, nq/1666, nq/1667, nq/1669, nq/1672, 
nq/1673, nq/1675, nq/1676, nq/1682, nq/1683, nq/169, nq/1691, nq/1693, nq/17, nq/1700, nq/1701, nq/1708, nq/1715, nq/1716, nq/1719, nq/1726, nq/1730, nq/1735, nq/1736, nq/1737, nq/1739, nq/1742, nq/1746, nq/175, nq/1750, nq/1751, nq/1752, nq/1754, nq/1755, nq/1762, nq/1763, nq/1765, nq/1767, nq/1768, nq/1773, nq/1774, nq/1776, nq/1778, nq/1779, nq/178, nq/1780, nq/1781, nq/1782, nq/1785, nq/1786, nq/1788, nq/1796, nq/1807, nq/1808, nq/1809, nq/181, nq/1812, nq/1813, nq/1815, nq/1816, nq/1824, nq/1827, nq/1828, nq/1833, nq/1837, nq/184, nq/1842, nq/1843, nq/1846, nq/1848, nq/1849, nq/1850, nq/1854, nq/1858, nq/1859, nq/1860, nq/1865, nq/1866, nq/187, nq/1870, nq/1872, nq/1876, nq/1878, nq/1880, nq/1883, nq/1887, nq/1888, nq/189, nq/1896, nq/1900, nq/1909, nq/1912, nq/1913, nq/1918, nq/1919, nq/1923, nq/1925, nq/1926, nq/193, nq/1933, nq/1937, nq/1939, nq/1942, nq/1943, nq/1949, nq/1950, nq/1951, nq/1952, nq/1954, nq/1955, nq/1959, nq/1965, nq/1967, nq/1969, nq/1971, nq/1977, nq/1979, nq/1982, nq/1983, nq/1985, nq/1989, nq/199, nq/1992, nq/1998, nq/2, nq/2000, nq/2001, nq/2002, nq/2003, nq/2004, nq/2010, nq/2014, nq/2017, nq/2018, nq/202, nq/2024, nq/2029, nq/203, nq/2032, nq/2033, nq/2034, nq/204, nq/2042, nq/2043, nq/2045, nq/205, nq/2050, nq/2053, nq/2054, nq/2057, nq/206, nq/2060, nq/2068, nq/2070, nq/2074, nq/2077, nq/208, nq/2081, nq/2085, nq/2093, nq/2094, nq/2096, nq/2103, nq/2104, nq/2113, nq/2116, nq/2118, nq/212, nq/2121, nq/2126, nq/2131, nq/2134, nq/2137, nq/2144, nq/2145, nq/2147, nq/2148, nq/215, nq/2150, nq/2151, nq/2154, nq/2157, nq/2158, nq/2166, nq/2176, nq/2183, nq/219, nq/2192, nq/2194, nq/2195, nq/2197, nq/2200, nq/2204, nq/2206, nq/221, nq/2210, nq/2211, nq/2212, nq/2214, nq/2219, nq/2224, nq/2230, nq/2236, nq/2237, nq/2239, nq/224, nq/2241, nq/2244, nq/2245, nq/2248, nq/2252, nq/2261, nq/2262, nq/2263, nq/2264, nq/2265, nq/227, nq/2275, nq/2278, nq/2284, nq/2288, nq/229, nq/2294, nq/2298, nq/2302, nq/2304, nq/2305, nq/2308, nq/2309, nq/2313, 
nq/232, nq/2322, nq/233, nq/2330, nq/2331, nq/2338, nq/2340, nq/2342, nq/2346, nq/2349, nq/2355, nq/2356, nq/2359, nq/2360, nq/2361, nq/2366, nq/2367, nq/2372, nq/2379, nq/238, nq/2382, nq/2383, nq/2388, nq/2395, nq/240, nq/2400, nq/2401, nq/2403, nq/2409, nq/2410, nq/2411, nq/2414, nq/2418, nq/242, nq/2421, nq/2423, nq/2424, nq/243, nq/2433, nq/2435, nq/2437, nq/2441, nq/2447, nq/245, nq/2453, nq/2459, nq/2460, nq/2462, nq/2466, nq/2470, nq/2471, nq/2473, nq/2480, nq/2481, nq/2482, nq/2489, nq/2493, nq/2494, nq/2496, nq/2499, nq/25, nq/2501, nq/2507, nq/2508, nq/251, nq/2511, nq/2513, nq/252, nq/2520, nq/2523, nq/2524, nq/2528, nq/2529, nq/2530, nq/2536, nq/2539, nq/2545, nq/2553, nq/2559, nq/2560, nq/2563, nq/2566, nq/2573, nq/2574, nq/2577, nq/2578, nq/2579, nq/258, nq/2593, nq/2594, nq/2595, nq/2602, nq/2605, nq/2606, nq/2608, nq/2611, nq/2613, nq/2615, nq/2617, nq/2624, nq/2627, nq/2628, nq/2629, nq/263, nq/2631, nq/2635, nq/2637, nq/2638, nq/2649, nq/2651, nq/2655, nq/2656, nq/2657, nq/2658, nq/2660, nq/2661, nq/2662, nq/2669, nq/2670, nq/2671, nq/2673, nq/2678, nq/2687, nq/2688, nq/2694, nq/2695, nq/2696, nq/2699, nq/2707, nq/2709, nq/2710, nq/2717, nq/272, nq/2722, nq/2724, nq/2728, nq/273, nq/2733, nq/2735, nq/2738, nq/2740, nq/2745, nq/2747, nq/2748, nq/2750, nq/2754, nq/2757, nq/2758, nq/2763, nq/2764, nq/2765, nq/2773, nq/2774, nq/2785, nq/279, nq/2793, nq/2794, nq/2795, nq/2797, nq/2798, nq/28, nq/2800, nq/2802, nq/2805, nq/2807, nq/2809, nq/2813, nq/2814, nq/2821, nq/2823, nq/2828, nq/283, nq/2830, nq/2832, nq/2836, nq/2838, nq/2839, nq/2841, nq/2842, nq/2846, nq/2849, nq/2851, nq/2856, nq/2859, nq/2860, nq/2868, nq/2869, nq/2870, nq/2876, nq/2878, nq/2879, nq/2883, nq/2885, nq/2886, nq/2887, nq/2889, nq/2897, nq/29, nq/2906, nq/2907, nq/291, nq/2912, nq/2913, nq/2920, nq/2923, nq/2924, nq/2927, nq/2928, nq/2929, nq/2930, nq/2931, nq/2932, nq/2934, nq/2935, nq/294, nq/2940, nq/2942, nq/2945, nq/2950, nq/2951, nq/2954, nq/2955, nq/2958, nq/2966, 
nq/2968, nq/297, nq/2971, nq/2972, nq/2973, nq/2977, nq/2978, nq/2982, nq/2983, nq/2985, nq/2988, nq/299, nq/2993, nq/2996, nq/3, nq/300, nq/3003, nq/3009, nq/301, nq/3010, nq/3011, nq/3014, nq/3030, nq/3032, nq/3034, nq/3039, nq/304, nq/3044, nq/3046, nq/3053, nq/3056, nq/3063, nq/3066, nq/3068, nq/307, nq/3072, nq/308, nq/3083, nq/3084, nq/3088, nq/3089, nq/3093, nq/3102, nq/3104, nq/3106, nq/311, nq/3111, nq/3112, nq/3117, nq/3118, nq/312, nq/3120, nq/3122, nq/3126, nq/3127, nq/3130, nq/3135, nq/3136, nq/3141, nq/3142, nq/3143, nq/3145, nq/3146, nq/3148, nq/3149, nq/315, nq/3150, nq/3153, nq/3154, nq/3157, nq/3159, nq/316, nq/3162, nq/3173, nq/3176, nq/3180, nq/3181, nq/3184, nq/3189, nq/3190, nq/3191, nq/3192, nq/3197, nq/32, nq/3200, nq/3205, nq/3207, nq/3208, nq/3209, nq/3213, nq/3216, nq/3218, nq/3221, nq/3224, nq/3227, nq/3230, nq/3232, nq/3235, nq/3238, nq/3242, nq/3245, nq/3246, nq/3247, nq/325, nq/3252, nq/326, nq/3260, nq/3261, nq/3262, nq/3267, nq/3269, nq/3270, nq/3273, nq/328, nq/3283, nq/3286, nq/329, nq/3291, nq/3292, nq/3295, nq/3299, nq/3300, nq/3301, nq/3302, nq/3303, nq/3307, nq/3312, nq/3313, nq/3316, nq/3324, nq/3325, nq/3328, nq/333, nq/3330, nq/3337, nq/3344, nq/3351, nq/3359, nq/3360, nq/3361, nq/3369, nq/3370, nq/3375, nq/3376, nq/3381, nq/3382, nq/3389, nq/3395, nq/3396, nq/3398, nq/340, nq/3400, nq/3402, nq/3404, nq/3409, nq/3410, nq/3411, nq/3412, nq/3416, nq/3419, nq/342, nq/3423, nq/3428, nq/3429, nq/343, nq/3431, nq/3432, nq/3435, nq/3437, nq/3445, nq/3446, nq/3447, nq/3448, nq/345, nq/3455, nq/3456, nq/3460, nq/3461, nq/3462, nq/3464, nq/3470, nq/3471, nq/3479, nq/3480, nq/3496, nq/3499, nq/3500, nq/3501, nq/3504, nq/3505, nq/3507, nq/3510, nq/3518, nq/352, nq/3521, nq/353, nq/3534, nq/3535, nq/3536, nq/3552, nq/3558, nq/3561, nq/3562, nq/3566, nq/3567, nq/3568, nq/357, nq/3573, nq/3576, nq/3577, nq/3579, nq/3580, nq/3584, nq/3587, nq/3589, nq/3594, nq/3595, nq/3597, nq/3599, nq/3602, nq/3603, nq/3604, nq/363, nq/366, nq/372, 
nq/373, nq/374, nq/376, nq/38, nq/381, nq/384, nq/386, nq/389, nq/39, nq/390, nq/391, nq/392, nq/393, nq/394, nq/4, nq/400, nq/401, nq/408, nq/41, nq/413, nq/421, nq/422, nq/424, nq/428, nq/429, nq/433, nq/436, nq/439, nq/44, nq/442, nq/446, nq/447, nq/449, nq/453, nq/454, nq/455, nq/456, nq/46, nq/461, nq/464, nq/465, nq/469, nq/473, nq/476, nq/48, nq/484, nq/486, nq/490, nq/493, nq/495, nq/496, nq/5, nq/500, nq/501, nq/503, nq/51, nq/510, nq/518, nq/519, nq/522, nq/528, nq/53, nq/536, nq/537, nq/540, nq/542, nq/545, nq/556, nq/557, nq/558, nq/560, nq/565, nq/576, nq/578, nq/580, nq/583, nq/584, nq/589, nq/592, nq/596, nq/598, nq/600, nq/602, nq/609, nq/61, nq/611, nq/613, nq/614, nq/616, nq/618, nq/620, nq/623, nq/627, nq/629, nq/631, nq/633, nq/634, nq/639, nq/641, nq/647, nq/648, nq/649, nq/660, nq/662, nq/663, nq/664, nq/668, nq/669, nq/67, nq/670, nq/672, nq/677, nq/678, nq/68, nq/683, nq/684, nq/69, nq/696, nq/697, nq/698, nq/699, nq/7, nq/70, nq/700, nq/702, nq/704, nq/709, nq/71, nq/711, nq/714, nq/715, nq/717, nq/719, nq/720, nq/726, nq/728, nq/730, nq/734, nq/735, nq/738, nq/74, nq/745, nq/746, nq/749, nq/753, nq/757, nq/758, nq/760, nq/761, nq/762, nq/763, nq/767, nq/77, nq/776, nq/778, nq/779, nq/780, nq/781, nq/784, nq/788, nq/79, nq/791, nq/793, nq/794, nq/800, nq/801, nq/803, nq/805, nq/808, nq/817, nq/822, nq/823, nq/824, nq/828, nq/829, nq/838, nq/839, nq/840, nq/844, nq/848, nq/853, nq/858, nq/863, nq/864, nq/865, nq/869, nq/87, nq/870, nq/872, nq/876, nq/877, nq/878, nq/883, nq/885, nq/889, nq/89, nq/892, nq/894, nq/897, nq/899, nq/9, nq/90, nq/901, nq/907, nq/91, nq/913, nq/917, nq/919, nq/921, nq/922, nq/930, nq/933, nq/938, nq/939, nq/946, nq/947, nq/948, nq/949, nq/950, nq/951, nq/953, nq/955, nq/957, nq/960, nq/961, nq/966, nq/969, nq/97, nq/971, nq/974, nq/977, nq/978, nq/98, nq/981, nq/990, nq/993, nq/994, nq/996, nq/997
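
A list like the one above can be derived from per-model correctness records. The following is a minimal sketch, assuming a hypothetical `results` mapping from model name to a per-example pass/fail dictionary; the function name, data layout, and toy ids are illustrative assumptions, not the report's actual pipeline.

```python
# Hypothetical input layout (an assumption, not the report's real format):
# results[model][example_id] is True when that model answered correctly.
def unsolved_examples(results):
    """Return the sorted ids of examples that no model answered correctly."""
    all_ids = set()
    solved = set()
    for per_example in results.values():
        all_ids.update(per_example)
        solved.update(ex for ex, ok in per_example.items() if ok)
    # Plain string sorting puts "nq/10" before "nq/1000" before "nq/101",
    # which matches the ordering of the list above.
    return sorted(all_ids - solved)


# Toy data with made-up model names and example ids.
results = {
    "model_a": {"nq/1": True, "nq/2": False, "nq/3": False},
    "model_b": {"nq/1": False, "nq/2": False, "nq/3": True},
}
print(unsolved_examples(results))
```

In the toy data only `nq/2` is missed by every model, so it is the only id returned.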
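For each problem solved by at least one model, the table below gives the lowest-rated model that solves it (model) and that model's Elo (min_elo), in descending order of min_elo.

example_link | model | min_elo |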
---|---|---|
nq/2884 | dbrx-base | 1384.139 |
nq/3380 | dbrx-base | 1384.139 |
nq/2228 | dbrx-base | 1384.139 |
nq/3563 | dbrx-base | 1384.139 |
nq/3334 | dbrx-base | 1384.139 |
nq/457 | dbrx-base | 1384.139 |
nq/3055 | dbrx-base | 1384.139 |
nq/377 | dbrx-base | 1384.139 |
nq/3368 | dbrx-base | 1384.139 |
nq/1348 | dbrx-base | 1384.139 |
nq/1734 | dbrx-base | 1384.139 |
nq/2743 | dbrx-base | 1384.139 |
nq/1462 | dbrx-base | 1384.139 |
nq/1602 | dbrx-base | 1384.139 |
nq/21 | dbrx-base | 1384.139 |
nq/845 | dbrx-base | 1384.139 |
nq/2117 | dbrx-base | 1384.139 |
nq/573 | dbrx-base | 1384.139 |
nq/2711 | dbrx-base | 1384.139 |
nq/2554 | dbrx-base | 1384.139 |
nq/1637 | dbrx-base | 1384.139 |
nq/2478 | dbrx-base | 1384.139 |
nq/1517 | dbrx-base | 1384.139 |
nq/513 | dbrx-base | 1384.139 |
nq/2509 | dbrx-base | 1384.139 |
nq/1641 | dbrx-base | 1384.139 |
nq/888 | dbrx-base | 1384.139 |
nq/2227 | dbrx-base | 1384.139 |
nq/399 | dbrx-base | 1384.139 |
nq/3601 | dbrx-base | 1384.139 |
nq/113 | dbrx-base | 1384.139 |
nq/904 | dbrx-base | 1384.139 |
nq/2686 | dbrx-base | 1384.139 |
nq/170 | dbrx-base | 1384.139 |
nq/3364 | dbrx-base | 1384.139 |
nq/637 | dbrx-base | 1384.139 |
nq/995 | dbrx-base | 1384.139 |
nq/3453 | dbrx-base | 1384.139 |
nq/2319 | dbrx-base | 1384.139 |
nq/2881 | dbrx-base | 1384.139 |
nq/1897 | dbrx-base | 1384.139 |
nq/1484 | dbrx-base | 1384.139 |
nq/3438 | dbrx-base | 1384.139 |
nq/3490 | dbrx-base | 1384.139 |
nq/3161 | dbrx-base | 1384.139 |
nq/1474 | dbrx-base | 1384.139 |
nq/690 | dbrx-base | 1384.139 |
nq/991 | dbrx-base | 1384.139 |
nq/2483 | dbrx-base | 1384.139 |
nq/49 | dbrx-base | 1384.139 |
nq/319 | dbrx-base | 1384.139 |
nq/1834 | dbrx-base | 1384.139 |
nq/2614 | dbrx-base | 1384.139 |
nq/1050 | dbrx-base | 1384.139 |
nq/3540 | dbrx-base | 1384.139 |
nq/183 | dbrx-base | 1384.139 |
nq/140 | dbrx-base | 1384.139 |
nq/1930 | dbrx-base | 1384.139 |
nq/3449 | dbrx-base | 1384.139 |
nq/932 | dbrx-base | 1384.139 |
nq/2517 | dbrx-base | 1384.139 |
nq/2796 | dbrx-base | 1384.139 |
nq/2488 | dbrx-base | 1384.139 |
nq/83 | dbrx-base | 1384.139 |
nq/3215 | dbrx-base | 1384.139 |
nq/2405 | dbrx-base | 1384.139 |
nq/2716 | dbrx-base | 1384.139 |
nq/3264 | dbrx-base | 1384.139 |
nq/3476 | dbrx-base | 1384.139 |
nq/2866 | dbrx-base | 1384.139 |
nq/3399 | dbrx-base | 1384.139 |
nq/2833 | dbrx-base | 1384.139 |
nq/1544 | dbrx-base | 1384.139 |
nq/2135 | dbrx-base | 1384.139 |
nq/176 | dbrx-base | 1384.139 |
nq/2370 | dbrx-base | 1384.139 |
nq/2293 | dbrx-base | 1384.139 |
nq/2555 | dbrx-base | 1384.139 |
nq/1830 | dbrx-base | 1384.139 |
nq/3320 | dbrx-base | 1384.139 |
nq/2256 | dbrx-base | 1384.139 |
nq/1146 | dbrx-base | 1384.139 |
nq/3166 | dbrx-base | 1384.139 |
nq/1377 | dbrx-base | 1384.139 |
nq/3420 | dbrx-base | 1384.139 |
nq/3494 | dbrx-base | 1384.139 |
nq/2271 | dbrx-base | 1384.139 |
nq/1727 | dbrx-base | 1384.139 |
nq/225 | dbrx-base | 1384.139 |
nq/1640 | dbrx-base | 1384.139 |
nq/1127 | dbrx-base | 1384.139 |
nq/3356 | dbrx-base | 1384.139 |
nq/211 | dbrx-base | 1384.139 |
nq/3472 | dbrx-base | 1384.139 |
nq/1102 | dbrx-base | 1384.139 |
nq/2290 | dbrx-base | 1384.139 |
nq/3076 | dbrx-base | 1384.139 |
nq/2701 | dbrx-base | 1384.139 |
nq/3554 | dbrx-base | 1384.139 |
nq/347 | dbrx-base | 1384.139 |
nq/3038 | dbrx-base | 1384.139 |
nq/3042 | dbrx-base | 1384.139 |
nq/1175 | dbrx-base | 1384.139 |
nq/497 | dbrx-base | 1384.139 |
nq/2285 | dbrx-base | 1384.139 |
nq/579 | dbrx-base | 1384.139 |
nq/900 | dbrx-base | 1384.139 |
nq/2914 | dbrx-base | 1384.139 |
nq/1838 | dbrx-base | 1384.139 |
nq/1496 | dbrx-base | 1384.139 |
nq/693 | dbrx-base | 1384.139 |
nq/310 | dbrx-base | 1384.139 |
nq/1594 | dbrx-base | 1384.139 |
nq/1964 | dbrx-base | 1384.139 |
nq/1526 | dbrx-base | 1384.139 |
nq/2059 | dbrx-base | 1384.139 |
nq/3025 | dbrx-base | 1384.139 |
nq/3110 | dbrx-base | 1384.139 |
nq/2445 | dbrx-base | 1384.139 |
nq/1491 | dbrx-base | 1384.139 |
nq/2939 | dbrx-base | 1384.139 |
nq/2143 | dbrx-base | 1384.139 |
nq/2893 | dbrx-base | 1384.139 |
nq/771 | dbrx-base | 1384.139 |
nq/1684 | dbrx-base | 1384.139 |
nq/441 | dbrx-base | 1384.139 |
nq/271 | dbrx-base | 1384.139 |
nq/626 | dbrx-base | 1384.139 |
nq/2729 | dbrx-base | 1384.139 |
nq/3243 | dbrx-base | 1384.139 |
nq/1331 | dbrx-base | 1384.139 |
nq/1037 | dbrx-base | 1384.139 |
nq/1911 | dbrx-base | 1384.139 |
nq/3285 | dbrx-base | 1384.139 |
nq/2725 | dbrx-base | 1384.139 |
nq/3271 | dbrx-base | 1384.139 |
nq/1646 | dbrx-base | 1384.139 |
nq/2477 | dbrx-base | 1384.139 |
nq/1202 | dbrx-base | 1384.139 |
nq/1441 | dbrx-base | 1384.139 |
nq/2328 | dbrx-base | 1384.139 |
nq/3236 | dbrx-base | 1384.139 |
nq/1269 | dbrx-base | 1384.139 |
nq/1845 | dbrx-base | 1384.139 |
nq/3477 | dbrx-base | 1384.139 |
nq/785 | Meta-Llama-3-70B | 1334.493 |
nq/890 | Meta-Llama-3-70B | 1334.493 |
nq/1906 | Meta-Llama-3-70B | 1334.493 |
nq/514 | Meta-Llama-3-70B | 1334.493 |
nq/2311 | Meta-Llama-3-70B | 1334.493 |
nq/1113 | Meta-Llama-3-70B | 1334.493 |
nq/438 | Meta-Llama-3-70B | 1334.493 |
nq/2998 | Meta-Llama-3-70B | 1334.493 |
nq/2786 | Meta-Llama-3-70B | 1334.493 |
nq/549 | Meta-Llama-3-70B | 1334.493 |
nq/1802 | Meta-Llama-3-70B | 1334.493 |
nq/1443 | Meta-Llama-3-70B | 1334.493 |
nq/1343 | Meta-Llama-3-70B | 1334.493 |
nq/3489 | Meta-Llama-3-70B | 1334.493 |
nq/3353 | Meta-Llama-3-70B | 1334.493 |
nq/886 | Meta-Llama-3-70B | 1334.493 |
nq/2827 | Meta-Llama-3-70B | 1334.493 |
nq/1123 | Meta-Llama-3-70B | 1334.493 |
nq/255 | Meta-Llama-3-70B | 1334.493 |
nq/2819 | Meta-Llama-3-70B | 1334.493 |
nq/2702 | Meta-Llama-3-70B | 1334.493 |
nq/3210 | Meta-Llama-3-70B | 1334.493 |
nq/3155 | Meta-Llama-3-70B | 1334.493 |
nq/2601 | Meta-Llama-3-70B | 1334.493 |
nq/3533 | Meta-Llama-3-70B | 1334.493 |
nq/567 | Meta-Llama-3-70B | 1334.493 |
nq/2586 | Meta-Llama-3-70B | 1334.493 |
nq/2775 | Meta-Llama-3-70B | 1334.493 |
nq/158 | Meta-Llama-3-70B | 1334.493 |
nq/3165 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/1218 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/1999 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/1340 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/570 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/1806 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/2058 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/2503 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/3425 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/2123 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/732 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/1208 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/2280 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/1027 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/1418 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/1291 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/1851 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/2844 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/2633 | Mixtral-8x22B-v0.1 | 1326.103 |
nq/3115 | Qwen1.5-110B | 1315.674 |
nq/3583 | Qwen1.5-110B | 1315.674 |
nq/1033 | Qwen1.5-110B | 1315.674 |
nq/1253 | Qwen1.5-110B | 1315.674 |
nq/3551 | Qwen1.5-110B | 1315.674 |
nq/3572 | llama_65B | 1250.270 |
nq/1459 | llama_65B | 1250.270 |
nq/1263 | llama_65B | 1250.270 |
nq/1562 | llama_65B | 1250.270 |
nq/3345 | llama_65B | 1250.270 |
nq/3491 | llama_65B | 1250.270 |
nq/1186 | llama_65B | 1250.270 |
nq/2168 | deepseek-llm-67b-base | 1244.695 |
nq/867 | deepseek-llm-67b-base | 1244.695 |
nq/535 | deepseek-llm-67b-base | 1244.695 |
nq/3134 | deepseek-llm-67b-base | 1244.695 |
nq/1759 | Mixtral-8x7B-v0.1 | 1234.964 |
nq/192 | Mixtral-8x7B-v0.1 | 1234.964 |
nq/625 | Mixtral-8x7B-v0.1 | 1234.964 |
nq/213 | Mixtral-8x7B-v0.1 | 1234.964 |
nq/1042 | Mixtral-8x7B-v0.1 | 1234.964 |
nq/2916 | Mixtral-8x7B-v0.1 | 1234.964 |
nq/837 | Qwen1.5-72B | 1209.968 |
nq/650 | Qwen1.5-72B | 1209.968 |
nq/3598 | Qwen1.5-72B | 1209.968 |
nq/2088 | Qwen1.5-72B | 1209.968 |
nq/1920 | Qwen1.5-72B | 1209.968 |
nq/1605 | Qwen1.5-72B | 1209.968 |
nq/3131 | Qwen1.5-72B | 1209.968 |
nq/414 | Qwen1.5-72B | 1209.968 |
nq/804 | llama_33B | 1197.090 |
nq/515 | falcon-40b | 1173.058 |
nq/75 | falcon-40b | 1173.058 |
nq/2111 | falcon-40b | 1173.058 |
nq/2533 | llama2_70B | 1155.579 |
nq/3114 | llama2_70B | 1155.579 |
nq/1053 | llama2_70B | 1155.579 |
nq/2090 | llama2_70B | 1155.579 |
nq/594 | llama2_70B | 1155.579 |
nq/2756 | llama2_70B | 1155.579 |
nq/1582 | llama2_70B | 1155.579 |
nq/2599 | Qwen1.5-32B | 1118.421 |
nq/1654 | Qwen1.5-32B | 1118.421 |
nq/1370 | Qwen1.5-32B | 1118.421 |
nq/2253 | Qwen1.5-32B | 1118.421 |
nq/1007 | Qwen1.5-32B | 1118.421 |
nq/2472 | Meta-Llama-3-8B | 1114.260 |
nq/832 | Meta-Llama-3-8B | 1114.260 |
nq/318 | Meta-Llama-3-8B | 1114.260 |
nq/1465 | Mistral-7B-v0.1 | 1097.943 |
nq/2114 | Mistral-7B-v0.1 | 1097.943 |
nq/1610 | Mistral-7B-v0.1 | 1097.943 |
nq/119 | deepseek-moe-16b-base | 1057.727 |
nq/1334 | deepseek-moe-16b-base | 1057.727 |
nq/16 | deepseek-moe-16b-base | 1057.727 |
nq/84 | deepseek-moe-16b-base | 1057.727 |
nq/1857 | deepseek-moe-16b-base | 1057.727 |
nq/509 | deepseek-moe-16b-base | 1057.727 |
nq/2540 | deepseek-moe-16b-base | 1057.727 |
nq/765 | deepseek-moe-16b-base | 1057.727 |
nq/3322 | llama2_13B | 1053.748 |
nq/3318 | llama2_13B | 1053.748 |
nq/2380 | llama2_13B | 1053.748 |
nq/3094 | llama2_13B | 1053.748 |
nq/2307 | llama2_13B | 1053.748 |
nq/2862 | llama2_13B | 1053.748 |
nq/3586 | llama2_13B | 1053.748 |
nq/2047 | llama2_13B | 1053.748 |
nq/1156 | llama2_13B | 1053.748 |
nq/1261 | llama2_13B | 1053.748 |
nq/1028 | llama2_13B | 1053.748 |
nq/3309 | mpt-30b | 1045.264 |
nq/2904 | mpt-30b | 1045.264 |
nq/2557 | mpt-30b | 1045.264 |
nq/196 | mpt-30b | 1045.264 |
nq/3336 | mpt-30b | 1045.264 |
nq/72 | gemma-7b | 1021.563 |
nq/1805 | gemma-7b | 1021.563 |
nq/3564 | gemma-7b | 1021.563 |
nq/2286 | gemma-7b | 1021.563 |
nq/652 | Qwen1.5-14B | 999.153 |
nq/2393 | Qwen1.5-14B | 999.153 |
nq/2947 | Qwen1.5-14B | 999.153 |
nq/410 | falcon-7b | 982.747 |
nq/1970 | llama_07B | 973.919 |
nq/2422 | llama_07B | 973.919 |
nq/3194 | llama_07B | 973.919 |
nq/3539 | deepseek-llm-7b-base | 973.632 |
nq/755 | deepseek-llm-7b-base | 973.632 |
nq/2461 | deepseek-llm-7b-base | 973.632 |
nq/2235 | deepseek-llm-7b-base | 973.632 |
nq/2389 | llama2_07B | 970.799 |
nq/3493 | llama2_07B | 970.799 |
nq/443 | Qwen1.5-7B | 918.189 |
nq/2643 | Qwen1.5-7B | 918.189 |
nq/100 | stablelm-3b-4e1t | 884.271 |
nq/945 | stablelm-3b-4e1t | 884.271 |
nq/248 | stablelm-base-alpha-7b-v2 | 875.637 |
nq/1905 | Qwen1.5-4B | 848.905 |
nq/406 | Qwen1.5-4B | 848.905 |
nq/1821 | gemma-2b | 827.521 |
nq/2443 | pythia-12b-deduped-v0 | 734.457 |
nq/1632 | pythia-12b-deduped-v0 | 734.457 |
nq/1406 | pythia-12b-deduped-v0 | 734.457 |
nq/2315 | pythia-12b-deduped-v0 | 734.457 |
nq/2417 | Qwen1.5-1.8B | 725.581 |
nq/2229 | Qwen1.5-1.8B | 725.581 |
nq/1437 | Qwen1.5-1.8B | 725.581 |
nq/3590 | pythia-6.9b-deduped-v0 | 704.104 |
nq/2979 | pythia-6.9b-deduped-v0 | 704.104 |
nq/1466 | pythia-6.9b-deduped-v0 | 704.104 |
nq/674 | pythia-2.8b-deduped | 610.768 |
nq/502 | pythia-2.8b-deduped | 610.768 |
nq/2590 | pythia-2.8b-deduped | 610.768 |
nq/3543 | Qwen1.5-0.5B | 588.441 |
nq/2456 | pythia-1b-deduped | 534.710 |
nq/2350 | pythia-1b-deduped | 534.710 |
nq/2901 | pythia-1b-deduped | 534.710 |
nq/452 | pythia-1b-deduped | 534.710 |
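
The min_elo column can be read as the rating of the weakest model that still solves a problem. A minimal sketch of that computation follows, assuming the same hypothetical `results` layout (model name to per-example pass/fail) and a hypothetical `elo` mapping from model name to rating; the model names and ratings in the toy data are made up.

```python
def min_elo_to_solve(results, elo):
    """Map each solved example to (model, rating) of the lowest-rated solver."""
    best = {}
    for model, per_example in results.items():
        for ex, ok in per_example.items():
            # Keep the solver with the smallest rating seen so far.
            if ok and (ex not in best or elo[model] < best[ex][1]):
                best[ex] = (model, elo[model])
    return best


# Toy data: names and ratings are illustrative assumptions.
results = {
    "strong": {"nq/a": True, "nq/b": True, "nq/c": False},
    "weak": {"nq/a": False, "nq/b": True, "nq/c": False},
}
elo = {"strong": 1300.0, "weak": 900.0}

table = min_elo_to_solve(results, elo)
# Sort by rating, highest first, as in the table above.
for ex, (model, rating) in sorted(table.items(), key=lambda kv: -kv[1][1]):
    print(f"{ex} | {model} | {rating:.3f} |")
```

Examples that no model solves (here `nq/c`) never enter the mapping, which is why they are listed separately rather than in the table.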
These are the 10 problems whose results correlate least with the overall evaluation. Their tau values are negative, i.e. better models tend to do worse on them.
example_link | acc | tau |
---|---|---|
nq/789 | 0.306 | -0.473 |
nq/2610 | 0.278 | -0.430 |
nq/1803 | 0.222 | -0.426 |
nq/2616 | 0.222 | -0.426 |
nq/3529 | 0.417 | -0.402 |
nq/860 | 0.111 | -0.401 |
nq/289 | 0.083 | -0.396 |
nq/553 | 0.278 | -0.385 |
nq/1766 | 0.083 | -0.340 |
nq/3315 | 0.083 | -0.324 |
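
To illustrate how a negative tau can arise, here is a minimal sketch that assumes tau is a Kendall-style rank correlation (tau-b, which handles ties) between each model's overall score and whether it solved the problem. The report does not state the exact statistic, so the choice of tau-b, the function name, and the toy numbers are all assumptions.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, allowing ties."""
    conc = disc = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes nothing
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif (dx > 0) == (dy > 0):
            conc += 1  # pair ordered the same way in x and y
        else:
            disc += 1  # pair ordered in opposite ways
    denom = sqrt((conc + disc + only_x_tied) * (conc + disc + only_y_tied))
    return (conc - disc) / denom if denom else 0.0


# Toy data: four hypothetical models ordered from weakest to strongest,
# where only the two weakest solve the problem (1 = solved, 0 = not).
overall_score = [0.1, 0.2, 0.3, 0.4]
solved = [1, 1, 0, 0]
tau = kendall_tau_b(overall_score, solved)
print(round(tau, 3))
```

Because only the weaker models solve the toy problem, every untied pair is discordant and tau comes out negative, the same pattern as the rows in the table above.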
Histogram of problems by the accuracy on each problem.
Histogram of problems by the minimum Elo to solve each problem.