🔎 CRUXEval Sample Explorer 🔎

CRUXEval is a benchmark, complementary to HumanEval and MBPP, that measures code reasoning, understanding, and execution capabilities!

def f(num):
    initial = [1]
    total = initial
    for _ in range(num):
        total = [1] + [x+y for x, y in zip(total, total[1:])]
        initial.append(total[-1])
    return sum(initial)
assert f(3) == 4

CRUXEval-I (input prediction): assert f(??) == 4

CRUXEval-O (output prediction): assert f(3) == ??
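
As a rough illustration of how the two tasks differ, the sketch below checks a candidate answer for each by executing the sample function. The helper names `check_input_prediction` and `check_output_prediction` are hypothetical and are not part of the CRUXEval code; the sketch assumes an input prediction counts as correct whenever running `f` on it reproduces the target output, and an output prediction counts as correct only when it equals the real return value.

```python
# The sample function shown above, copied unchanged.
def f(num):
    initial = [1]
    total = initial
    for _ in range(num):
        total = [1] + [x+y for x, y in zip(total, total[1:])]
        initial.append(total[-1])
    return sum(initial)


# Hypothetical helper: CRUXEval-I style check.
# Any input that makes the assertion hold is accepted.
def check_input_prediction(func, predicted_input, expected_output):
    try:
        return func(predicted_input) == expected_output
    except Exception:
        # A prediction that makes the function raise is treated as wrong.
        return False


# Hypothetical helper: CRUXEval-O style check.
# The predicted output must equal what the function actually returns.
def check_output_prediction(func, given_input, predicted_output):
    return func(given_input) == predicted_output


print(check_input_prediction(f, 3, 4))   # fills in f(??) == 4 with 3
print(check_output_prediction(f, 3, 4))  # fills in f(3) == ?? with 4
```

In this sample `total` is rebuilt as `[1]` on every iteration, so each pass appends a single `1` to `initial`; that is why three iterations give a sum of 4.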

This site is inspired by, and based on, the Minerva sample explorer and Llemma. Thanks!