🔎 CRUXEval Sample Explorer 🔎

CRUXEval is a benchmark that complements HumanEval and MBPP by measuring code reasoning, understanding, and execution capabilities.



CRUXEval-I

CRUXEval-O

This site is inspired by and based on the Minerva sample explorer and Llemma. Thanks!