Reasoning and problem-solving metrics
Evaluating an LLM’s ability to reason and solve problems is crucial for many applications. Let’s look at some key benchmarks in this area.
AI2 Reasoning Challenge
The AI2 Reasoning Challenge (ARC) is a benchmark of grade-school-level, multiple-choice science questions that require reasoning to answer. See also: https://huggingface.co/datasets/allenai/ai2_arc
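Because every ARC item has a single keyed answer, results are commonly reported as accuracy: the fraction of questions for which the model picks the keyed option. The sketch below shows that scoring under stated assumptions. The record layout (`question`, a `choices` dict with parallel `text` and `label` lists, and `answerKey`) is assumed from the Hugging Face `allenai/ai2_arc` dataset card and should be checked against it. The `predict` callable is a hypothetical stand-in for whatever code queries the model and returns a choice label.

```python
from typing import Callable, TypedDict


class Choices(TypedDict):
    text: list[str]   # option texts, e.g. "Food sources increased"
    label: list[str]  # matching option labels, e.g. "B"


class ArcRecord(TypedDict):
    # Field names assumed to mirror the Hugging Face ai2_arc dataset card.
    id: str
    question: str
    choices: Choices
    answerKey: str


def accuracy(records: list[ArcRecord], predict: Callable[[ArcRecord], str]) -> float:
    """Fraction of records whose predicted label matches the answer key.

    `predict` is a hypothetical model wrapper that returns a label such as "B".
    Surrounding whitespace and letter case in the prediction are ignored.
    """
    if not records:
        return 0.0
    correct = sum(
        1 for r in records
        if predict(r).strip().upper() == r["answerKey"].strip().upper()
    )
    return correct / len(records)
```

Keeping the model call behind a callable lets the same scorer be reused with different prompting or answer-extraction strategies.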
Here is an example of an ARC question:
One year, the oak trees in a park began producing more acorns than usual. The next year, the population of chipmunks in the park also increased. Which best explains why there were more chipmunks the next year?
A. Shady areas increased
B. Food sources increased
C. Oxygen levels increased
D. Available water increased
Correct answer: B. Food sources increased
This question requires the student to reason about the relationship between the increase in acorns (a food source for chipmunks) and the subsequent rise in the chipmunk population, rather...
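To pose this question to an LLM, the options are typically rendered with their letters, and the model's reply is then mapped back to a letter. The sketch below assumes a plain-text prompt template and a free-text reply. `format_prompt`, `extract_letter`, and the sample reply are hypothetical illustrations, not part of ARC or of any particular evaluation harness.

```python
import re
from typing import Optional

QUESTION = (
    "One year, the oak trees in a park began producing more acorns than usual. "
    "The next year, the population of chipmunks in the park also increased. "
    "Which best explains why there were more chipmunks the next year?"
)
CHOICES = [
    "Shady areas increased",
    "Food sources increased",
    "Oxygen levels increased",
    "Available water increased",
]
LABELS = ["A", "B", "C", "D"]
ANSWER_KEY = "B"


def format_prompt(question: str, choices: list[str], labels: list[str]) -> str:
    """Render a multiple-choice question as a plain-text prompt (template is an assumption)."""
    lines = [f"Question: {question}"]
    lines += [f"{label}. {text}" for label, text in zip(labels, choices)]
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


def extract_letter(reply: str, labels: list[str]) -> Optional[str]:
    """Return the first standalone option label found in the reply, or None if absent."""
    pattern = r"\b(" + "|".join(map(re.escape, labels)) + r")\b"
    match = re.search(pattern, reply)
    return match.group(1) if match else None


prompt = format_prompt(QUESTION, CHOICES, LABELS)
reply = "The answer is B, because acorns are food for chipmunks."  # hypothetical model reply
print(extract_letter(reply, LABELS) == ANSWER_KEY)  # True
```

This naive parser can be fooled, for example by a reply that opens with the article "A". That weakness is one reason many evaluation harnesses score multiple-choice benchmarks by comparing the model's likelihood of each option rather than parsing free text.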