Generative AI alone may not be enough: Evaluating AI Support for Learning Mathematical Proof

Chen, Eason; Judicke, Sophia; Beigh, Kayla; Tang, Xinyi; Xiao, Zimo; Li, Chuangji; Li, Shizhuo; Luttmer, Reed; Singh, Shreya; Yampolsky, Maria; Parikh, Naman; Zhao, Yi; Chen, Meiyi; Huang, Scarlett; Mohanty, Anishka; Johnson, Gregory; Mackey, John; Lin, Jionghao; Koedinger, Ken

arXiv:2509.16778 (cs)

[Submitted on 20 Sep 2025]

Abstract: We evaluate the effectiveness of LLM-Tutor, a large language model (LLM)-powered tutoring system that combines an AI-based proof-review tutor for real-time feedback on proof writing and a chatbot for mathematics-related queries. Our experiment, involving 148 students, demonstrated that the use of LLM-Tutor significantly improved homework performance compared to a control group without access to the system. However, its impact on exam performance and time spent on tasks was not statistically significant. Mediation analysis revealed that students with lower self-efficacy tended to use the chatbot more frequently, which partially contributed to lower midterm scores. Furthermore, students with lower self-efficacy were more likely to engage frequently with the proof-review AI tutor, a usage pattern that contributed to higher final exam scores. Interviews with 19 students highlighted the accessibility of LLM-Tutor and its effectiveness in addressing learning needs, while also revealing limitations and concerns regarding potential over-reliance on the tool. Our results suggest that generative AI alone, such as a chatbot, may not suffice for comprehensive learning support, underscoring the need to iteratively improve generative AI educational tools like LLM-Tutor using learning sciences principles.

Subjects:	Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2509.16778 [cs.HC]
	(or arXiv:2509.16778v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2509.16778

Submission history

From: Eason Chen
[v1] Sat, 20 Sep 2025 18:57:46 UTC (4,362 KB)
