Are Large Language Models Financially Literate? An Experiment with the "Big Five" Questions
In an era where artificial intelligence increasingly influences our daily decisions, I conducted a short experiment to test the financial literacy of three leading Large Language Models (LLMs): Claude, DeepSeek, and ChatGPT.
The Test
I tested each LLM using Lusardi's "Big Five" questions, a standardized set that has been used globally to assess financial literacy. The five questions cover compound interest, the effect of inflation on purchasing power, risk diversification, the relationship between interest rates and bond prices, and mortgage payments.
Each LLM was presented with the questions individually, and its responses were recorded.
The Results
Surprisingly, or perhaps unsurprisingly, all three LLMs answered all five questions correctly. Even when challenged with a follow-up question asking them to confirm their certainty, they stood firmly by their correct answers. As an additional challenge, I amended the wording of the test questions slightly to invert their logic. For example, I changed question two so that the interest rate was larger than the inflation rate. The answers were still correct, suggesting that the models' accuracy goes beyond simply "remembering" the training data verbatim.
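The inverted variant of question two can be checked with simple arithmetic. The sketch below is my own illustration, not part of the original experiment; it verifies the expected answers to the compound-interest question and to both the original and inverted inflation questions:

```python
# Arithmetic check of two of the "Big Five" questions, including the
# inverted variant of question two described above.

def compound_balance(principal: float, rate: float, years: int) -> float:
    """Balance after leaving money to grow at a fixed annual interest rate."""
    return principal * (1 + rate) ** years

def purchasing_power_change(interest: float, inflation: float) -> str:
    """After one year, can you buy more, the same, or less than today?"""
    real_growth = (1 + interest) / (1 + inflation) - 1
    if real_growth > 0:
        return "more"
    if real_growth < 0:
        return "less"
    return "the same"

# Question 1: $100 at 2% for 5 years grows to more than $102 (compounding).
assert compound_balance(100, 0.02, 5) > 102

# Question 2 (standard): 1% interest, 2% inflation -> you can buy less.
assert purchasing_power_change(0.01, 0.02) == "less"

# Question 2 (inverted, as in the experiment): interest above inflation
# -> you can buy more, so the correct answer flips.
assert purchasing_power_change(0.02, 0.01) == "more"
```

A model that merely memorized the canonical wording would keep answering "less" on the inverted variant; getting "more" is weak evidence of something beyond rote recall.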
Why the Experiment Matters
Lusardi and Mitchell's paper "The Importance of Financial Literacy: Opening a New Field" (2023) documents concerningly low levels of financial literacy among humans.
As more people turn to LLMs for anything and everything, including financial guidance - whether through direct questions or as part of broader discussions - it's important to understand how well LLMs handle basic financial concepts. The experiment suggests that fundamental financial principles are correctly encoded across multiple leading LLMs, which provides some reassurance given the growing appetite for using these AI systems for information and advice.
However, this reassurance should be tempered with caution.
While it's encouraging that these models can correctly answer standardized financial literacy questions, we must remember that LLMs provide probabilistic responses based on their training data, not deterministic calculations or certified financial advice. The accuracy on these basic questions, while promising, doesn't guarantee reliable answers to more complex, context-dependent financial queries.
Looking Forward: Three Concrete Research Directions
This morning's experiment, while limited in scope, points to several promising avenues for more rigorous research.
A Note on Methodology
While these results are intriguing, it's important to acknowledge the limitations of this experiment. As someone who isn't an AI or LLM expert, my testing approach may not follow standard practices for evaluating AI systems. The questions, while standardized for human financial literacy testing, might not be the optimal way to assess an LLM's true understanding of financial concepts. Future research by AI experts could employ more rigorous methodologies to validate these preliminary findings and explore how LLMs actually process and "understand" financial information.
Professor of Practice in Financial Literacy and Wellbeing
Thanks for tagging me on this, Daniel LIEBAU. I'm still in training mode, so I won't provide substantive feedback, but I am very keen to know more and keep learning!
Product Owner at Revolut
Interesting experiment! A couple of thoughts come to mind:
- Quite interested in how you structured the follow-up questions, challenged the results, and measured success there.
- It could be interesting to run a similar experiment with more open-ended questions. If we want to use LLMs to solve financial illiteracy, we probably need models that can interact more freely with the audience to make it fun and engaging while consistently giving the right answers.
- Although all three models performed quite well here, we are not doing them justice if we haven't fine-tuned them yet :)
Computer Scientist Bridging Disciplines to Drive Innovation | Blockchain & Web3 Leader
This opinion article appeared in Communications of the ACM last December. The author coined an interesting term, "prompt-hacking". It may be useful when considering how to document your follow-up experiments: https://cacm.acm.org/opinion/prompting-considered-harmful/#:~:text=First%2C%20prompt%2Dbased%20interfaces%20are,shaky%20foundation%20of%20prompt%20engineering.
120+ Books FREE w/ #Amazon #KindleUnlimited Link Below TEDx: Philosophy In Action: The Asheboro Trials Theme: Augmented Humans Supervising Ari and D.A.T.A. I at Gemach DAO #gemachdao #Ari
Daniel LIEBAU, very well done. Did you use Deep Research for ChatGPT? If not, that result is even more mind-blowing. 🤯 Gemach DAO
Associate Professor (Finance) | Top-50 QS ranked University graduate| Top 12% Global Economist
Daniel LIEBAU, amazing!