The Hypothesis Trap

When Our Models Are Designed to Confirm, Not Discover


This article was born from a series of conversations, debates, and open-ended reflections that emerged on professional networks like LinkedIn. What started as a curiosity (why do so many chronic disease drugs fail in humans?) has snowballed into a deeper realization: the very tools we use to study disease may be leading us astray.

What surprised me most was not just that the models are flawed, but that they are designed to confirm the very hypotheses they’re supposed to test. This wasn’t something I fully appreciated in the beginning. But through thoughtful commentary and scientific dialogue with peers, I’ve come to see this problem with fresh eyes. If you're just entering the world of biomedical research, my goal is to help you build that critical lens early: not to be cynical, but to be rigorous and honest about the tools we use.

“Artificially driving amyloid pathology in transgenic mice hardwires the hypothesis into the model, making it difficult to determine whether observed phenomena are intrinsic to disease or artifacts of the system.”

Let’s pause here. What does this mean for someone just entering biomedical research or clinical science? Picture this: You believe beta-amyloid causes Alzheimer’s disease. So you build a mouse that produces an abnormal amount of beta-amyloid, and then, surprise! The mouse gets symptoms that look like Alzheimer’s. But did you discover a cause, or just force a symptom to appear by building it into the system? This is like creating a robot that falls over when pushed, then proving your theory that falls are caused by pushes, when you built the robot to fall that way in the first place. The problem is not that the model is wrong; it’s that the model is designed to affirm your belief. That is what we mean by “hardwiring the hypothesis into the model”: baking the answer into the experiment before it’s even run.

In biomedical science, there is a growing crisis, one not of technology or resources, but of epistemology. As researchers chase the elusive causes and cures of chronic diseases, a critical methodological flaw has become increasingly evident: we are hardwiring our hypotheses into the very models designed to test them.

“Epistemology” is a big word, but it simply means: how do we know what we know? It’s not about having better tools; we already have those. The problem is how we use them. In trying to cure chronic diseases, we often build models that don't ask open-ended questions but instead confirm what we already assume. For example, if we think inflammation causes diabetes, we might build a mouse that has inflammation and then test drugs that reduce it. But if inflammation isn’t the real driver in humans, we’ve just wasted time and resources chasing our own assumptions.

This is not a small problem. It is a foundational distortion, one that distills a single, often reductionist theory of disease into genetically or chemically engineered animal models, and then uses those models to validate the very theory that created them. The result is a powerful illusion of reproducibility, one that produces consistent data across laboratories but may have little to do with the biological complexity of human disease.

In early science education, we learn that reproducibility, getting the same result in different experiments, is a good thing. And it is, in many contexts. But imagine that all the labs are using the same flawed recipe. They all add the same artificial ingredients, and of course, they bake the same cake. It looks like strong science, but it's built on a false foundation. That’s what’s happening when we engineer animals to reflect only one possible cause of a disease: we may get neat, consistent results, but they might not mean anything for actual human patients.

This article is a call to re-examine the logic of disease modeling, particularly in mice. It is not simply a critique of animal research; it is a critique of circular reasoning masquerading as discovery.

You’ll hear often in school or research labs that “mouse models are standard.” And they are. But what this article challenges is how those models are created and what they’re really telling us. It's not that using animals is inherently flawed, but that when we design them to prove us right, we’re no longer doing discovery. We’re just reinforcing the same loop of logic over and over and mistaking that loop for progress.

From Exploration to Confirmation: When Models Become Echo Chambers

At the core of good science lies the principle of falsifiability: a hypothesis must be testable, and it must be possible to prove it wrong. But when we engineer models that express a particular disease phenotype based on a presupposed mechanism, we create a research tool that confirms its own premises.

Let’s simplify this: real science means asking a question that could turn out to be wrong. But when you design a model that makes a certain outcome inevitable, there’s no way to fail, so there’s no real test. If you think stress causes ulcers and you create a rat that gets ulcers every time it’s stressed, you didn’t discover anything; you built a creature that follows your script. That’s not falsifiability, it’s programming.
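To see how little such a setup can tell us, here is a minimal toy sketch in Python (all names and numbers are invented for illustration, not drawn from any real study). The "engineered" model has the hypothesis written into its code, so it confirms the hypothesis every single time; the "natural" model at least leaves room to be wrong.

```python
# A toy sketch, not a real experiment: all numbers are invented for illustration.
import random

def engineered_rat(stressed: bool) -> bool:
    """A model 'organism' wired so that stress always produces ulcers."""
    return stressed  # the hypothesis is literally the implementation

def natural_rat(stressed: bool) -> bool:
    """A hypothetical 'real' animal in which ulcers depend on other factors too."""
    baseline = 0.10                      # some ulcers occur without any stress
    effect = 0.05 if stressed else 0.0   # stress adds only a modest extra risk
    return random.random() < baseline + effect

def ulcer_rate(model, n: int = 1000) -> float:
    """Fraction of stressed animals that develop ulcers."""
    return sum(model(stressed=True) for _ in range(n)) / n

random.seed(0)
print("engineered model:", ulcer_rate(engineered_rat))  # always 1.0 -> 'confirmed'
print("natural model:   ", ulcer_rate(natural_rat))     # ~0.15 -> a falsifiable signal
```

The point is not the numbers. It is that the first model has no possible way to produce a disconfirming result, which is exactly the trap described above.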

Take the Alzheimer’s amyloid hypothesis as a case in point. For decades, transgenic mouse models overexpressing mutant human amyloid precursor protein (APP) have formed the bedrock of Alzheimer's research. These animals reliably develop beta-amyloid plaques and are used to test interventions aimed at reducing them. When a drug clears plaques and improves mouse cognition, it’s declared a success.

This is probably the most famous example of this trap. Scientists created mice that are genetically engineered to develop Alzheimer-like plaques. If a drug reduces those plaques and the mice perform better on maze tests, it’s seen as evidence that the drug works. But this doesn’t tell us whether those plaques cause Alzheimer’s in humans. It only tells us that removing plaques helps the specific type of brain damage that we engineered into the mouse.

But what is being tested? The hypothesis or the fidelity of a model constructed to enact it?

This question is the intellectual heart of the argument. Are we studying Alzheimer’s disease or are we studying the system we built to reflect our idea of it? If the answer was built into the model, we’re not testing the hypothesis, we’re just watching it play out in a closed system.

In such models, the disease is not discovered, it is implanted. The phenotype is not emergent, it is programmed. And the outcome is not independent validation, it is mechanistic echo. This is not oversight. This is the core methodological flaw. We are studying the products of our hypotheses, not the diseases they are meant to explain.

When you see a result in a mouse model, ask yourself: did that result emerge naturally? Or was it something we inserted by design? It's the difference between seeing lightning in nature and creating sparks in a lab: they might look similar, but one tells you about the real world and the other tells you about your equipment.

The Comfort of Reproducibility, the Cost of Relevance

One of the major appeals of hardwired models is reproducibility. If several labs use the same transgenic mouse line, and each reports consistent plaque formation and similar behavioral changes, this creates the illusion of robust, externally validated science.

When you’re young in science, it’s easy to equate repeated results with reliable truth. But if everyone is working with the same flawed template, they’ll all get the same answer even if it’s wrong. It's like testing the same math equation with the same inputs over and over. You'll get the same result but what if the equation doesn't describe reality?

But in truth, this is internal consistency, not clinical relevance. It’s a closed loop of confirmation. You can reproduce results a hundred times in a model that never had biological fidelity to begin with.

What matters isn’t whether your model produces the same results; it’s whether those results actually translate to people. We’re building tight, elegant systems that can’t fail, but they also can't reflect the chaos and complexity of the real world.
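Here is a small, purely illustrative simulation of that closed loop (hypothetical functions and made-up effect sizes): several "labs" sharing the same engineered model reproduce one another exactly, while the same comparison in a messier, human-like population shows a far smaller effect.

```python
# Purely illustrative numbers: five 'labs' share one engineered model and
# reproduce each other perfectly, yet the effect barely exists in a population
# whose disease has other drivers.
import random

def engineered_mouse_outcome(drug: bool) -> float:
    """Plaque burden in a model built so the drugged pathway is the sole driver."""
    return 0.2 if drug else 1.0  # the drug 'works' by construction

def human_like_outcome(drug: bool) -> float:
    """Outcome in a population where the modeled pathway is only a minor factor."""
    other_drivers = random.gauss(1.0, 0.3)  # lifestyle, comorbidity, genetics, etc.
    modeled_pathway = 0.1                   # the engineered target contributes little
    return other_drivers + (0.0 if drug else modeled_pathway)

random.seed(1)
for lab in range(1, 6):
    effect = engineered_mouse_outcome(False) - engineered_mouse_outcome(True)
    print(f"lab {lab}: effect in engineered model = {effect:.2f}")  # identical every time

n = 5000
treated = sum(human_like_outcome(True) for _ in range(n)) / n
untreated = sum(human_like_outcome(False) for _ in range(n)) / n
print(f"effect in human-like population = {untreated - treated:.2f}")  # much smaller
```

Perfect agreement across the labs here reflects the shared construction of the model, not the biology of the population we actually care about.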

This is not just a philosophical quibble. The real-world consequences are dire. Consider: Dozens of anti-amyloid drugs developed using transgenic mouse models have failed in Phase III trials. Sepsis therapies targeting TNF-alpha succeeded in mice, based on models engineered to amplify that inflammatory pathway, but failed repeatedly in humans. Metabolic interventions that reverse obesity or diabetes in genetically altered rodents often collapse under the weight of human complexity, where lifestyle, environment, and comorbidities play essential roles.

This isn’t an abstract theory. Real patients are affected when drugs are built on misleading data. Years and millions of dollars go into testing Alzheimer’s drugs in humans, only to see them fail because the foundational logic was flawed. If a drug works in a mouse engineered for one problem, but that problem isn’t the real one in humans, the drug is set up to fail.

In each of these cases, the underlying issue is not that the mouse is “different from a human.” That’s trivial. The issue is that the hypothesis was wired into the animal model before it was ever observed in people.

You’ll often hear people say, “Well, mice aren’t humans.” That misses the point. The real problem is deeper: we’ve designed models to confirm our guesses, not to challenge them. So the models are already biased before the first experiment is even run.

A Thought Experiment: Reverse the Logic

Imagine a world where we start not with a hypothesis, but with real-world, longitudinal data from human patients. From these data, we observe unexpected correlations, patterns that suggest mechanisms we do not yet understand. Then, and only then, do we seek to test these mechanisms experimentally.

What if we flipped our process? What if we watched patients first, really studied them, tracked them, listened to them and let the data tell us what might be going on? Then we could design models to test those patterns. That’s what discovery looks like.

In contrast, most of today’s disease models do the reverse. They start with a mechanistic belief, e.g., "amyloid causes Alzheimer’s" or "TNF drives sepsis," and then design an animal system in which that belief is made to be true.

This is backwards science. It’s like assuming someone’s guilty and then building a courtroom that only accepts evidence that supports their guilt. We’re skipping the part where we test whether the hypothesis was even valid to begin with.

What follows is a cascade of funding, trials, and publications reinforcing the same logic. Failures in human trials are often met not with re-evaluation of the model, but with tweaks to dosing, endpoints, or patient selection.
But what if the model was wrong from the start, not in its mechanics, but in its epistemic integrity?

We keep adjusting the output when we should be questioning the input. If a drug fails in people but worked in mice, maybe we need to ask: did we ever have the right disease in our model? Did we hardwire the wrong thing?

Examples Beyond Alzheimer’s: A Broader Pattern of Hypothesis Hardwiring

1. Parkinson’s Disease and the Alpha-Synuclein Story

Parkinson’s models often rely on overexpression of mutant alpha-synuclein to replicate Lewy body pathology. These mice exhibit motor deficits and neuronal degeneration, but again, the pathology is pre-imposed, not spontaneously arising. The model is valuable for studying the consequences of synuclein aggregation, but it cannot tell us whether synuclein is the cause, consequence, or just a correlate in human disease.

In human Parkinson’s, we observe clumps of a protein called alpha-synuclein in the brain. But we don’t actually know if they’re the cause of the disease or just a side effect. So, researchers make mice that overproduce this protein and then test drugs to reduce it. But if the protein buildup isn’t actually the cause in humans, we’re treating a shadow, not the fire. We’re again hardwiring an assumption into a model and calling it evidence.

2. Autoimmune Diseases and the EAE Model in Multiple Sclerosis

Experimental autoimmune encephalomyelitis (EAE) is induced in mice using myelin peptides and immune adjuvants. It’s widely used as a model for MS. But EAE is an acute, monophasic inflammatory disease, while MS in humans is chronic, multifocal, and often driven by environmental and genetic interplay. Again, the model embodies a specific immunologic hypothesis, T-cell-mediated demyelination, and tests only interventions that align with it.

Multiple Sclerosis (MS) in humans is complex and unfolds over years with ups and downs. In mice, researchers inject chemicals to trigger a one-time inflammatory attack on nerves. That’s not MS, that’s a short, forced version of what we think MS might be. It’s useful for learning certain mechanisms, but it can’t reflect the full, lived experience of the disease. If you only test therapies that treat what you forced into the model, you’re not testing treatments, you’re testing your design.

3. Type 2 Diabetes: The Obese Mouse That’s Not Like Us

Ob/ob and db/db mice, lacking leptin signaling, develop obesity and insulin resistance. They’ve been instrumental in studying metabolic syndrome. But human obesity is rarely due to leptin deficiency; it’s a polygenic, environmentally modulated state. Using leptin-deficient mice to model lifestyle-driven diabetes in humans hardwires a hormonal hypothesis that may not apply to 95% of patients.

Imagine building a diabetes model using mice that are genetically incapable of producing or responding to a key hormone, leptin. Yes, these mice get fat and insulin resistant. But most human patients do have leptin. Their diabetes is driven by diet, stress, genetics, and environment. So if a drug “fixes” the mouse’s diabetes by restoring leptin signaling, what does that tell us about humans? Maybe nothing. This is a perfect example of a beautifully functioning model that doesn’t map to reality.

Why Does This Happen? The Structural Pressures of Modern Science

This methodological pattern is not born of ignorance. It’s a structural consequence of how modern science is organized.

This isn’t a conspiracy or laziness; it’s the way the system is set up. To get research funding, you often have to tell a simple story: “Here’s the cause of this disease, here’s a model that reflects it, and here’s how we’ll treat it.” The simpler and neater that story is, the more likely you are to get published or funded. But human disease isn’t neat. And when we build science to favor clean answers, we lose the messy, complicated truths that matter.

And so, the scientific ecosystem encourages models that are mechanistically tight but epistemologically circular. The more elegant the model, the more seductive the data and the harder it becomes to see that we are testing what we designed, not what nature reveals.

As a young scientist, you’ll be taught to admire beautiful models, systems where everything connects, where inputs and outputs behave just right. But beware: those elegant models can blind you. They may just be confirming what you assumed at the beginning. If you’ve ever seen a puzzle that looks too perfect, ask yourself: did nature really build this, or did we?

Where Do We Go From Here? Toward Open-Ended Models

If hardwiring the hypothesis into the model is the problem, the solution lies in resisting that temptation. We need models of disease that allow hypotheses to emerge, not be enforced. Some steps toward that include:

So what do we do instead? Build models that ask open-ended questions. That don’t assume the cause ahead of time. That leave room for failure, surprise, contradiction, all the things that real science thrives on. Below are four ways we might move in that direction.

1. Develop Phenomenological Models

Use human data (clinical, genomic, longitudinal) to build models that replicate observed disease trajectories, not presumed mechanisms. Let the model be messy if that’s what the disease is.

Instead of forcing mice to match our guesses, we should let patient data guide the way. What if we took thousands of human cases and asked: what actually happens over time? What patterns do we see? Then we build systems, biological or computational, that reflect that complexity. Messy data isn’t a problem. It’s reality. Let the models be messy, too.
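As one hypothetical illustration of that data-first mindset (synthetic trajectories, invented curve shapes, and scikit-learn's KMeans standing in for whatever clustering or modeling tool fits the real data), the sketch below groups longitudinal outcome curves before any mechanism is proposed.

```python
# Synthetic data and invented trajectory shapes; scikit-learn's KMeans is just
# one convenient way to let groupings emerge before any mechanism is assumed.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
years = np.arange(10)

# Three hypothetical kinds of patient course: steady decline, late decline, stable.
steady = 100 - 5 * years + rng.normal(0, 2, (40, 10))
late = 100 - 0.5 * years**2 + rng.normal(0, 2, (40, 10))
stable = 100 + rng.normal(0, 2, (40, 10))
trajectories = np.vstack([steady, late, stable])  # 120 patients x 10 yearly scores

# Cluster the observed trajectories first; ask about mechanisms afterwards.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(trajectories)
for k in range(3):
    curve = trajectories[labels == k].mean(axis=0)
    print(f"cluster {k}: n={np.sum(labels == k)}, start={curve[0]:.0f}, end={curve[-1]:.0f}")
```

Nothing in this sketch presumes a cause; the clusters simply describe how the disease unfolds, and the mechanistic questions come afterwards.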

2. Prioritize Reverse Translation

When human trials fail, reverse translate the findings. Ask: What did the model miss? What features of human biology were invisible to the preclinical system?

This is about learning from failure. When a drug doesn’t work in people, we shouldn’t blame the trial. We should look at the model and ask: what did we overlook? Maybe the mouse was never the right test case. Instead of pushing forward blindly, we go back, revisit our assumptions, and rethink the path.

3. Diversify Model Systems

Avoid putting all theoretical eggs in one model basket. Use multiple species, organoids, computational simulations, and patient-derived data to test competing explanations.

No single model can explain everything, just as no one patient defines a disease. We need a variety of tools (cell cultures, human-derived tissues, machine learning, observational studies), and we need to compare what they tell us. If all your models agree, be suspicious. They might all share the same bias.


4. Embrace Negative Data

Failures are not embarrassments; they are signals. If a drug works in mice and not in humans, the model may be too specific, not too simple.

In research, negative results are often ignored or buried. But they're full of insight. If something doesn’t work, especially when it was supposed to, ask why. That’s where the real learning begins. Maybe your model only works because it was custom-built to do so. That’s not a failure of the drug; it’s a failure of the model to reflect reality.

Conclusion: Models Should Be Maps, Not Mirrors

One of my mentors at Yale, the late Dr. Alvin Feinstein, championed clinical epidemiology because he saw the gap between biologic theory and human reality. He warned us against the seduction of internal validity at the expense of external truth. Today, the "hardwiring of hypotheses into models" is perhaps the most glaring example of that seduction.

Feinstein was ahead of his time. He knew that just because a study is precise and controlled doesn’t mean it reflects the real world. He fought for relevance, for methods and models that serve the patient, not the theory. That’s what we’re calling for here too.

In our desire for clean causality, we’ve built models that are too perfect, too precise, too aligned with our initial assumptions. And in doing so, we’ve created a kind of scientific solipsism: we see only what we have made visible.

It’s like drawing a map and then walking in circles inside it, convinced you’ve explored the world. We’ve created models that show us only what we believed to begin with. It feels like knowledge, but it’s just confirmation.

It is time to ask: are we modeling disease, or modeling our beliefs about disease?
Until we are willing to disentangle hypotheses from the systems designed to test them, we risk continuing to build elegant machines that answer the wrong questions.

That’s the final question. Are we really chasing the truth about human disease, or just reinforcing the stories we find most convenient? If you’re starting your career in medicine or research, this is the challenge: question the systems. Don’t just aim to be right, aim to find what’s real, even if it surprises you.

Aim to find what is real.

Not what fits.

Not what repeats.

What is real.




© Arina Cadariu 2025. All rights reserved. This article is part of The Science of Letting Go, a personal and educational project exploring the intersections of biology, genetics, epigenetics, clinical medicine, epidemiology, and the ethics of scientific communication. All views expressed are solely those of the author and do not represent the views of any employer, institution, or affiliated entity.

This work is protected under international copyright law. No part of this publication may be reproduced, excerpted, copied, or adapted—whether in full or in part—without prior written permission from the author. Unauthorized use, including commercial or institutional repurposing by clinics, wellness providers, or longevity brands, is expressly prohibited.

This work has been digitally archived and timestamped to confirm original authorship and publication date.

Simon Lowe

🔬 Science Writer & PKPD Specialist


I'm inclined to agree, especially where what we want from predictive modelling in MIDD is to inform on the efficacy and safety of clinical trials before they happen. What I want to know is: is the industry also open to probabilistic algorithms determining the frequency of dose from a given pathogen? E.g., say you already have a good PKPD model for predicting liver stage parasitemia (malaria), but want to expand this to cover the frequency of bites by factoring in the vector conditions, such as using insecticides and mosquito nets. That way, you could not only bring things closer to a field perspective, but also narrow down chemoprevention candidates within the context of how they'll actually be used in specific environments.

Janneke Hogervorst

Postdoctoral researcher - views are my own


A compelling narrative on why we need to move away from animal "models" of human disease now, if we are serious about our health and not so bent on allowing scientists to keep doing what they've always been doing under the premise of freedom of science.

Steven Rauchman, MD

Medical Expert Witness | DUI & Nystagmus Specialist | Traumatic Brain Injury (TBI) Consultant | Ophthalmologist & Surgeon | Principal Investigator


Arina, these are very insightful comments and the post is well written. Many aspects of brain research make these same inherent assumptions in research on mice. My concerns are slightly more general. In order to do many single neuron measurements in a mouse, the head of the mouse must be fixated. I understand that might be necessary for an accurate measurement. However, the very act of head fixation modifies the entire model. Researchers claim they habituate the mouse to head fixation to minimize the stress introduced into the mouse. I get it, I'm told that's the best we can do. Then the experimenter introduces a simple stimulant and makes generalizations on activity in localized regions of the brain. But a real mouse engages freely with its external environment and is confronted with multiple stimuli simultaneously. Even the theoretical isolation of anatomic regions presupposes a fundamental knowledge which may be incorrect. Structure and function in the brain are incredibly complex, and in the human brain ever more complex. Researchers also treat the brain as isolated from the body. In humans we now know the gut microbiome interacts with the brain constantly. The confounding variables introduced are a big problem.

