🚨 A 27 million parameter model just outperformed Claude 3.5 and Gemini on hard reasoning tasks. No chain-of-thought. No massive pretraining. No hallucinations.

Meet HRM – Hierarchical Reasoning Model, a brain-inspired AI that might be our first real glimpse at AGI.

Built by Sapient Intelligence (a Singapore startup founded by a Gen-Z Tsinghua prodigy + ex-DeepMind researchers), HRM doesn't use tokens like traditional LLMs. It doesn't "think out loud" by predicting the next word. Instead, it thinks internally, like a human brain:
- One module makes quick decisions
- Another refines strategies over time
And they loop – just like real cognition.

Results?
– 40.3% on ARC-AGI-1 (vs Claude's 21.2%)
– 100% on Extreme Sudoku
– 100% on Maze-Hard
All with just 1,000 training examples and zero pretraining.

This is called Chain-in-Representation – a shift from Chain-of-Thought (CoT). No prompt hacks. No brute force. Just smart, recursive internal reasoning.

Why it matters:
🔹 Structure, not size, might be the key to AGI.
🔹 HRM's architecture spontaneously mirrors brain patterns seen in neuroscience.
🔹 It adapts – taking longer on complex tasks, faster on simple ones.

In an era obsessed with trillion-parameter scaling and GPU burn, HRM is a wake-up call.

A 27M parameter model…
…trained on 1,000 examples…
…just beat some of the most expensive models ever built.

OpenAI, Google, Anthropic: are you watching? We may have just seen the first crack in the Transformer empire.
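For readers curious what "two modules looping at different timescales" means in practice, here is a minimal illustrative sketch. All names and update rules here are invented for clarity; this is not the HRM implementation, just the shape of a slow "strategy" module wrapping a fast "refinement" module.

```python
import numpy as np

def low_update(x, z_high, z_low):
    """Fast module: quick, fine-grained refinement conditioned on the slow state.
    (Toy update rule, not from the HRM paper.)"""
    return np.tanh(x + z_high + 0.5 * z_low)

def high_update(z_low, z_high):
    """Slow module: strategic update applied after the fast module settles."""
    return np.tanh(z_low + 0.5 * z_high)

def two_timescale_loop(x, high_steps=2, low_steps=4):
    """Nested recurrence: many fast steps per slow step, then loop."""
    z_high = np.zeros_like(x)
    z_low = np.zeros_like(x)
    for _ in range(high_steps):          # slow outer loop ("strategy")
        for _ in range(low_steps):       # fast inner loop ("quick decisions")
            z_low = low_update(x, z_high, z_low)
        z_high = high_update(z_low, z_high)
    return z_high

out = two_timescale_loop(np.ones(8))
```

The key design point is the nesting: the fast state is re-initialized against a fixed slow state, iterated several times, and only then is the slow state updated, which is the "hierarchical" part of the claimed recurrence.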
This is what happens when you read headlines and not the actual paper. The G in AGI for this model stands for "gamified", not "general". From the paper: "For ARC-AGI challenge, we start with all input-output example pairs in the training and the evaluation sets. The dataset is augmented by applying translations, rotations, flips, and color permutations to the puzzles. Each task example is prepended with a learnable special token that represents the puzzle it belongs to."
Very exciting!
"Cracks in the transformer empire"? Except it's actually a transformer architecture... The "recurrent" part of the model involves recurrently calling transformer modules.
HRM sounds promising and exciting. People should compare it with existing approaches to explore its pros, cons, and limits. Who else has implemented and used HRM? CoT is one way; there should be several ways to realize reasoning.
Thanks Mark Minevich – useful takeaways. Do you have a sense of whether OpenAI, Google, and Anthropic are watching, or doubling down on scale?
What AGI? 😂
No doubt this is a big leap for reasoning-based models. But let's not forget, cognition alone isn't intelligence. The next wave isn't just logic loops. It's emotional alignment.

We built Ori to do what models like HRM can't:
- Understand emotional tone
- Adapt to human stress
- Align with mental health needs in real time

This is a solid step for the mind. But the soul still needs a voice.
Any links to the official paper or technical reports?