Oh, dear! Yet more magical thinking about reducing LLM hallucinations by using diffusion models. As Denis O. has critiqued, this won't happen: diffusion models are at least as prone to hallucination as transformer-based LLMs. Anyone who has used a GenAI image generator knows how difficult it is to get it to follow a prompt faithfully.

"It seems likely that diffusion LLMs could challenge auto-regressive tech, offering new capabilities like improved reasoning and controllability, but their full impact is still emerging." As always, the proponents of these new technologies keep talking about reasoning. It is just not going to happen, and we know that image generators are highly uncontrollable.

The idea of generating a whole block of text in parallel is wrong from a semantic and linguistic perspective. Language has sequential dependencies: an argument is developed sequentially, and a critical evaluation is developed sequentially, not in parallel.

"Andrej Karpathy and Andrew Ng, both renowned AI researchers, have enthusiastically welcomed the arrival of Inception Lab's diffusion LLM." Surely this is the kiss of death, not an endorsement.

"This process begins with random noise which is then gradually refined and 'denoised' into a coherent stream of tokens. This is analogous to how diffusion models generate images by starting with noise and iteratively removing it to reveal a clear image." It is not analogous; it is exactly the same mechanism. We need to get the facts right.

"Parallel Processing: Diffusion models can process and generate text in parallel, potentially leading to significant speed advantages." As with image generators, the final output of tokens will be fast, since it is just a transfer from the output buffer to the user, but it comes only after a long pause (not even "thinking") while all the denoising runs. We also know that diffusion models eat TFLOPs, as much as or more than transformers. https://lnkd.in/eq2HWhWZ
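For concreteness, here is a minimal Python sketch of the kind of masked-diffusion decoding loop the quoted article describes. Everything here is a hypothetical stand-in (the stub `toy_model`, the tiny vocabulary, the confidence-based unmasking schedule), not any vendor's actual code. The point to notice is that every denoising step is a full forward pass over the whole sequence, so the user sees nothing until all steps finish, and total compute scales with steps times sequence length.

```python
import numpy as np

# Toy sketch of masked-diffusion text generation (illustrative only).
# Generation starts from a fully masked sequence and unmasks a fraction
# of positions per denoising step, in parallel.

VOCAB = ["the", "cat", "sat", "on", "mat", "."]
MASK = -1
RNG = np.random.default_rng(0)

def toy_model(tokens):
    """Stub denoiser: a real model runs a bidirectional transformer here."""
    return RNG.standard_normal((len(tokens), len(VOCAB)))

def generate(seq_len=6, steps=3):
    tokens = np.full(seq_len, MASK)
    for step in range(steps):
        logits = toy_model(tokens)        # one full forward pass per step
        predictions = logits.argmax(axis=1)
        confidence = logits.max(axis=1)
        masked = np.where(tokens == MASK)[0]
        # Unmask the most confident masked positions; the quota spreads
        # the remaining positions evenly over the remaining steps.
        quota = int(np.ceil(len(masked) / (steps - step)))
        keep = masked[np.argsort(-confidence[masked])[:quota]]
        tokens[keep] = predictions[keep]
    return " ".join(VOCAB[t] for t in tokens)

print(generate())  # nothing is emitted until all denoising steps complete
```

Contrast this with autoregressive decoding, where the first token can stream out after a single forward pass. The diffusion loop front-loads all of its compute, which is exactly the "long pause" described above.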
Bear in mind that AI celebrities like Andrej Karpathy and Andrew Ng are also deep-pocketed investors in AI. They have to keep the drumbeat going in the "AI base". dLLMs: the same inaccuracies, but faster and cheaper. They will diffuse away... :)
Richard Self, saying "parallel text generation is semantically wrong" feels a bit too strong. Language definitely works best when it's built step by step: we process ideas in order, and that sequence matters. But that doesn't mean everything a model does has to happen one word at a time. In fact, during training, models often use parallel structures like masked language modelling or bidirectional attention to learn how words relate across a sentence, even when they're far apart (see the sketch below). Some of the newer diffusion-based models, like Inception or CoDi, actually try to mix in that step-by-step logic too. They don't just throw sequence out the window; they're trying to keep the benefits of autoregressive models while exploring new ways to generate text. So no, parallel generation isn't "wrong". It's just a different approach, and in some cases it might actually help with things like long-range coherence or editing larger chunks of text. It's not magic, but it's also not something to dismiss out of hand.
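To make the point about parallel training objectives concrete, here is a minimal sketch of the masked-language-modelling setup the comment mentions. The helper name `make_mlm_batch` and the token ids are invented for illustration; a real MLM would be a BERT-style bidirectional transformer, not this stub.

```python
import numpy as np

RNG = np.random.default_rng(42)
MASK_ID = 0   # id reserved for the [MASK] token (an assumption for this toy)

def make_mlm_batch(token_ids, mask_prob=0.15):
    """Corrupt a sequence for MLM: mask random positions, keep originals as labels.

    Every masked position is predicted in parallel from bidirectional
    context; there is no left-to-right ordering in this objective.
    """
    mask = RNG.random(token_ids.shape) < mask_prob
    if not mask.any():                        # ensure at least one masked position
        mask[RNG.integers(token_ids.size)] = True
    inputs = np.where(mask, MASK_ID, token_ids)
    labels = np.where(mask, token_ids, -100)  # -100 = ignored by the loss
    return inputs, labels

sentence = np.array([17, 42, 5, 99, 23, 8, 61])  # pretend token ids
inputs, labels = make_mlm_batch(sentence)
print(inputs)   # e.g. [17 42  0 99 23  8 61]
print(labels)   # e.g. [-100 -100 5 -100 -100 -100 -100]
```

Diffusion LLMs can be seen as iterating this kind of masked prediction at inference time, which is why sequential structure still gets into the model through training even though decoding happens in parallel.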
"We also know that diffusion models eat TFLOPs" - TSLOP that's what the article looks and feels like. It's written at least partly with a generative LLM (looks more like GPT than diffusion). Checkout the key points, key aspects and key potential advantages paragraphs (former two at the beginning of article, latter one at the end of the piece).
At this point such pronouncements seem like a social experiment. Keep repeating unsubstantiated claims until people believe them to be the truth.
You're absolutely right—language unfolds over time, and its meaning is shaped by how ideas are introduced, developed, and linked together. Each statement depends on what came before. You can't understand a conclusion without grasping its premises. Arguments typically follow a flow—claims, evidence, counterarguments, then conclusions. Readers or listeners refine their understanding step by step, adjusting their perspective with each new piece of information. Whether it's a persuasive essay or a philosophical debate, the arrangement of ideas steers how they're understood and judged. Parallel processing, like analyzing multiple perspectives at once, can supplement this, but at its core, language guides us linearly.
It can be assumed that human thinking involves constant "recall" steps that generate a set of relevant concepts and meanings in the mind. This is most likely a parallel process, and diffusion models could be quite suitable for implementing such a mechanism.
The way to read this: Richard Self, Denis O. The honest facts about LLMs and their limitations are well exposed. Another prefix for LLMs, dLLMs, to spin a new story. This fits into both categories: predictably irrational at scale, and lipstick on Yaatt.
Good points!
This is great. The more people are exploring new techniques, the better. I think we can now get past the "AGI/sentience is just around the corner" narrative and simply accept these models as useful tools for the vast majority of people who use them with care.