AI · 14h ago

Microsoft VP of AI Nando de Freitas proposes replacing complex training pipelines with unified, continual interactive causal agents

The single stream replaces separate SFT and RLHF objectives

#28

Original post

Nando de Freitas @NandoDF · #28 in AI

The field of AI is at a local minimum. Not a local minimum in architectures and models, but a local minimum on how we train: a Frankenstein multi-stage approach. In this new blog entry, I propose a different route based on continual interaction and causality.

https://love4all.ai/blog/continual-interactive-causal-agents/

4:50 AM · Jun 7, 2026 · 13K Views

Sentiment

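Commenters respond positively to Nando de Freitas's proposal for single-stream interactive causal agent training: they describe the causal approach as interesting and directionally right, while several ask whether the infrastructure and compute exist to make it work in practice.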

Pos

100.0%

Neg

0.0%

6 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 2.8K · Bookmarks 3 · Likes 9 · Replies 2

Michael Black@Michael_J_Black

@NandoDF I like this. Directionally, it feels right.

Nando de Freitas@NandoDF

https://love4all.ai/blog/continual-interactive-causal-agents/

13h2.8K93

Pedro A. Ortega@AdaptiveAgents

This is the way

Nando de Freitas@NandoDF

https://love4all.ai/blog/continual-interactive-causal-agents/

6h27841

Julius Adebayo@juliusadml

thought provoking read.

Nando de Freitas@NandoDF

https://love4all.ai/blog/continual-interactive-causal-agents/

2h18001

Pim de Witte@PimDeWitte

@Michael_J_Black @NandoDF So basically world models 😜

13h411

Pim de Witte@PimDeWitte

@NandoDF @Michael_J_Black I view all of those things as just text conditioning and steerability on a WM architecture. What you’re describing here is precisely the original promise (and reason) WMs are being pursued so hard. In case you hadn’t read yet: https://www.notboring.co/p/world-models

13h8

Nando de Freitas@NandoDF

@Michael_J_Black Thanks, Michael. It’s still not very developed or properly tested, but I agree that directionally, it feels worth trying

Michael Black@Michael_J_Black

@NandoDF I like this. Directionally, it feels right.

13h1.8K30

Nando de Freitas@NandoDF

@PimDeWitte @Michael_J_Black Precisely not. This is not about model architectures, what people often stress when talking about world models. This works with Jepa or GPT. This is about causal interactive training. It’s all about environments, not agents.

13h16

Nando de Freitas@NandoDF

Thanks for asking. Pedro pointed out the issue much earlier when I was working on General AgenT One, Gato 🐈

https://arxiv.org/abs/2205.06175

and wrote about it

https://arxiv.org/abs/2110.10819

Then Pedro came up with this brilliant theoretical insight:

https://www.adaptiveagents.org/universal_ai_as_imitation

And we generalised it to LLMs recently:

https://love4all.ai/files/why-it-is-important-to-understand-causality-and-agency.pdf

https://love4all.ai/files/emergent-reward-maximization.pdf

7h551

Pim de Witte@PimDeWitte

@NandoDF @Michael_J_Black P(S’|A,S) defines how the world evolves, P(S’|do(A),S) is how you train general agents inside those WMs. They supplement each other in the loop you lay out. I see your distinction though - two sides of the same coin

12h221
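The distinction drawn in the reply above, between P(S’|A,S) and P(S’|do(A),S), can be illustrated with a toy calculation. The sketch below is not taken from the blog post or from any commenter: it assumes a single fixed current state S, a hypothetical hidden binary confounder `U` that influences both the logged action and the next state, and made-up probabilities. Under those assumptions, conditioning on an observed action and intervening on the action give different answers.

```python
# Toy example (all numbers are illustrative assumptions, not from the thread).
# U is a hypothetical hidden confounder; the current state S is held fixed.

P_U = {0: 0.5, 1: 0.5}  # prior over the hidden confounder U


def p_action_given_u(a: int, u: int) -> float:
    """Logged behaviour policy: the action copies U 90% of the time."""
    return 0.9 if a == u else 0.1


def p_next_given(a: int, u: int) -> float:
    """P(S'=1 | A=a, U=u): the action has a small effect, U a large one."""
    return 0.2 + 0.1 * a + 0.6 * u


def observational(a: int) -> float:
    """P(S'=1 | A=a): condition on seeing a, so U is weighted by P(U | A=a)."""
    joint = sum(P_U[u] * p_action_given_u(a, u) * p_next_given(a, u) for u in P_U)
    evidence = sum(P_U[u] * p_action_given_u(a, u) for u in P_U)
    return joint / evidence


def interventional(a: int) -> float:
    """P(S'=1 | do(A=a)): set a by fiat, so U keeps its prior P(U)."""
    return sum(P_U[u] * p_next_given(a, u) for u in P_U)


if __name__ == "__main__":
    print(f"P(S'=1 | A=1)     = {observational(1):.2f}")
    print(f"P(S'=1 | do(A=1)) = {interventional(1):.2f}")
```

In this toy setup the observational estimate overstates the effect of the action, because seeing A=1 is also evidence that U=1. Which of the two quantities a given training setup actually learns is exactly what the exchange in this thread is about; the sketch takes no position on that.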

Nando de Freitas@NandoDF

@PimDeWitte @Michael_J_Black Could you please be more precise? Could you show us precisely how world models are trained causally? Thanks

8h61

Nando de Freitas@NandoDF

@BlissyOnX I agree. But everything has a beginning 🙂

7h51

tsunami_crypto@ls_brd

@NandoDF multi-stage pipeline feels like duct taping training phases together

the interventional agent is cleaner but do we have the infra to pull it off

14h34

Blissy@BlissyOnX

@NandoDF the single stream feels inevitable tbh, but the gap between proposing it and making it work is where the real friction lives

14h25

Strata@ChainZenit

@NandoDF this take on causality is super interesting, how did you start?

14h24

Pim de Witte@PimDeWitte

Loop would hold for causal / non causally trained WMs no? You are definitely the expert on the former, I was mostly commenting on the fact that your loop is why WMs are so exciting, as it rapidly accelerates the amount of environments. We do generative WMs (not causal). For what it’s worth, I like your post and my comment was intended as light jest, not criticism!

4h23

Roei Herzig ✈️ CVPR@roeiherzig

@NandoDF What is the conceptual difference between stages 2 and 3, or 5 and 6?

12h21

Rugbist@rugbist_

@NandoDF single stream agent approach would def reduce all the prompt engineering hell we deal with now

question is what happens to fine tuning in that setup

14h19

Alex YGift@Radipdegen

@NandoDF hard to disagree that multi-stage feels like patching a leak with more patches

wonder if the compute budget holds up in practice though

14h16

dimenwarper@tsuname

@NandoDF I like the causal take, makes a lot of sense as a unifier. How would you also merge pre-training here? or is that simply too much of a completely different thing (non-interventional bootstrapping)

5h15

Invincible@InvincibleEdge

@NandoDF curious how u see continual interaction scaling in practice tho

most labs cant even keep one training run stable

14h12