From Columns to Rewards: Automating the Two Pillars That Drive Modern AI

When I worked at Google, I was lucky to collaborate with some of the brightest machine-learning (ML) engineers. Much of their work was feature engineering: picking the factors that guide an ML model. Their advances could generate tens to hundreds of millions of dollars in additional revenue.

Imagine an Excel spreadsheet with hundreds of columns of data. Add two columns together, multiply two others, divide by a third, subtract a fourth. Each of these derived values is a feature. ML models use features to predict the best ad to show.
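To make the spreadsheet analogy concrete, here is a minimal sketch of hand-crafted feature engineering. The column names (clicks, impressions, bid, spend) are hypothetical examples, not any real ads pipeline:

```python
# Hand-crafted feature engineering: deriving new columns from raw ones.
# Column names are illustrative, not from any actual system.
import pandas as pd

raw = pd.DataFrame({
    "clicks": [12, 40, 7],
    "impressions": [1000, 2500, 300],
    "bid": [0.50, 1.20, 0.30],
    "spend": [5.0, 30.0, 1.5],
})

features = pd.DataFrame({
    # Divide two columns: click-through rate.
    "ctr": raw["clicks"] / raw["impressions"],
    # Multiply two columns: cost if every impression were charged at the bid.
    "max_exposure": raw["impressions"] * raw["bid"],
    # Multiply, then subtract a fourth column: remaining budget headroom.
    "headroom": raw["impressions"] * raw["bid"] - raw["spend"],
})
```

Each new column is one feature; an engineer's skill was knowing which combinations would help the model.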

It started as a craft, reflecting the vibes of the era. Over time, we mechanized that art into AutoML, tooling that massively accelerates the discovery of the right features.
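The core of that mechanization is simple to sketch: enumerate candidate column combinations and keep the ones that best predict the target. This toy loop is illustrative only; production AutoML systems search far larger spaces with far smarter strategies:

```python
# Toy automated feature search: try pairwise column operations and score
# each candidate by absolute correlation with the target. Illustrative only.
import itertools
import random

random.seed(0)
n = 200
cols = {
    "a": [random.random() for _ in range(n)],
    "b": [random.random() for _ in range(n)],
    "c": [random.random() for _ in range(n)],
}
# Hypothetical target that secretly depends on a * b.
target = [cols["a"][i] * cols["b"][i] for i in range(n)]

def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    vy = sum((yi - my) ** 2 for yi in y) ** 0.5
    return cov / (vx * vy)

ops = {"add": lambda p, q: p + q, "mul": lambda p, q: p * q}
candidates = []
for (n1, x), (n2, y) in itertools.combinations(cols.items(), 2):
    for op_name, op in ops.items():
        feat = [op(x[i], y[i]) for i in range(n)]
        candidates.append((abs(corr(feat, target)), f"{n1} {op_name} {n2}"))

# The search rediscovers that a * b predicts the target best.
best_score, best_name = max(candidates)
```

The machine grinds through combinations a human would never have the patience to test.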

Today, reinforcement learning (RL) is in the same place as feature engineering 15 years ago.

What is RL? It’s a technique for teaching AI to accomplish goals.

Consider a brave Roomba. It presses into a dirty room.

Then it must make a cleaning plan and execute it. Creating the plan is step 1. To complete the plan, like any good worker, it will reward itself, not with a foosball break, but with some points.

Its reward function might be: +0.1 for each new square foot cleaned, -5 for bumping into a wall, and +100 for returning to its dock with a full dustbin. The tireless vacuum’s behavior is shaped by this simple arithmetic. (NB: I’m simplifying quite a bit here.)
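That arithmetic translates directly into code. This is a minimal sketch of the hand-written reward function described above; the argument names are my own framing:

```python
# Hand-written reward function mirroring the example:
# +0.1 per new square foot cleaned, -5 per wall bump,
# +100 for docking with a full dustbin. Argument names are illustrative.
def reward(new_sqft_cleaned: float, bumped_wall: bool, docked_full: bool) -> float:
    r = 0.1 * new_sqft_cleaned
    if bumped_wall:
        r -= 5.0
    if docked_full:
        r += 100.0
    return r

# One simulated step: the robot cleans 3 sq ft but clips a wall.
step_reward = reward(new_sqft_cleaned=3.0, bumped_wall=True, docked_full=False)
# An episode's return is just the sum of step rewards over time.
```

Every constant in that function is a human judgment call, and small changes to them can produce wildly different behavior. That is the craft.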

Today, AI can create the plan, but isn’t yet able to develop the reward functions. People do this, much as we developed features 15 years ago.

Will we see an AutoRL? Not for a while. The right techniques for RL are still up for debate; Andrej Karpathy highlighted the open questions in a recent podcast.

The current wave of AI improvement could hinge on RL success. Today, it’s very much a craft. Automating it, partially or fully, will transform the way we build agentic systems.

