From Columns to Rewards: Automating the Two Pillars That Drive Modern AI
When I worked at Google, I was lucky to collaborate with some of the brightest machine-learning (ML) engineers, who worked on feature engineering. By picking the factors that guide an ML model, their advances could generate tens to hundreds of millions of dollars in additional revenue.
Imagine an Excel spreadsheet with hundreds of columns of data. Add two columns together, multiply two others, divide by another, or subtract a fourth. Each of these derived columns is a feature. ML models use features to predict the best ad to show.
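To make the idea concrete, here is a minimal sketch of that column arithmetic in Python. The column names (clicks, impressions, spend, conversions) are illustrative, not from any real ad system:

```python
# One row of a hypothetical ad log, like one row of the spreadsheet.
row = {"clicks": 40, "impressions": 1000, "spend": 25.0, "conversions": 8}

# Each derived value below is a "feature" built by combining raw columns.
features = {
    # Ratio of two columns: click-through rate
    "ctr": row["clicks"] / row["impressions"],
    # Dividing one column by another: cost per conversion
    "cost_per_conversion": row["spend"] / row["conversions"],
    # Subtracting one column from another: clicks that didn't convert
    "wasted_clicks": row["clicks"] - row["conversions"],
}

print(features)
```

A model would take these engineered features, rather than the raw columns alone, as its inputs.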
It started as a craft, reflecting the vibes of the era. Over time, we’ve mechanized this art into a machine called AutoML that massively accelerates the discovery of the right features.
Today, reinforcement learning (RL) is in the same place as feature engineering 15 years ago.
What is RL? It’s a technique for teaching AI to accomplish goals.
Consider a brave Roomba. It presses into a dirty room.
Then it must make a cleaning plan and execute it. Creating the plan is step 1. As it carries out the plan, like any good worker, it rewards itself, not with a foosball break, but with points.
Its reward function might be: +0.1 for each new square foot cleaned, -5 for bumping into a wall, and +100 for returning to its dock with a full dustbin. The tireless vacuum’s behavior is shaped by this simple arithmetic. (NB: I’m simplifying quite a bit here.)
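Written out as code, that arithmetic is just a small function. This is a toy sketch using the numbers above; real RL reward functions are far more involved:

```python
def reward(new_sqft_cleaned, wall_bumps, docked_with_full_bin):
    """Toy Roomba reward: same numbers as in the text."""
    r = 0.1 * new_sqft_cleaned    # +0.1 per new square foot cleaned
    r -= 5 * wall_bumps           # -5 for each wall bump
    if docked_with_full_bin:
        r += 100                  # +100 for docking with a full dustbin
    return r

# A run that cleaned 50 sq ft, bumped two walls, and docked successfully:
print(reward(50, 2, True))  # → 95.0
```

The learning algorithm then nudges the vacuum toward behaviors that score higher under this function.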
Today, AI can create the plan, but it isn’t yet able to develop the reward functions. People do this, much as we developed features by hand 15 years ago.
Will we see an AutoRL? Not for a while. The techniques for RL are still up for debate. Andrej Karpathy highlighted the debate in a recent podcast.
This current wave of AI improvement could hinge on RL success. Today, it’s very much a craft. The potential to automate it—to a degree or fully—will transform the way we build agentic systems.