Michelangelo Palette
Feature Engineering @ Uber
Amit Nene, Staff Engineer, Michelangelo ML Platform
Eric Chen, Engineering Manager, Michelangelo ML Platform
Michelangelo @ Uber
Enable engineers and data scientists across the company to easily build and deploy machine learning solutions at scale.
ML-as-a-service
○ Managing Data/Features
○ Tools for managing, end-to-end, heterogeneous training workflows
○ Batch, online & mobile serving
○ Feature and Model drift monitoring
Workflow steps: MANAGE DATA, TRAIN MODELS, EVALUATE MODELS, DEPLOY MODELS, MAKE PREDICTIONS, MONITOR PREDICTIONS
Feature Engineering @ Uber
○ Example: ETA for EATS order
○ Key ML features
○ How large is the order?
○ How busy is the restaurant?
○ How quick is the restaurant?
○ How busy is the traffic?
Managing Features
One of the hardest problems in ML
○ Finding good Features & labels
○ Data in production: reliability, scale, low latency
○ Data parity: training/serving skew
○ Real-time features: traditional tools don’t work
Palette Feature Store
Uber-specific curated and crowd-sourced feature database that is easy to use with machine
learning projects.
One stop shop
○ Search for features in single catalog/spec: rider, driver, restaurant, trip, eaters, etc.
○ Define new features + create production pipelines from spec
○ Share features across Uber: cut redundancy, use consistent data
○ Enable tooling: Data Drift Detection, Auto Feature Selection, etc.
Feature Store Organization
Organized as <entity>:<feature-group>:<feature-name>:<join-key>
E.g. @palette:restaurant:realtime_group:orders_last_30min:restaurant_uuid
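The naming scheme can be illustrated with a small parser. This is only a sketch: the `FeatureRef` type and `parse_feature_ref` function are hypothetical names, not part of Palette, and the sketch assumes every reference carries the `@palette` prefix shown in the example.

```python
from typing import NamedTuple


class FeatureRef(NamedTuple):
    entity: str
    feature_group: str
    feature_name: str
    join_key: str


def parse_feature_ref(ref: str) -> FeatureRef:
    """Split '@palette:<entity>:<feature-group>:<feature-name>:<join-key>'."""
    parts = ref.split(":")
    if len(parts) != 5 or parts[0] != "@palette":
        raise ValueError(f"not a Palette feature reference: {ref!r}")
    return FeatureRef(*parts[1:])


ref = parse_feature_ref(
    "@palette:restaurant:realtime_group:orders_last_30min:restaurant_uuid"
)
print(ref.entity, ref.feature_name, ref.join_key)
```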
Backed by a dual datastore system: similarities to lambda
○ Offline
○ Offline (Hive based) store for bulk access of features
○ Bulk retrieval of features across time
○ Online
○ KV store (Cassandra) for serving latest known value
○ Supports lookup/join of latest feature values in real time
○ Data synced between online & offline
○ Key to avoiding training/serving skew
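The dual-store idea can be sketched in memory. This is not how Palette is implemented: the class and method names are hypothetical, plain dicts stand in for Hive and Cassandra, and a dual write stands in for whatever sync mechanism is actually used. It only shows why a history store and a latest-value store that receive the same writes give training and serving the same data.

```python
class DualFeatureStore:
    """Toy stand-in: a history table (offline) plus a latest-value map (online)."""

    def __init__(self):
        self.offline = {}  # (feature, key) -> [(ts, value), ...]
        self.online = {}   # (feature, key) -> (ts, value)

    def write(self, feature, key, ts, value):
        # Every write lands in both stores, so they cannot drift apart.
        self.offline.setdefault((feature, key), []).append((ts, value))
        latest = self.online.get((feature, key))
        if latest is None or ts >= latest[0]:
            self.online[(feature, key)] = (ts, value)

    def lookup_latest(self, feature, key):
        """Serving path: latest known value, or None if never written."""
        entry = self.online.get((feature, key))
        return None if entry is None else entry[1]

    def history(self, feature, key):
        """Training path: all values over time, oldest first."""
        return sorted(self.offline.get((feature, key), []))


store = DualFeatureStore()
store.write("prepTime", "uuid1", ts=2, value=22)
store.write("prepTime", "uuid1", ts=1, value=20)  # late-arriving older value
print(store.lookup_latest("prepTime", "uuid1"))
print(store.history("prepTime", "uuid1"))
```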
EATS Features revisited
○ How large is the order? ← Input
○ How busy is the restaurant?
○ How quick is the restaurant?
○ How busy is the traffic?
Creating Batch Features
General trends, not sensitive to exact time of event
Ingested from Hive queries or Spark jobs
How quick is the restaurant?
○ Aggregate trends
○ Use Hive QL from warehouse
○ @palette:restaurant:batch_aggr:prepTime:rId
[Diagram labels: Palette feature spec; Hive QL; offline batch jobs; Feature Store; Offline Store (training); data dispersal; Online Store (serving); features join; model training job; model scoring service]
Apache Hive and Apache Spark are either registered trademarks or
trademarks of the Apache Software Foundation in the United
States and/or other countries. No endorsement by The Apache
Software Foundation is implied by the use of this mark.
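The kind of aggregate a batch feature captures can be shown in a few lines. The deck computes it with Hive QL over the warehouse; the sketch below uses plain Python instead, with made-up order rows and a hypothetical `prep_minutes` column, to show an average prep time per restaurant.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical warehouse rows; real data would come from a Hive table.
orders = [
    {"rId": "uuid1", "prep_minutes": 18},
    {"rId": "uuid1", "prep_minutes": 22},
    {"rId": "uuid2", "prep_minutes": 15},
]


def avg_prep_time(rows):
    """Group rows by restaurant and average the prep time of each group."""
    by_restaurant = defaultdict(list)
    for row in rows:
        by_restaurant[row["rId"]].append(row["prep_minutes"])
    return {rid: mean(values) for rid, values in by_restaurant.items()}


prep_time = avg_prep_time(orders)
print(prep_time)
```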
Creating Real-time Features
Features reflecting the latest state of the world
Ingest from streaming jobs
How busy is the restaurant?
○ Kafka topic with events
○ Perform real-time aggregations
○ @palette:restaurant:rt_aggr:nMeal:rId
[Diagram labels: Palette feature spec; Flink SQL; Flink-as-service streaming jobs; Feature Store; Online Store (serving); log + backfill; Offline Store (training); features join; model scoring service; model training job]
Apache Flink is either a registered trademark or trademark of the
Apache Software Foundation in the United States and/or other
countries. No endorsement by The Apache Software Foundation is
implied by the use of this mark.
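A real-time aggregation such as "orders in the last 30 minutes" amounts to a sliding-window count. The deck runs this as Flink SQL over a Kafka topic; the class below is a hypothetical single-process stand-in that assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events whose timestamp falls in (now - window, now]."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # timestamps, oldest first

    def add(self, ts):
        self.events.append(ts)

    def count(self, now):
        # Evict events that have aged out of the window.
        while self.events and self.events[0] <= now - self.window_seconds:
            self.events.popleft()
        return len(self.events)


counter = SlidingWindowCounter(window_seconds=30 * 60)
for ts in (0, 600, 1700):
    counter.add(ts)
n_meal = counter.count(now=1800)
print(n_meal)
```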
Bring Your Own Features
Features maintained by customers
Mechanisms for hooking serving/training endpoints
Users maintain data parity
How busy is the region?
○ RPC: external traffic feed
○ Log RPCs for training
○ @palette:region:traffic:nBusy:regionId
[Diagram labels: Palette feature spec; online proxy; service endpoint; RPC; offline proxy; batch API; custom store; features join; model scoring service; model training job]
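"Log RPCs for training" can be pictured as a proxy that records every value it serves. Everything in this sketch is hypothetical: `LoggingFeatureProxy`, the fake traffic feed, and the in-memory list standing in for a durable log. It shows one way a user-maintained feature could keep parity between what was served and what is later trained on.

```python
class LoggingFeatureProxy:
    """Wraps a customer-owned feature call and logs every response."""

    def __init__(self, fetch):
        self.fetch = fetch  # stand-in for the external RPC client
        self.log = []       # stand-in for a durable training log

    def get(self, region_id, ts):
        value = self.fetch(region_id)
        # Log exactly what was served so training sees the same value.
        self.log.append({"regionId": region_id, "timestamp": ts, "nBusy": value})
        return value


def fake_traffic_feed(region_id):
    # Hypothetical data; a real feed would be a network call.
    return {"r1": 7, "r2": 2}.get(region_id, 0)


proxy = LoggingFeatureProxy(fake_traffic_feed)
served = proxy.get("r1", ts=100)
print(served, proxy.log)
```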
Palette Feature Joins
Join @basis features with supplied @palette features into single feature vector
Join billion+ rows at points-in-time: dominates overhead
Join/Lookup 10s of tables at serving time at low latency

@basis features
order_id  nOrder  restaurant_uuid  latlong  Label ETA (training)  timestamp
1         4       uuid1            (10,20)  40m                   t1
2         3       uuid2            (30,40)  35m                   t2

@palette feature table (join_key = rId)
rId    prepTime  timestamp
uuid1  20m       t1
uuid2  15m       t2

Time + Key join: training/scoring feature vector
order_id  rId    latlong  Label ETA (training)  prepTime  ..
1         uuid1  (10,20)  40m                   20m       ..
2         uuid2  (30,40)  35m                   15m       ..

@palette features joined:
@palette:restaurant:agg_stats:prepTime:restaurant_uuid
@palette:restaurant:realtime_stats:nBusy:rId
@palette:region:stats:nBusy:regionId
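A time + key join can be sketched as follows. The function name and rows are hypothetical, and the sketch assumes "point-in-time" means each basis row receives the latest feature value recorded at or before its own timestamp, which the slide does not spell out. At Uber's scale this runs as a distributed join rather than an in-memory loop.

```python
from bisect import bisect_right


def point_in_time_join(basis_rows, feature_rows, key, feature):
    """Attach to each basis row the newest feature value not later than it."""
    history = {}  # key value -> (sorted timestamps, matching values)
    for row in sorted(feature_rows, key=lambda r: r["timestamp"]):
        ts_list, values = history.setdefault(row[key], ([], []))
        ts_list.append(row["timestamp"])
        values.append(row[feature])

    joined = []
    for row in basis_rows:
        ts_list, values = history.get(row[key], ([], []))
        idx = bisect_right(ts_list, row["timestamp"])
        joined.append({**row, feature: values[idx - 1] if idx else None})
    return joined


basis = [
    {"order_id": 1, "rId": "uuid1", "timestamp": 1},
    {"order_id": 2, "rId": "uuid2", "timestamp": 2},
]
features = [
    {"rId": "uuid1", "prepTime": 20, "timestamp": 1},
    {"rId": "uuid1", "prepTime": 99, "timestamp": 5},  # in the future of order 1
    {"rId": "uuid2", "prepTime": 15, "timestamp": 2},
]
vectors = point_in_time_join(basis, features, key="rId", feature="prepTime")
print(vectors)
```

Ignoring the value stamped after the order is what keeps future information from leaking into training rows.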
Done with Feature Engineering ?
○ Feature Store Features
○ nOrder: How large is the order? (basis)
○ nMeal: How busy is the restaurant? (near real-time)
○ prepTime: How quick is the restaurant? (batch feature)
○ nBusy: How busy is the traffic? (external feature)
○ Ready to use ?
○ Model specific feature transformations
○ Chaining of features
○ Feature Transformers
Feature Consumption
Feature Store Features
○ nOrder: input feature
○ nMeal: consume directly
○ prepTime: needs transformation before use
○ nBusy: input latlong but need regionId
Setting up consumption pipelines
○ nMeal: r_id -> nMeal
○ prepTime: r_id -> prepTime -> featureImpute
○ nBusy: r_id -> lat, log -> regionId(lat, log) -> nBusy
In arbitrary order
Michelangelo Transformers
Transformer: Given a record defined by a set of fields, add/modify/remove fields in the record
PipelineModel: A sequence of transformers
Spark ML: Transformer/PipelineModel on DataFrames
Michelangelo Transformers: extended transformer framework for both Apache Spark and
Apache Spark-less environments
Estimator: Analyze the data and produce a transformer
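The three roles can be sketched without Spark. The classes below are hypothetical and operate on plain dict records rather than DataFrames: an estimator learns a mean from training data and produces a transformer, and a pipeline model applies its transformers in sequence.

```python
from statistics import mean


class FillMissing:
    """Transformer: replace a missing field with a fixed value."""

    def __init__(self, field, fill_value):
        self.field = field
        self.fill_value = fill_value

    def transform(self, record):
        if record.get(self.field) is None:
            return {**record, self.field: self.fill_value}
        return record


class MeanImputer:
    """Estimator: learn the mean of a field, then emit a FillMissing transformer."""

    def __init__(self, field):
        self.field = field

    def fit(self, records):
        seen = [r[self.field] for r in records if r.get(self.field) is not None]
        return FillMissing(self.field, mean(seen))


class PipelineModel:
    """A sequence of transformers applied in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, record):
        for stage in self.stages:
            record = stage.transform(record)
        return record


training = [{"prepTime": 20}, {"prepTime": 10}, {"prepTime": None}]
model = PipelineModel([MeanImputer("prepTime").fit(training)])
print(model.transform({"prepTime": None}))
```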
Defining a Pipeline Model
Join Palette Features
Apply Feature Eng Rules
String Indexing
One-Hot Encoding
DL Inferencing
Result Retrieval
Feature consumption
○ Feature extraction: Palette feature retrieval expressed as a transform
○ Feature mutation: Scala-like DSL for simple transforms
○ Model-centric Feature Engineering: string indexer, one-hot encoder,
threshold decision
○ Result retrieval
Modeling
○ Model inferencing (also Michelangelo Transformer)
Michelangelo Transformers Example
class MyModel(override val uid: String)
    extends Model[MyModel] with MyModelParam with MLWritable with MATransformer {
  ...
  // Spark path: transform a whole Dataset
  override def transform(dataset: Dataset[_]): DataFrame = ...
  // Spark-less path: score a single record held as a Map
  override def scoreInstance(instance: util.Map[String, Object]): util.Map[String, Object] = ...
}

// Spark ML's Estimator is parameterized by the Model type that fit() returns
class MyEstimator(override val uid: String)
    extends Estimator[MyModel] with Params with DefaultParamsWritable {
  ...
  override def fit(dataset: Dataset[_]): MyModel = ...
}
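The pairing of `transform` and `scoreInstance` can be mirrored in a few lines of Python. The model below is hypothetical, and having the batch method delegate to the per-record method is an assumption about one way to keep the two paths consistent, not a description of how Michelangelo does it.

```python
class BusyThresholdModel:
    """Toy model exposing both a batch path and a single-record path."""

    def __init__(self, threshold):
        self.threshold = threshold

    def score_instance(self, instance):
        """Serving path: score one record held as a plain dict."""
        return {**instance, "is_busy": instance["nMeal"] > self.threshold}

    def transform(self, dataset):
        """Training path: score many records with the same per-record logic."""
        return [self.score_instance(row) for row in dataset]


model = BusyThresholdModel(threshold=10)
batch = model.transform([{"nMeal": 4}, {"nMeal": 12}])
single = model.score_instance({"nMeal": 12})
print(batch, single)
```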
Palette retrieval as a Transformer
Palette Feature Transformer
Feature Meta Store
RPC Feature Proxy
Cassandra Access
Hive Access
tx_p1 = PaletteTransformer([
"@palette:restaurant:realtime_feature:nMeal:r_id",
"@palette:restaurant:batch_feature:prepTime:r_id",
"@palette:restaurant:property:lat:r_id",
"@palette:restaurant:property:log:r_id"
])
tx_p2 = PaletteTransformer([
"@palette:region:service_feature:nBusy:region_id"
])
DSL Estimator / Transformer
DSL Estimator
Code Gen / Compiler
DSL Transformer
Online classloader Offline classloader
# Each lambda is a [output_field, DSL expression] pair.
es_dsl1 = DSLEstimator(lambdas=[
    ["region_id", "regionId(@palette:restaurant:property:lat:r_id, @palette:restaurant:property:log:r_id)"]
])
es_dsl2 = DSLEstimator(lambdas=[
    # Fill a missing prepTime with the average prepTime
    ["prepTime", 'nFill(nVal("@palette:restaurant:batch_feature:prepTime:r_id"), avg("@palette:restaurant:batch_feature:prepTime:r_id"))'],
    ["nMeal", 'nVal("@palette:restaurant:realtime_feature:nMeal:r_id")'],
    ["nOrder", 'nVal("@basis:nOrder")'],
    ["nBusy", 'nVal("@palette:region:service_feature:nBusy:region_id")']
])
Uber Eats Example Cont.
Computation order
○ nMeal: rId -> nMeal
○ prepTime: rId -> prepTime -> featureImpute
○ nBusy: rId -> lat, log -> regionId(lat, log) -> nBusy
Palette Transformer
id -> nMeal
id -> prepTime
id -> lat, log
DSL Transformer
lat, log -> regionID
Palette Transformer
regionID -> nBusy
DSL Transformer
impute(nMeal)
impute(prepTime)
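The computation order above can be traced on a single record. This is a sketch under loud assumptions: `ONLINE` is a made-up dict standing in for the online store, `region_id` is a hypothetical grid-cell function, and the imputation defaults are arbitrary. Only the ordering of the stages (Palette lookup, DSL, Palette lookup, DSL) comes from the slide.

```python
# Hypothetical online-store contents keyed by (feature, join-key value).
ONLINE = {
    ("nMeal", "uuid1"): 12,
    ("prepTime", "uuid1"): None,  # missing, so it must be imputed later
    ("lat", "uuid1"): 10,
    ("log", "uuid1"): 20,
    ("nBusy", "region_1_2"): 5,
}


def palette_lookup(features, key_field):
    """Stage that fetches features for the record's join key."""
    def stage(record):
        out = dict(record)
        for name in features:
            out[name] = ONLINE.get((name, record[key_field]))
        return out
    return stage


def derive(field, fn):
    """Stage that sets one field from a function of the record."""
    return lambda record: {**record, field: fn(record)}


def region_id(record):
    # Hypothetical mapping: bucket coordinates into a 10-unit grid cell.
    return f"region_{record['lat'] // 10}_{record['log'] // 10}"


def impute(field, default):
    return derive(field, lambda r: default if r[field] is None else r[field])


stages = [
    palette_lookup(["nMeal", "prepTime", "lat", "log"], "rId"),  # Palette Transformer
    derive("regionId", region_id),                               # DSL Transformer
    palette_lookup(["nBusy"], "regionId"),                       # Palette Transformer
    impute("nMeal", 0),                                          # DSL Transformer
    impute("prepTime", 15),
]

record = {"rId": "uuid1", "nOrder": 4}
for stage in stages:
    record = stage(record)
print(record)
```

The second lookup cannot run before the DSL stage, because its join key (`regionId`) does not exist until that stage has produced it.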
Dev Tools: Authoring and Debugging a Pipeline
Palette feature generation
● Apache Hive QL, Apache Flink SQL
Interactive authoring
● PySpark + iPython Jupyter notebook
Centralized model store
● Serialization / Deserialization (Spark ML,
MLReadable/Writeable)
● Online and offline accessibility
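from pyspark.ml import Pipeline

# tx_p1, es_dsl1, tx_p2, es_dsl2 are the stages defined on the earlier slides;
# vec_asm and l_r are further stages whose definitions are not shown in the deck.
basis_feature_sql = "..."
df = spark.sql(basis_feature_sql)
pipeline = Pipeline(stages=[tx_p1, es_dsl1, tx_p2, es_dsl2, vec_asm, l_r])
pipeline_model = pipeline.fit(df)           # fit estimators, producing a PipelineModel
scored_df = pipeline_model.transform(df)    # score the same DataFrame
model_id = MA_store.save_model(pipeline_model)
draft_id = MA_store.save_pipeline(basis_feature_sql, pipeline)
retrain_job = MA_API.train(draft_id, new_basis_feature_sql)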
Takeaways
Feature Store: Batch, Realtime and External Features with online and offline parity
Offline scalability: Joins across billions of rows
Online serving latency: Parallel IO, fast storage with caching
Feature Transformers: Set up chains of transformations at training/serving time
Pipeline reliability and monitoring out-of-the-box
Thank you
https://eng.uber.com/michelangelo/
https://www.uber.com/careers/

2019 Slides - Michelangelo Palette: A Feature Engineering Platform at Uber
