Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenzhong XU | Current 2022

Invisible Interfaces
Zhenzhong Xu (@zhenzhongxu)
Current 22 - Oct, 2022
Considerations for Abstracting Complexities of a Real-time ML Platform

The discovery of something invisible
Ancient Greek name for amber: elektron
Thales of Miletus

The endeavor to make it useful
Ubiquitous
Easy and responsive
Just works!
the invisible interface

About Zhenzhong Xu
● Building real-time ML platform @ Claypot
● Real-time Data Infrastructure @ Netflix
● Cloud infra @ Microsoft

"There's been an explosion of ML use cases that … don't make sense if they aren't in real
time. More and more people are doing ML in production,
and most cases have to be streamed."
Ali Ghodsi, Databricks CEO
Fraud prevention, Personalization, Customer support, Dynamic pricing, Trending products, Risk assessment, Robotics, Ads, ETA, Network analysis, Sentiment analysis, Object detection, …

AIIA survey (2022) - https://ai-infrastructure.org/ai-infrastructure-ecosystem-report-of-2022/

Data Science
Realtime ML Platform
Data Infrastructure
Exploration & Research
Model Architecture & Tuning
Model Analysis & Selection
Ingestion & Transport
Security & Governance
Multi-tenancy
Isolation
Data Sources Storage Query & Compute
Business Decision Optimization
Workflow Orchestration
Analytics / Visualization

Model Serving
Model Training
Model Monitoring
Model Evaluation
Feature Materialization
Label Materialization
Data Monitoring
Data Model Flow
Data Flow
Product Ecosystem
Analytics Ecosystem

Combine your data and model loops: why you need both to be fast
Data Loop | Model Loop | Challenge/Value
Slow | Slow | Low freshness, low quality. Out-of-date models; predictions and training use stale data; model drift results in low model accuracy.
Slow | Fast | Low freshness, low quality. Model training is bottlenecked by the availability of fresh data. Prediction latency is high, or predictions are made with stale data.
Fast | Slow | High freshness, low quality. Fresh data is available for predictions, training, and observability. Slow model iteration results in an out-of-date model and lower accuracy.
Fast | Fast | High freshness, high quality. You want your ML ecosystem to be here.

Online Customer Service
Use Case Example
● Suggest diagnostic runbook
● Proactive in-the-moment remediation action
● Fraud prevention vs detection

Define model features
● average transaction amount over the past 14 days
● request channel encoding
● text embedding similarity score
Data Scientists
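The first feature above, an average transaction amount over a trailing 14 days, can be sketched in a few lines. This is a minimal illustration only: the function name `avg_amount_last_n_days`, the `(timestamp, amount)` tuple layout, and the sample data are assumptions made for the example, not part of the talk.

```python
from datetime import datetime, timedelta

def avg_amount_last_n_days(transactions, as_of, days=14):
    """Mean amount of transactions in the window (as_of - days, as_of].

    `transactions` is an iterable of (timestamp, amount) pairs (hypothetical layout).
    Returns None when the window holds no transactions.
    """
    start = as_of - timedelta(days=days)
    amounts = [amt for ts, amt in transactions if start < ts <= as_of]
    return sum(amounts) / len(amounts) if amounts else None

# Sample data, made up for illustration.
now = datetime(2022, 10, 4, 12, 0)
txns = [
    (now - timedelta(days=20), 500.0),  # outside the 14-day window
    (now - timedelta(days=10), 30.0),
    (now - timedelta(days=1), 50.0),
]
feature = avg_amount_last_n_days(txns, as_of=now)
```

Even this small feature raises the question the next slides ask: the same definition has to be evaluated both over history (training) and at request time (serving).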

What’s the appropriate level of complexity the ML platform should expose?
ML Platforms: What’s preventing “Ubiquitous”?

Offline batch prediction
● Use cases: churn prediction, user LTV, risk planning, etc.
Diagram labels: Batch job to generate predictions (e.g. Airflow + Spark); Predictions; DWH (Snowflake / BigQuery / S3); BI

Online prediction with batch features
● Batch features: computed offline, e.g. product embeddings
● Use cases: recsys
Diagram labels: App; Prediction requests; Prediction service; Batch job to generate features; Batch features; Write to offline; Write to online; DWH; KV store (for low-latency online access); Joined batch features
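A rough sketch of the serving path on this slide: a batch job writes precomputed features to a key-value store, and the prediction service looks them up per request. The in-memory dict standing in for the KV store, the function names, and the dot-product "model" are all assumptions for illustration.

```python
# Stand-in for a low-latency KV store (hypothetical; a real deployment
# would use an external store rather than a process-local dict).
kv_store = {}

def write_online(batch_features):
    """Batch job step: publish precomputed features keyed by product id."""
    kv_store.update(batch_features)

def predict(request, default_embedding=(0.0, 0.0)):
    """Prediction service step: join request data with looked-up batch features.

    The score is a toy dot product between the request's user vector and the
    stored product embedding; unknown products fall back to a default.
    """
    embedding = kv_store.get(request["product_id"], default_embedding)
    return sum(u * p for u, p in zip(request["user_vector"], embedding))

write_online({"p1": (0.5, 2.0)})
score = predict({"product_id": "p1", "user_vector": (2.0, 1.0)})
```

The lookup is fast, but the feature is only as fresh as the last batch run.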

Online prediction with on-demand features
● On-demand features: queried from transactional stores, e.g. # orders in the last 30 mins
● Use cases: recsys
Diagram labels: App; Prediction requests; Prediction service; Batch job to generate features; Batch features; Write to offline; Write to online; DWH; KV store (for low-latency online access); TX store (e.g. Postgres, Cassandra); Transactions; Joined features
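The "# orders in the last 30 mins" example can be sketched as a query issued at request time. Here SQLite from the Python standard library stands in for the transactional store; the `orders` table, its columns, and the epoch-seconds timestamps are assumptions made for the example.

```python
import sqlite3

def orders_last_30_min(conn, user_id, now_epoch):
    """Count a user's orders in the window (now - 30 min, now]."""
    row = conn.execute(
        "SELECT COUNT(*) FROM orders "
        "WHERE user_id = ? AND created_at > ? AND created_at <= ?",
        (user_id, now_epoch - 30 * 60, now_epoch),
    ).fetchone()
    return row[0]

# Hypothetical schema and sample rows; timestamps are epoch seconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, created_at INTEGER)")
now = 1_664_880_000
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("u1", now - 60), ("u1", now - 1200), ("u1", now - 3600), ("u2", now - 10)],
)
on_demand_feature = orders_last_30_min(conn, "u1", now)
```

The value is fresh, but every prediction request now adds a query to the production database.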

Online prediction with streaming features
● Online features: computed online,
○ e.g. distance between two locations, count/percentile in the last 30 mins
Diagram labels: App; Prediction requests; Prediction service; Feature service; Batch job to generate features; Batch features; Write to offline; Write to online; DWH; KV store; Logs; Real-time transport; Stream feature extraction
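A count over the last 30 minutes, maintained incrementally as events arrive, might look like the following. This is a single-process sketch that assumes events arrive in timestamp order; the class name is hypothetical, and a real stream processor would also handle late and out-of-order events.

```python
from collections import deque

class SlidingWindowCount:
    """Count of events in the window (now - window_seconds, now].

    Assumes timestamps are added in non-decreasing order.
    """

    def __init__(self, window_seconds=30 * 60):
        self.window = window_seconds
        self.events = deque()

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()

    def add(self, ts):
        self.events.append(ts)
        self._evict(ts)

    def count(self, now):
        self._evict(now)
        return len(self.events)

counter = SlidingWindowCount()
for ts in (0, 600, 1700):  # event times in seconds, made up for illustration
    counter.add(ts)
```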

Combining offline and online data
Diagram labels: Time; DWH; Stream; transaction behavior over the last 6 months; T-7 days; T-1 day to T-6 months
Backfilling challenge
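One way to read this timeline, assuming the stream retains roughly the last 7 days and the DWH holds data from about T-1 day back to T-6 months, is that the two sources overlap and a long-window feature has to read each day from exactly one of them. The sketch below uses a hypothetical cutoff to avoid double counting; the function name, the integer day numbers, and the 180-day window are assumptions for illustration.

```python
def six_month_total(dwh_rows, stream_rows, now_day, cutoff_days=1, window_days=180):
    """Sum amounts over the window, reading each day from exactly one source.

    Rows are (day, amount) pairs with integer day numbers (hypothetical layout).
    Days before the cutoff come from the DWH; the cutoff day onward comes
    from the stream, so overlapping days are never counted twice.
    """
    window_start = now_day - window_days
    cutoff = now_day - cutoff_days
    offline = sum(a for d, a in dwh_rows if window_start <= d < cutoff)
    online = sum(a for d, a in stream_rows if cutoff <= d <= now_day)
    return offline + online

# Sample data, made up for illustration.
dwh = [(800, 5.0), (900, 10.0), (998, 3.0)]      # day 800 is older than the window
stream = [(995, 99.0), (999, 2.0), (1000, 1.0)]  # day 995 is before the cutoff
total = six_month_total(dwh, stream, now_day=1000)
```

Choosing and maintaining that cutoff for every feature is part of what makes backfilling hard.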

Backfill in Lambda Architecture
Data Source
In-motion Compute
At-rest Compute
Online Storage
Offline Storage
Online Query (serving)
Mixed Query (backfill)
Offline Query (training)

Backfill in Kappa Architecture
Data Source
In-motion Compute (backfill from historical log)
Materialized Views
Online Query (serving)
Offline Query (training)
batch transformation
streaming transformation
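The core idea of a Kappa-style backfill can be sketched as one transformation applied to both the replayed historical log and live events, so that there is a single code path. The function and the `(key, payload)` event layout below are hypothetical, and the in-memory dict stands in for a materialized view.

```python
def running_count_by_key(events, state=None):
    """Fold (key, payload) events into a per-key count.

    The same function serves backfill (replaying the log from the start)
    and live processing (continuing from existing state).
    """
    state = {} if state is None else state
    for key, _payload in events:
        state[key] = state.get(key, 0) + 1
    return state

historical_log = [("u1", "a"), ("u2", "b"), ("u1", "c")]  # replayed from the log
live_events = [("u1", "d")]                               # arriving after the backfill

view = running_count_by_key(historical_log)      # backfill
view = running_count_by_key(live_events, view)   # continue with live traffic
```

The cost is that the log must retain enough history to replay, which is what the DWH-backed variants on the following slides address.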

Unified Backfill
Data Source
In-motion Compute (intelligent backfill from dual sources)
Materialized Views
Online Query (serving)
Offline Query (training)
DWH backed logs
Orchestration & Governance

Abstracted Unified Backfill
Data Source
In-motion Compute (intelligent backfill from dual sources)
Materialized Views
Online Query (serving)
Offline Query (training)
DWH backed logs
Orchestration & Governance

Build model features
● Should I declare features in SQL or Python?
● How do I join existing intent classification results to my new feature?
● What confidence can I get before checking in my code?
Data Scientists

Does the ML platform speak the same language as the users?
ML Platforms: What’s preventing “Easy and responsive”?

Questions for ML Platforms:
● Can users express or declare what they need to control in a single coherent interface?
● Can the platform understand the intent and drive the underlying system?
● Can user and platform communicate interactively, in a timely fashion?
● Can the user understand their options and tradeoffs without reading a 300-page manual?
● How much integration effort is needed to plug a model into existing data streams?

Online Prediction: Latency vs. Staleness
Diagram labels: Request; Raw data; Feature computation; Feature retrieval; Prediction computation; Prediction retrieval; Prediction; Latency; Staleness
          | RT Feature | NRT Feature | Batch Feature
Staleness | No staleness* | > secs | > hours
Latency | Low (10s of ms to seconds) | Lower (10s-100s of ms) | Lower (10s-100s of ms)
* Computation takes time, and latency includes the computation time. Feature performance depends on the source technology and the shared traffic pattern.
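Staleness in the table can be read as the age of a feature value at the moment a prediction request arrives. A minimal sketch under that reading follows; the function names and the epoch-second timestamps are assumptions for illustration.

```python
def staleness_seconds(request_time, feature_computed_at):
    """Age of a feature value when the prediction request arrives, in seconds."""
    return max(0.0, request_time - feature_computed_at)

def within_budget(request_time, feature_computed_at, max_staleness_s):
    """True when the feature is fresh enough for the use case's staleness budget."""
    return staleness_seconds(request_time, feature_computed_at) <= max_staleness_s

# A batch feature computed 3 hours before the request, checked against two budgets.
request_time = 100_000.0
computed_at = request_time - 3 * 3600
ok_for_daily_budget = within_budget(request_time, computed_at, 24 * 3600)
ok_for_minute_budget = within_budget(request_time, computed_at, 60)
```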

What about tradeoffs?
● Three dimensions: correctness, low cost, low latency
● Can choose 2!
● Have to be flexible on the 3rd
● Need clean abstractions for full freedom
1. Fast & Correct
2. Cheap & Correct
3. Fast & Cheap
reference: Open Problems in Stream Processing: A Call To Action, Tyler Akidau (2019)

Python vs SQL?
≈
Intermediate representation (IR)
Compute Engines
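The "Python vs SQL ≈ IR" idea can be illustrated with a toy intermediate representation: one feature definition that is either rendered to SQL or evaluated directly in Python, with both paths producing the same result. The `Aggregate` node and helper functions are hypothetical, and SQLite stands in for a compute engine.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Aggregate:
    """Hypothetical IR node: apply `func` to `column` of `table`, grouped by `key`."""
    table: str
    key: str
    column: str
    func: str  # "sum" or "count"

def to_sql(ir):
    # Identifiers are interpolated directly; this sketch assumes they are trusted.
    return f"SELECT {ir.key}, {ir.func.upper()}({ir.column}) FROM {ir.table} GROUP BY {ir.key}"

def run_python(ir, rows):
    """Evaluate the IR over a list of dict rows in plain Python."""
    out = {}
    for row in rows:
        increment = row[ir.column] if ir.func == "sum" else 1
        out[row[ir.key]] = out.get(row[ir.key], 0) + increment
    return out

def run_sql(ir, rows):
    """Evaluate the same IR by generating SQL and running it in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {ir.table} ({ir.key} TEXT, {ir.column} REAL)")
    conn.executemany(
        f"INSERT INTO {ir.table} VALUES (?, ?)",
        [(r[ir.key], r[ir.column]) for r in rows],
    )
    return dict(conn.execute(to_sql(ir)).fetchall())

rows = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": "u1", "amount": 5.0},
    {"user_id": "u2", "amount": 1.0},
]
ir = Aggregate(table="txns", key="user_id", column="amount", func="sum")
```

With a shared representation, the choice of front-end language becomes a matter of user preference rather than a platform constraint.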

Don’t invent a new language/DSL!
Evolve existing ones to make it better.

Connector ecosystem is getting more mature.
Nice, but what about event schema and envelope standards?

Deploy model features
● Should I duplicate the feature results in a different table?
● Which team do I need to inform about the change?
● Do I need to worry about training/prediction skew?
Data Scientists
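One simple way to look for training/prediction skew is to compare the feature values logged at serving time with the values recomputed offline for the same entities. The sketch below reports the fraction that disagree; the function name, the dict-of-floats layout, and the tolerance are assumptions for illustration.

```python
import math

def skew_rate(offline, online, rel_tol=1e-6):
    """Fraction of shared keys whose offline and online feature values differ.

    `offline` and `online` map an entity key to a float feature value
    (hypothetical layout). Keys present in only one source are ignored.
    """
    keys = offline.keys() & online.keys()
    if not keys:
        return 0.0
    mismatches = sum(
        1 for k in keys if not math.isclose(offline[k], online[k], rel_tol=rel_tol)
    )
    return mismatches / len(keys)

# Sample values, made up for illustration.
offline_features = {"a": 1.0, "b": 2.0}
online_features = {"a": 1.0, "b": 2.5, "c": 9.0}
rate = skew_rate(offline_features, online_features)
```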

What symptoms are there indicating your platform is not trusted?
ML Platforms: What doesn’t just work?
● My freedom and your responsibility
● Producer and consumer tension
● Users are forced to choose between basic requirements

Offline / Online consistency
Sharing and reusing
Schema evolution
SWE Practices

You are part of the endeavor to make real-time data useful!
● Ubiquitous
● Easy and responsive
● Just works!
https://zhenzhongxu.com/
zhenzhong@claypot.ai
the invisible interface
