Databricks
Unifying Data Science and Engineering
The beginnings of Apache Spark at UC Berkeley
AMPLab funded by tech companies:
● Got a glimpse of their most impactful internal projects
● They were leveraging massive amounts of data
● Doing high-impact machine learning/AI
We wanted to democratize Data + AI
Apache Spark
Spark SQL + DataFrames | Streaming | MLlib (Machine Learning) | GraphX (Graph Computation)
Spark Core API
R | SQL | Python | Scala | Java
Databricks started in 2013
Bring Apache Spark to the Enterprise
Only 1% of enterprises are successful with AI
The other 99% struggle due to organizational silos:
Data Engineers | Data Scientists | IT | Line of Business
Databricks' goal is to unify data science & engineering
3 challenges created by the data & AI divide
1 Data is not ready for AI
2 Data and AI technology silos
3 Data scientists and engineers are in silos
1 Data is not ready for AI
Massive data in data lakes
Vision: run AI on the data in the data lake
But data is not ready for AI:
● Inconsistent data
● Lack of schema
● Slow performance and high cost
Databricks Delta
Brings data reliability and performance to data lakes
Fast Analytics + Data Reliability, on Blob Storage (Data Lake)
Databricks Delta: makes data ready for AI
Data Reliability: Schema Enforcement, ACID Transactions
Query Performance: Very Fast at Scale, Indexing (10-100x Faster)
Consumers: Reporting, Machine Learning, Alerting, Dashboards
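Schema enforcement and atomic commits can be illustrated with a toy model. The sketch below is not the Delta API; it assumes a hypothetical `ToyDeltaTable` that stores rows as JSON files and publishes each write by adding a numbered entry to a `_log` directory. Rows that do not match the declared schema are rejected before anything is written, and a write becomes visible only when its log entry is linked into place in a single step, so readers never see a half-finished write. The `os.link` publish step assumes a POSIX-style filesystem.

```python
import json
import os
import tempfile
import uuid


class ToyDeltaTable:
    """Hypothetical toy table: JSON data files plus an ordered commit log.

    Not the Delta Lake API; only a sketch of schema enforcement and atomic commits.
    """

    def __init__(self, root, schema):
        self.root = root
        self.schema = schema  # column name -> expected Python type
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Committed versions are the numbered *.json entries in the log.
        return sorted(int(n[:-5]) for n in os.listdir(self.log_dir) if n.endswith(".json"))

    def _check_schema(self, rows):
        # Schema enforcement: reject the whole batch if any row has the
        # wrong set of columns or a value of the wrong type.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match {sorted(self.schema)}")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"column {col!r} expects {typ.__name__}")

    def append(self, rows):
        self._check_schema(rows)
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        # 1) Write the data file. Readers ignore it until a log entry names it.
        data_name = f"part-{uuid.uuid4().hex}.json"
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # 2) Write the log entry to a temp file, then publish it with os.link,
        #    which raises FileExistsError if another writer already took this
        #    version. A real system would retry; this toy just raises.
        tmp = os.path.join(self.log_dir, f"{uuid.uuid4().hex}.tmp")
        with open(tmp, "w") as f:
            json.dump({"add": [data_name]}, f)
        try:
            os.link(tmp, os.path.join(self.log_dir, f"{version:020d}.json"))
        finally:
            os.remove(tmp)
        return version

    def read(self):
        # Only files referenced by committed log entries are visible.
        rows = []
        for v in self._versions():
            with open(os.path.join(self.log_dir, f"{v:020d}.json")) as f:
                entry = json.load(f)
            for data_name in entry["add"]:
                with open(os.path.join(self.root, data_name)) as f:
                    rows.extend(json.load(f))
        return rows


table = ToyDeltaTable(tempfile.mkdtemp(), {"id": int, "event": str})
table.append([{"id": 1, "event": "click"}])
try:
    table.append([{"id": "2", "event": "view"}])  # "id" has the wrong type
except TypeError as err:
    print("rejected:", err)
print(table.read())  # [{'id': 1, 'event': 'click'}]
```

The rejected batch never reaches the log, so the table still holds only the first write.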
2 Data and AI technology silos
Data & AI technology silos
Great for data, but not for AI | Great for AI, but not for data
Supporting and deployment libraries: TFServing, TensorBoard
3 Data scientists and engineers are in silos
Data scientists & engineers are in silos
Data Engineers | Data Scientists
1 Data Prep: hard to make pipelines reliable
2 Build Model: challenging to track and reproduce experiments
3 Deploy Model: have to ensure reliability, SLAs, and quality
Databricks MLflow: unifies data scientists & engineers
Data Engineers | Data Scientists
1 Data Prep: build reliable data pipelines, track the datasets (Databricks Delta)
2 Build Model: track experiments, reproduce experiments (MLflow Project & Tracker, Databricks Runtime for ML)
3 Deploy Model: deploy models in production, track their quality (MLflow Serving)
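MLflow's Tracking API records parameters and metrics for each run (for example `mlflow.log_param` and `mlflow.log_metric` inside `mlflow.start_run()`). To stay runnable without an MLflow installation, the sketch below uses a hypothetical in-memory `ToyTracker` instead of the MLflow API, plus hypothetical `make_dataset` and `train` helpers. It illustrates the idea on the slide: recording each run's parameters together with the dataset version it read is what lets someone reproduce the experiment later.

```python
import random
import uuid


def make_dataset(version, n=100):
    # Hypothetical stand-in for reading a pinned snapshot of a versioned table:
    # the same version always yields the same rows (y is roughly 3x plus noise).
    rng = random.Random(version)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3.0 * x + rng.gauss(0, 1)))
    return rows


def train(params, rows):
    # Ridge regression through the origin: w = sum(x*y) / (sum(x*x) + l2).
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    w = sxy / (sxx + params["l2"])
    mse = sum((y - w * x) ** 2 for x, y in rows) / len(rows)
    return w, mse


class ToyTracker:
    """Hypothetical in-memory tracker: one record per run."""

    def __init__(self):
        self.runs = {}

    def log_run(self, params, metrics, dataset_version):
        run_id = uuid.uuid4().hex[:8]
        self.runs[run_id] = {
            "params": dict(params),
            "metrics": dict(metrics),
            "dataset_version": dataset_version,
        }
        return run_id


def reproduce(tracker, run_id):
    # Re-run with the recorded params and dataset version; compare the metric.
    run = tracker.runs[run_id]
    _, mse = train(run["params"], make_dataset(run["dataset_version"]))
    return mse == run["metrics"]["mse"]


tracker = ToyTracker()
run_ids = []
for l2 in (0.0, 10.0, 100.0):
    params = {"l2": l2}
    w, mse = train(params, make_dataset(version=7))
    run_ids.append(tracker.log_run(params, {"w": w, "mse": mse}, dataset_version=7))

best = min(tracker.runs.values(), key=lambda r: r["metrics"]["mse"])
print("best params:", best["params"])
print("reproducible:", all(reproduce(tracker, r) for r in run_ids))
```

Because the data is pinned by version and training is deterministic, re-running a logged run yields exactly the recorded metric; without the dataset version in the record, that check would not be possible.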
Announcing: time travel for Databricks Delta
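Time travel means reading a table as it was at an earlier version. In Delta Lake this surfaces as, for example, the `versionAsOf` read option; the sketch below does not use that API. It is a hypothetical in-memory `VersionedTable` in which every commit appends "add" and "remove" actions to a log, and an overwrite only logically removes old files, so replaying the log up to version N reconstructs that snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical in-memory toy: each commit appends add/remove actions to a log."""

    log: list = field(default_factory=list)    # version -> {"add": [...], "remove": [...]}
    files: dict = field(default_factory=dict)  # file id -> list of rows

    def _commit(self, add=(), remove=()):
        self.log.append({"add": list(add), "remove": list(remove)})
        return len(self.log) - 1  # the new version number

    def _new_file(self, rows):
        file_id = f"f{len(self.files)}"
        self.files[file_id] = list(rows)
        return file_id

    def _live_files(self, version):
        # Replay log entries 0..version to find the files in that snapshot.
        live = []
        for entry in self.log[: version + 1]:
            live = [f for f in live if f not in entry["remove"]]
            live.extend(entry["add"])
        return live

    def append(self, rows):
        return self._commit(add=[self._new_file(rows)])

    def overwrite(self, rows):
        # Old files are only logically removed, so older versions stay readable.
        old = self._live_files(len(self.log) - 1)
        return self._commit(add=[self._new_file(rows)], remove=old)

    def read(self, version_as_of=None):
        if version_as_of is None:
            if not self.log:
                return []
            version_as_of = len(self.log) - 1
        if not 0 <= version_as_of < len(self.log):
            raise ValueError(f"no such version: {version_as_of}")
        rows = []
        for f in self._live_files(version_as_of):
            rows.extend(self.files[f])
        return rows


t = VersionedTable()
t.append([{"id": 1}])           # version 0
t.append([{"id": 2}])           # version 1
t.overwrite([{"id": 99}])       # version 2
print(t.read())                 # [{'id': 99}]
print(t.read(version_as_of=1))  # [{'id': 1}, {'id': 2}]
```

Keeping superseded files around is what makes old versions queryable; a real system also needs a retention policy to eventually delete them.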
Confidential – for Gartner briefing only
Databricks makes AI possible
for the other 99% by
unifying data science and engineering
Databricks Unified Analytics
Demo by Michael Armbrust

The Power of Unified Analytics with Ali Ghodsi
