A Proposed Answer to Phil’s Question: What Does This Say About the Database Field?
Daniel Abadi

We’re Addicts
  • Addict (verb): “to devote or surrender (oneself) to something habitually or obsessively”
  • Mounting evidence that relational database technology is unsuitable for Web-scale data management
  • Yet we cling to our RDBMS technology, refusing to acknowledge this evidence
  • Addiction is a very serious matter
      • Puts one at a disadvantage: we’re being left behind
      • Highest-impact research on Web-scale data management is being published outside of SIGMOD/VLDB

What should we do?
  • There are lots of resources for addicts
  • Many programs work in steps to help addicts gradually kick the addiction
  • Stepwise programs are generally designed for individuals, but are straightforward to extend to entire research communities

Step 1: Admit You Have a Problem
  • Case study: Facebook
      • 2.5 petabyte enterprise data warehouse
      • Adding 15 TB of new data a day
      • RDBMSs should theoretically scale to this amount of data (esp. Gamma-style parallel DBMSs)
      • They use Hadoop instead
      • But their analysts don’t speak MapReduce!
      • So they allocate a team of superstar developers to build a SQL layer on top of Hadoop: Hive
  • Entire companies are being started that specialize in using Hadoop to create data warehouses
  • But data warehousing has always been the domain of relational database systems!

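The gap between "speaking SQL" and "speaking MapReduce" can be illustrated with a small sketch. This is not taken from the slides and it is not how Hive is implemented; it is a toy, single-process model with hypothetical records and function names (`map_phase`, `reduce_phase`, `run_mapreduce`), showing how much hand-written code stands behind a one-line SQL aggregate.

```python
from collections import defaultdict

# Hypothetical log records, (user_id, bytes_sent). Illustrative data only.
records = [("alice", 10), ("bob", 5), ("alice", 7)]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    user, nbytes = record
    yield user, nbytes


def reduce_phase(key, values):
    """Combine all values that share a key."""
    return key, sum(values)


def run_mapreduce(recs):
    """Toy in-memory driver: map, group by key (the 'shuffle'), then reduce."""
    groups = defaultdict(list)
    for rec in recs:
        for key, value in map_phase(rec):
            groups[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())


# What an analyst would write instead:
#   SELECT user_id, SUM(bytes_sent) FROM logs GROUP BY user_id;
totals = run_mapreduce(records)
print(totals)
```

A SQL layer lets the analyst write only the commented query and leaves the map, shuffle, and reduce plumbing to the system.
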
Step 2: Believe in a Higher Power Greater Than Yourself
  • The higher power is … Google / systems community
      • MapReduce published in OSDI
      • Dynamo published in SOSP
      • BigTable published in OSDI
      • Dryad published in EuroSys

Step 3: Make a Searching and Fearless Inventory of Yourself
  • People who chose not to use database systems aren’t dumb
  • There must be a reason
      • We’re too expensive
          • Free / open source databases like MySQL/PostgreSQL/Ingres don’t scale out of the box
          • Proprietary solutions price by the TB
      • We’re too hard to use
      • We don’t scale
          • Seriously, we don’t scale
          • Yes, I know we should scale in theory. But in practice we don’t scale. Even the expensive solutions.

Step 4: Admit the Exact Nature of Our Wrongs
  • Admitting all of our wrongs is too overwhelming
  • For now, let’s focus on our wrongs for analytical workloads
  • Parallel databases should be able to scale indefinitely
  • Current implementations have limitations
      • Sometimes caused by first-order effects like hard limits required by various system components
      • More often caused by second-order effects
          • Systems are designed assuming failures are a rare event (not true at scale!)
          • Systems are designed assuming each node has predictable performance (not true at scale!)

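The first "not true at scale" point can be made concrete with a back-of-the-envelope sketch that is not from the slides. It assumes node failures are independent and that every node fails during a given query with the same probability; the function name and the 0.1% per-node figure are illustrative assumptions, not measurements.

```python
def prob_any_failure(nodes: int, p_node: float) -> float:
    """Probability that at least one of `nodes` nodes fails during a query.

    Assumes independent failures with identical per-node probability p_node.
    """
    return 1.0 - (1.0 - p_node) ** nodes


# Assumed per-node failure probability during one long-running query.
P_NODE = 0.001

for n in (10, 100, 1000, 10000):
    print(f"{n:>6} nodes: P(at least one failure) = {prob_any_failure(n, P_NODE):.3f}")
```

Under these assumptions a failure that is rare on a ten-node cluster becomes more likely than not on a thousand-node cluster, so a system that restarts the whole query on any failure may rarely finish one.
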
Step 5: Remove Our Shortcomings
  • Need more focus on fault-tolerant systems research
  • Need more focus on runtime scheduling
  • Need better parallelization of UDFs
  • Need to convince one of the parallel DBMS upstarts to release their code as open source

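Why runtime scheduling matters when node performance is unpredictable can be sketched with a toy model. This is an illustration under stated assumptions, not a method from the talk: tasks are equal-sized, each node has a fixed relative speed, and the names `static_makespan` and `dynamic_makespan` are hypothetical. The first function models a plan fixed up front with equal shares per node; the second models nodes pulling one task at a time as they become free.

```python
import heapq


def static_makespan(num_tasks, speeds):
    """Equal share of tasks per node, decided before execution.

    The query finishes only when the slowest node finishes its share.
    """
    share = num_tasks / len(speeds)
    return max(share / s for s in speeds)


def dynamic_makespan(num_tasks, speeds):
    """Greedy runtime scheduling: the next task goes to whichever node is free first."""
    # Min-heap of (time at which the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_tasks):
        t, i = heapq.heappop(free_at)
        done = t + 1.0 / speeds[i]  # one unit of work at this node's speed
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


# Assumed cluster: three healthy nodes and one running at quarter speed.
speeds = [1.0, 1.0, 1.0, 0.25]
print("static plan :", static_makespan(100, speeds))
print("dynamic plan:", dynamic_makespan(100, speeds))
```

In this model the static plan is held hostage by the slow node (25 tasks at quarter speed take 100 time units), while the pull-based schedule finishes in roughly a third of that because the healthy nodes absorb the slow node's work.
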
Bottom Line
  • Addictions are hard to kick
  • Need to work hard to remove our shortcomings
  • Need to reclaim our leadership in the data management arena

Daniel Abadi: VLDB 2009 Panel
