Daniel Abadi: VLDB 2009 Panel

A Proposed Answer to Phil’s Question: What Does This Say About the Database Field?
Daniel Abadi

We’re Addicts
Addict (verb): “to devote or surrender (oneself) to something habitually or obsessively”
Mounting evidence that relational database technology is unsuitable for Web-scale data management
Yet we cling to our RDBMS technology, refusing to acknowledge this evidence
Addiction is a very serious matter
Puts one at a disadvantage: we’re being left behind
Highest impact research on Web-scale data management is being published outside of SIGMOD/VLDB

What should we do?
There are lots of resources for addicts
Many programs work in steps to help addicts gradually kick the addiction
Stepwise programs generally designed for individuals, but straightforward to extend to entire research communities

Step 1: Admit You Have a Problem
Case study: Facebook
  2.5 petabyte enterprise data warehouse
  Adding 15TB of new data a day
  RDBMSs should theoretically scale to this amount of data (esp. Gamma-style parallel DBMSs)
  They use Hadoop instead
  But their analysts don’t speak MapReduce!
  So they allocate a team of superstar developers to build an SQL layer on top of Hadoop: Hive
Entire companies are being started that specialize in using Hadoop to create data warehouses
But data warehousing has always been the domain of relational database systems!

Step 2: Believe in a Higher Power Greater Than Yourself
The higher power is … Google / systems community
  MapReduce published in OSDI
  Dynamo published in SOSP
  BigTable published in OSDI
  Dryad published in EuroSys

Step 3: Make a Searching and Fearless Inventory of Yourself
People who chose not to use database systems aren’t dumb
There must be a reason
We’re too expensive
  Free / open source databases like MySQL/PostgreSQL/Ingres don’t scale out of the box
  Proprietary solutions price by the TB
We’re too hard to use
We don’t scale
  Seriously, we don’t scale
  Yes, I know we should scale in theory. But in practice we don’t scale. Even the expensive solutions.

Step 4: Admit the Exact Nature of Our Wrongs
Admitting all of our wrongs is too overwhelming
For now, let’s focus on our wrongs for analytical workloads
Parallel databases should be able to scale indefinitely
Current implementations have limitations
  Sometimes caused by first-order effects like hard limits required by various system components
  More often caused by second-order effects
    Systems are designed assuming failures are a rare event (not true at scale!)
    Systems are designed assuming each node has predictable performance (not true at scale!)
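
The point that failures stop being rare at scale can be illustrated with a small back-of-the-envelope sketch. It is not from the talk: the function name, the independence assumption, and the per-node failure probability are all hypothetical, chosen only to show how quickly "at least one node fails during the job" becomes the common case as the cluster grows.

```python
def prob_any_failure(num_nodes: int, per_node_failure_prob: float) -> float:
    """Probability that at least one node fails during a job.

    Assumption (ours, not the talk's): every node fails independently
    with the same probability over the lifetime of the job.
    """
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    if not 0.0 <= per_node_failure_prob <= 1.0:
        raise ValueError("per_node_failure_prob must be in [0, 1]")
    # P(no node fails) = (1 - p)^n, so P(at least one fails) = 1 - (1 - p)^n.
    return 1.0 - (1.0 - per_node_failure_prob) ** num_nodes


if __name__ == "__main__":
    # Hypothetical per-node failure probability for a single long-running job.
    p = 0.001
    for n in (10, 100, 1000, 10000):
        print(f"{n:>6} nodes: P(at least one failure) = {prob_any_failure(n, p):.3f}")
```

Under these assumptions, a system that restarts the whole query on any node failure would rarely finish a long job on a large cluster, which is one way to read the call for fault tolerance in the next step.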

Step 5: Remove Our Shortcomings
Need more focus on fault tolerant systems research
Need more focus on runtime scheduling
Need better parallelization of UDFs
Need to convince one of the parallel DBMS upstarts to release their code open source

Bottom Line
Addictions are hard to kick
Need to work hard to remove our shortcomings
Need to reclaim our leadership in the data management arena
