Dremel: Interactive Analysis of Web-Scale Datasets
 Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer,
 Shiva Shivakumar, Matt Tolton, Theo Vassilakis
 Google, Inc.
 VLDB, 2010

 Presentation by ensky (enskylin@gmail.com)
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




Introduction
   Dremel is a query system for analysis of read-only nested data
   Use cases
    ◦ Interactive tools
    ◦ Trend detection
    ◦ Web dashboards
    ◦ Spam analysis
    ◦ Network optimization
Features
   Multi-level structure (nested records), e.g.:
       DocId: 10
       Links
         Forward: 20
       Name
         Language
           Code: 'en-us'
           Country: 'us'
         Url: 'http://A'
       Name
         Url: 'http://B'
   SQL-like query language, e.g.:
       SELECT A, COUNT(B) FROM T
       GROUP BY A
       T = {/gfs/1, /gfs/2, …, /gfs/100000}
   Fast: queries over trillion-row tables complete in seconds
   Big scale: thousands of CPUs and petabytes of data
   Widely used at Google: in production since 2006
Contribution
   This paper presents two major techniques in Dremel:
   Nested columnar storage
    ◦ Storing records by splitting them into columns
    ◦ Assembling records from columns
   Query processing
    ◦ Query language
    ◦ Query execution
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




Why nested?
   Flexible
    ◦ Web and scientific data is often non-relational
   Reduced cost
    ◦ Normalizing and recombining nested data is often prohibitive at terabyte and petabyte scale
Why column?
   [Figure: the same nested records r1 and r2 (fields A, B, C, D, E of a nested schema tree) stored record-oriented, with each record contiguous, vs. column-oriented, with the values of each field contiguous across records]
   Column-oriented: read less, cheaper decompression
   Challenge: preserve structure, reconstruct from a subset of fields
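A minimal sketch of the "read less" argument, assuming flat records (the nested case is what the rest of the deck deals with) and counting values touched rather than bytes read. The records and field names are made up for illustration.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical flat records, made up for this illustration.
RECORDS: list[Record] = [
    {"DocId": 10, "Url": "http://A", "Country": "us"},
    {"DocId": 20, "Url": "http://B", "Country": "gb"},
    {"DocId": 30, "Url": "http://C", "Country": "us"},
    {"DocId": 40, "Url": "http://D", "Country": "fr"},
]


def to_columns(records: list[Record]) -> dict[str, list[Any]]:
    """Pivot record-oriented rows into one value list per field."""
    fields = sorted({field for rec in records for field in rec})
    return {field: [rec.get(field) for rec in records] for field in fields}


def project_rows(records: list[Record], wanted: list[str]) -> tuple[list[tuple], int]:
    """Record-oriented scan: every field of every record is read and decoded."""
    scanned = 0
    result = []
    for rec in records:
        scanned += len(rec)
        result.append(tuple(rec.get(field) for field in wanted))
    return result, scanned


def project_columns(columns: dict[str, list[Any]], wanted: list[str]) -> tuple[list[tuple], int]:
    """Column-oriented scan: only the columns named in the query are read."""
    scanned = sum(len(columns[field]) for field in wanted)
    return list(zip(*(columns[field] for field in wanted))), scanned
```

Projecting only `DocId` gives the same answer in both layouts, but the row layout touches all 12 values while the column layout touches 4.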
Nested data model
   Schema (Protocol Buffers, http://code.google.com/apis/protocolbuffers):
     message Document {
       required int64 DocId;            [1,1]
       optional group Links {
         repeated int64 Backward;       [0,*]
         repeated int64 Forward;
       }
       repeated group Name {
         repeated group Language {
           required string Code;
           optional string Country;     [0,1]
         }
         optional string Url;
       }
     }
   Sample records:
     r1                          r2
     DocId: 10                   DocId: 20
     Links                       Links
       Forward: 20                 Backward: 10
       Forward: 40                 Backward: 30
       Forward: 60                 Forward: 80
     Name                        Name
       Language                    Url: 'http://C'
         Code: 'en-us'
         Country: 'us'
       Language
         Code: 'en'
       Url: 'http://A'
     Name
       Url: 'http://B'
     Name
       Language
         Code: 'en-gb'
         Country: 'gb'
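The [1,1], [0,1] and [0,*] annotations on the schema are occurrence bounds for required, optional and repeated fields. A minimal sketch of checking a record against them, assuming records are Python dicts in which repeated fields are lists and absent fields are simply omitted (this encoding and the `validate` helper are hypothetical, not Dremel's API):

```python
# (min, max) occurrences per label, matching the [1,1] / [0,1] / [0,*]
# annotations on the slide; None stands for "*".
MULTIPLICITY = {"required": (1, 1), "optional": (0, 1), "repeated": (0, None)}

# The Document schema, encoded as field -> (label, Python type or nested group).
DOCUMENT = {
    "DocId": ("required", int),
    "Links": ("optional", {
        "Backward": ("repeated", int),
        "Forward": ("repeated", int),
    }),
    "Name": ("repeated", {
        "Language": ("repeated", {
            "Code": ("required", str),
            "Country": ("optional", str),
        }),
        "Url": ("optional", str),
    }),
}

# The two sample records in the same hypothetical dict encoding.
R1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
R2 = {"DocId": 20, "Links": {"Backward": [10, 30], "Forward": [80]},
      "Name": [{"Url": "http://C"}]}


def validate(record: dict, schema: dict, prefix: str = "") -> list[str]:
    """Return multiplicity/type violations (an empty list if the record conforms)."""
    errors = [f"{prefix}{name}: unknown field" for name in record if name not in schema]
    for name, (label, kind) in schema.items():
        raw = record.get(name)
        values = [] if raw is None else (raw if isinstance(raw, list) else [raw])
        lo, hi = MULTIPLICITY[label]
        if len(values) < lo or (hi is not None and len(values) > hi):
            bound = "*" if hi is None else hi
            errors.append(f"{prefix}{name}: {len(values)} occurrence(s), expected [{lo},{bound}]")
        for value in values:
            if isinstance(kind, dict):
                if not isinstance(value, dict):
                    errors.append(f"{prefix}{name}: expected a group")
                    continue
                errors.extend(validate(value, kind, f"{prefix}{name}."))
            elif not isinstance(value, kind):
                errors.append(f"{prefix}{name}: expected {kind.__name__}")
    return errors
```

Both sample records validate cleanly; a record without `DocId`, or a Language without `Code`, is reported.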
Column-striped representation
   (the two sample records, DocId: 10 and DocId: 20, stored as one column per leaf field; r = repetition level, d = definition level)

   DocId          Name.Url              Links.Forward      Links.Backward
   value  r  d    value     r  d        value  r  d        value  r  d
   10     0  0    http://A  0  2        20     0  2        NULL   0  1
   20     0  0    http://B  1  2        40     1  2        10     0  2
                  NULL      1  1        60     1  2        30     1  2
                  http://C  0  2        80     0  2

   Name.Language.Code     Name.Language.Country
   value  r  d            value  r  d
   en-us  0  2            us     0  3
   en     2  2            NULL   2  2
   NULL   1  1            NULL   1  1
   en-gb  1  2            gb     1  3
   NULL   0  1            NULL   0  1
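Each leaf field of the schema becomes one column. Following the definitions of r and d on the next slides, the largest r a column can hold is the number of repeated fields on its path, and the largest d is the number of optional or repeated fields on it. A small sketch that derives the six columns above with their maximum levels, assuming the same hypothetical dict encoding of the schema (`column_levels` is an illustrative helper):

```python
# Schema encoded as field -> (label, Python type or nested group).
DOCUMENT = {
    "DocId": ("required", int),
    "Links": ("optional", {
        "Backward": ("repeated", int),
        "Forward": ("repeated", int),
    }),
    "Name": ("repeated", {
        "Language": ("repeated", {
            "Code": ("required", str),
            "Country": ("optional", str),
        }),
        "Url": ("optional", str),
    }),
}


def column_levels(schema: dict, prefix: str = "", r: int = 0, d: int = 0) -> dict[str, tuple[int, int]]:
    """Map every leaf path to (max repetition level, max definition level)."""
    levels = {}
    for name, (label, kind) in schema.items():
        child_r = r + (1 if label == "repeated" else 0)   # only repeated fields add to r
        child_d = d + (0 if label == "required" else 1)   # optional and repeated add to d
        path = prefix + name
        if isinstance(kind, dict):
            levels.update(column_levels(kind, path + ".", child_r, child_d))
        else:
            levels[path] = (child_r, child_d)
    return levels
```

For example, `Name.Language.Country` gets (2, 3), which matches the `us 0 3` row: that value is fully defined.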
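Column-oriented problem
   When you query a sub-tree from its columns alone, a value tells you nothing about the whole record structure, not even its own position in the record ("Where am I?")
   [Figure: nested schema tree with fields A, B, C, D, E; the values of records r1 and r2 are split across separate columns]
   To solve this problem, the paper introduces repetition and definition levels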
Repetition and definition levels
   Name.Language.Code (records r1 and r2)
     value  r  d
     en-us  0  2    first value in r1, nothing repeated yet: r = 0
     en     2  2    Language repeated within the same Name: r = 2
     NULL   1  1    new Name (without Language): r = 1
     en-gb  1  2    new Name: r = 1
     NULL   0  1    first value in r2, nothing repeated: r = 0
   r: the repeated field in the value's path at which the value has repeated, given as that field's level (Name = 1, Language = 2; 0 means a new record starts)
Repetition and definition levels
   Name.Language.Country (records r1 and r2)
     value  r  d
     us     0  3    Name, Language and Country all present: d = 3
     NULL   2  2    second Language has no Country: d = 2
     NULL   1  1    second Name has no Language: d = 1
     gb     1  3
     NULL   0  1    r2's only Name has no Language: d = 1
   d: how many fields in the path that could be undefined (optional or repeated) are actually present
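A minimal sketch of computing these (value, r, d) triplets, assuming records are dicts with repeated fields as lists and each column path given as a list of (field, label) pairs. The paper describes a single-pass algorithm that fills all columns at once; this simpler, hypothetical `shred` walks one path at a time, which rereads the record per column but follows the same definitions of r and d.

```python
from typing import Any

Triplet = tuple[Any, int, int]

R1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
R2 = {"DocId": 20, "Links": {"Backward": [10, 30], "Forward": [80]},
      "Name": [{"Url": "http://C"}]}


def shred(record: dict, path: list[tuple[str, str]]) -> list[Triplet]:
    """Emit (value, r, d) for one record along `path`, a list of (field, label)."""
    out: list[Triplet] = []

    def visit(node: Any, i: int, r: int, d: int, rep_depth: int) -> None:
        # node: value reached via path[:i]; r: level to emit for the next value;
        # d: optional/repeated fields present so far; rep_depth: repeated fields so far.
        if i == len(path):
            out.append((node, r, d))
            return
        name, label = path[i]
        child = node.get(name)
        if label == "repeated":
            items = child or []
            if not items:
                out.append((None, r, d))  # NULL: the path stops at this field
                return
            level = rep_depth + 1
            for j, item in enumerate(items):
                # The first item keeps the incoming r; later items repeat at this level.
                visit(item, i + 1, r if j == 0 else level, d + 1, level)
        elif child is None:
            out.append((None, r, d))  # absent optional (or missing required) field
        else:
            visit(child, i + 1, r, d + (1 if label == "optional" else 0), rep_depth)

    visit(record, 0, 0, 0, 0)
    return out


def shred_column(records: list[dict], path: list[tuple[str, str]]) -> list[Triplet]:
    return [triplet for record in records for triplet in shred(record, path)]
```

Shredding `[R1, R2]` along `Name.Language.Country` reproduces the column above: `us 0 3`, `NULL 2 2`, `NULL 1 1`, `gb 1 3`, `NULL 0 1`.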
Column-oriented storage is all right,
but what if we still need record-oriented processing?

                  (e.g., MapReduce)
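Record assembly FSM
   Transitions are labeled with repetition levels:
     DocId                   --0-->     Links.Backward
     Links.Backward          --1-->     Links.Backward
     Links.Backward          --0-->     Links.Forward
     Links.Forward           --1-->     Links.Forward
     Links.Forward           --0-->     Name.Language.Code
     Name.Language.Code      --0,1,2--> Name.Language.Country
     Name.Language.Country   --2-->     Name.Language.Code
     Name.Language.Country   --0,1-->   Name.Url
     Name.Url                --1-->     Name.Language.Code
     Name.Url                --0-->     (end of record)
   For record-oriented data processing (e.g., MapReduce)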
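A sketch of how the FSM drives the column readers, assuming the column-striped data above is held in memory as lists of (value, r, d). After reading a value, the reader peeks at the repetition level of that column's next value (0 once the column is exhausted) and follows the matching transition. This hypothetical `scan_records` only shows the order in which values are consumed per record; the paper's assembly algorithm additionally uses the levels to open and close nested groups, which is omitted here.

```python
from typing import Any

Triplet = tuple[Any, int, int]

# Column-striped data for the two sample records.
COLUMNS: dict[str, list[Triplet]] = {
    "DocId": [(10, 0, 0), (20, 0, 0)],
    "Links.Backward": [(None, 0, 1), (10, 0, 2), (30, 1, 2)],
    "Links.Forward": [(20, 0, 2), (40, 1, 2), (60, 1, 2), (80, 0, 2)],
    "Name.Language.Code": [("en-us", 0, 2), ("en", 2, 2), (None, 1, 1), ("en-gb", 1, 2), (None, 0, 1)],
    "Name.Language.Country": [("us", 0, 3), (None, 2, 2), (None, 1, 1), ("gb", 1, 3), (None, 0, 1)],
    "Name.Url": [("http://A", 0, 2), ("http://B", 1, 2), (None, 1, 1), ("http://C", 0, 2)],
}

# (current column, next repetition level) -> next column; None ends the record.
TRANSITIONS = {
    ("DocId", 0): "Links.Backward",
    ("Links.Backward", 1): "Links.Backward",
    ("Links.Backward", 0): "Links.Forward",
    ("Links.Forward", 1): "Links.Forward",
    ("Links.Forward", 0): "Name.Language.Code",
    **{("Name.Language.Code", r): "Name.Language.Country" for r in (0, 1, 2)},
    ("Name.Language.Country", 2): "Name.Language.Code",
    ("Name.Language.Country", 1): "Name.Url",
    ("Name.Language.Country", 0): "Name.Url",
    ("Name.Url", 1): "Name.Language.Code",
    ("Name.Url", 0): None,
}


def scan_records(columns: dict[str, list[Triplet]]) -> list[list[tuple[str, Any]]]:
    """Return, per record, the (column, value) pairs in the order the FSM reads them."""
    pos = {name: 0 for name in columns}
    records = []
    while pos["DocId"] < len(columns["DocId"]):
        field = "DocId"
        consumed = []
        while field is not None:
            column = columns[field]
            value, _, _ = column[pos[field]]
            pos[field] += 1
            consumed.append((field, value))
            # Repetition level of the column's next value; an exhausted column counts as 0.
            next_r = column[pos[field]][1] if pos[field] < len(column) else 0
            field = TRANSITIONS[(field, next_r)]
        records.append(consumed)
    return records
```

For r2 the scan reads `20, 10, 30, 80, NULL, NULL, 'http://C'` and then stops, because `Name.Url` is exhausted and the `0` transition ends the record.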
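Reading two fields
   FSM for DocId and Name.Language.Country:
     DocId                   --0-->     Name.Language.Country
     Name.Language.Country   --1,2-->   Name.Language.Country
     Name.Language.Country   --0-->     (end of record)
   Assembled records:
     s1                          s2
     DocId: 10                   DocId: 20
     Name                        Name
       Language
         Country: 'us'
       Language
     Name
     Name
       Language
         Country: 'gb'
   Structure of parent fields is preserved.
   Useful for queries like /Name[3]/Language[1]/Country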
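The parent structure survives because r and d alone say which enclosing groups start anew and which are present. A minimal sketch that rebuilds s1 and s2 from the Name.Language.Country column only, assuming a path made of `num_repeated` repeated groups followed by one required or optional leaf (paths through optional groups, such as Links.Forward, would need extra handling). The helper name and the nested-list encoding are hypothetical.

```python
from typing import Any

Triplet = tuple[Any, int, int]


def assemble_column(triplets: list[Triplet], num_repeated: int, max_def: int) -> list[list]:
    """Rebuild per-record nesting for a path of `num_repeated` repeated groups
    followed by one leaf. The innermost repeated group is represented by its
    leaf value (None if the leaf is absent); outer groups become lists."""
    records: list[list] = []
    for value, r, d in triplets:
        if r == 0:
            records.append([])  # r = 0 starts a new record
        node = records[-1]
        for level in range(1, num_repeated + 1):
            if d < level:
                break  # this group (and everything below it) is not defined
            if level >= r:
                # The group at this level starts a new element.
                if level == num_repeated:
                    node.append(value if d == max_def else None)
                else:
                    node.append([])
            if level < num_repeated:
                node = node[-1]  # descend into the current element
    return records


COUNTRY = [("us", 0, 3), (None, 2, 2), (None, 1, 1), ("gb", 1, 3), (None, 0, 1)]
```

`assemble_column(COUNTRY, 2, 3)` yields `[[["us", None], [], ["gb"]], [[]]]`: s1 has three Names (two Languages, none, one Language) and s2 has one empty Name, just as on the slide.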
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




Query language
 Based on SQL
 Performs
    ◦   Projection
    ◦   Selection
    ◦   Nested subqueries
    ◦   Inter- and intra-record aggregation
    ◦   Top-k
    ◦   Joins
    ◦   User-defined functions
Example usage (input: the sample records r1 and r2)
   SELECT DocId AS Id,
     COUNT(Name.Language.Code) WITHIN Name AS Cnt,
     Name.Url + ',' + Name.Language.Code AS Str
   FROM t
   WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

   Output table (t1)               Output schema
   Id: 10                          message QueryResult {
   Name                              required int64 Id;
     Cnt: 2                          repeated group Name {
     Language                          optional uint64 Cnt;
       Str: 'http://A,en-us'           repeated group Language {
     Language                            optional string Str;
       Str: 'http://A,en'              }
   Name                              }
     Cnt: 0                        }
Query execution
   [Figure: serving tree: client → root server → intermediate servers → leaf servers (with local storage) → storage layer (e.g., GFS)]
   • Parallelizes scheduling and aggregation
   • Fault tolerance
   • A query dispatcher schedules queries and balances load across the servers
Example: count()
   Level 0 (root server)
     SELECT A, COUNT(B) FROM T GROUP BY A
     T = {/gfs/1, /gfs/2, …, /gfs/100000}
     is rewritten into
     SELECT A, SUM(c) FROM (R¹₁ UNION ALL … UNION ALL R¹₁₀) GROUP BY A
   Level 1 (intermediate servers)
     R¹₁: SELECT A, COUNT(B) AS c FROM T¹₁ GROUP BY A
          T¹₁ = {/gfs/1, …, /gfs/10000}
     R¹₂: SELECT A, COUNT(B) AS c FROM T¹₂ GROUP BY A
          T¹₂ = {/gfs/10001, …, /gfs/20000}
     ...
   Level 3 (leaf servers: data access ops)
     SELECT A, COUNT(B) AS c FROM T³₁ GROUP BY A
     T³₁ = {/gfs/1}
     ...
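COUNT works in a serving tree because per-tablet counts can simply be summed on the way up. A minimal sketch of that rewrite, assuming each tablet is an in-memory list of (A, B) rows with None for a NULL B; `leaf_count`, `merge` and `serving_tree` are hypothetical names, and a real tree also handles scheduling, stragglers and failures.

```python
from collections import Counter
from typing import Any

Row = tuple[Any, Any]  # (A, B); B may be None (NULL)


def leaf_count(tablet: list[Row]) -> Counter:
    """Leaf server: SELECT A, COUNT(B) AS c FROM tablet GROUP BY A."""
    partial: Counter = Counter()
    for a, b in tablet:
        partial[a] += 0 if b is None else 1  # COUNT(B) skips NULLs but keeps the group
    return partial


def merge(partials: list[Counter]) -> Counter:
    """Intermediate/root server: SELECT A, SUM(c) FROM (UNION ALL ...) GROUP BY A."""
    total: Counter = Counter()
    for partial in partials:
        for a, c in partial.items():
            total[a] += c
    return total


def serving_tree(tablets: list[list[Row]], fan_in: int) -> Counter:
    """Aggregate bottom-up; each parent merges up to `fan_in` children."""
    if not tablets or fan_in < 2:
        raise ValueError("need at least one tablet and fan_in >= 2")
    level = [leaf_count(t) for t in tablets]
    while len(level) > 1:
        level = [merge(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]
```

The result does not depend on the fan-in, only the number of tree levels does; that is what lets the root spread 100,000 tablets over several levels of servers.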
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




Experiment Data
• 1 PB of real data
  (uncompressed, non-replicated)
• 100K-800K tablets per table
• Experiments run during business hours

Table   Number of       Size (unrepl.,   Number       Data     Repl.
name    records         compressed)      of fields    center   factor
  T1      85 billion             87 TB          270     A        3×
  T2      24 billion             13 TB          530     A        3×
  T3        4 billion            70 TB         1200     A        3×
  T4      1+ trillion          105 TB            50     B        3×
  T5      1+ trillion            20 TB           30     B        2×
Column vs. record
   [Figure: "cold" time on local disk, averaged over 30 runs; time (sec) vs. number of fields read. From columns: (a) read + decompress, (b) assemble records, (c) parse as C++ objects. From records: (d) read + decompress, (e) parse as C++ objects.]
   10x speedup using columnar storage
   2-4x overhead of using records
   Table partition: 375 MB (compressed), 300K rows, 125 columns
MR and Dremel execution
   Average number of terms in txtField in the 85-billion-record table T1
   [Figure: execution time (sec) on 3000 nodes; data read: 87 TB (MR on records), 0.5 TB (MR on columns), 0.5 TB (Dremel)]
   Sawzall program run on MR:
     num_recs: table sum of int;
     num_words: table sum of int;
     emit num_recs <- 1;
     emit num_words <- count_words(input.txtField);
   Q1: SELECT SUM(count_words(txtField)) / COUNT(*)
       FROM T1
   MR overheads: launching jobs, scheduling 0.5M tasks, assembling records
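Q1 divides two sums, just like the two `table sum of int` outputs in the Sawzall program, so it decomposes the same way COUNT does: each tablet returns a (words, records) pair, the pairs are added up the tree, and only the final division happens at the root. A minimal sketch, assuming tablets are lists of txtField strings; `count_words` here is a hypothetical whitespace tokenizer standing in for the real function.

```python
from typing import Iterable


def count_words(text: str) -> int:
    """Hypothetical stand-in for count_words: whitespace tokenization."""
    return len(text.split())


def leaf_partial(tablet: list[str]) -> tuple[int, int]:
    """Per-tablet partial aggregate: (SUM(count_words(txtField)), COUNT(*))."""
    return sum(count_words(text) for text in tablet), len(tablet)


def combine(partials: Iterable[tuple[int, int]]) -> tuple[int, int]:
    """Add partial (words, records) pairs; each tree level does this."""
    words = records = 0
    for w, n in partials:
        words += w
        records += n
    return words, records


def avg_terms(tablets: list[list[str]]) -> float:
    words, records = combine(leaf_partial(t) for t in tablets)
    return words / records  # the division happens once, at the root
```

Averaging per-tablet averages instead would be wrong whenever tablets hold different numbers of records, which is why the pair is carried up the tree.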
Impact of serving tree depth
   [Figure: execution time (sec) of Q2 (returns 100s of records) and Q3 (returns 1M records) for different serving-tree depths]
   Q2:   SELECT country, SUM(item.amount) FROM T2
         GROUP BY country
   Q3:   SELECT domain, SUM(item.amount) FROM T2
         WHERE domain CONTAINS '.net'
         GROUP BY domain
   T2 contains 40 billion nested items
Scalability
   [Figure: execution time (sec) vs. number of leaf servers]
   Q5 on a trillion-row table T4:
     SELECT TOP(aid, 20), COUNT(*) FROM T4
Outline
 Introduction
 Nested columnar storage
 Query processing
 Experiments
 Conclusion




Observation
   [Figure: monthly query workload of one 3000-node Dremel instance; percentage of queries vs. execution time (sec)]
   Most queries complete in under 10 sec
Conclusion
   Dremel is a query system for analysis of read-only nested data
   Main features: fast (interactive response times), column-oriented storage, SQL-like query language
   Introduced two major techniques:
    ◦ Nested columnar storage: lets a query read only the fields it needs while preserving the nested record structure
    ◦ Query processing: parallel, distributed execution that decomposes queries over a serving tree
Comments
   Google is awesome!
   Pros
    ◦ Nested storage gives us more flexibility.
    ◦ Repetition and definition levels are a novel idea, and they solve the locality problem easily.
    ◦ The distributed serving tree is awesome: faster than MR.
Comments
   Cons
    ◦ Read-only data may not fit every requirement
    ◦ Dremel is not a database, so you need to convert your data for Dremel before analyzing it
       Converting may cost a lot of time and space
       Google doesn't care about this problem: they have GFS and many servers.
Thank you for listening.




                           33

More Related Content

PPT
Dremel: Interactive Analysis of Web-Scale Datasets
robertlz
 
PPTX
Dremel interactive analysis of web scale datasets
Carl Lu
 
PPTX
Query Compilation in Impala
Cloudera, Inc.
 
PPTX
Introduction to Graph Databases
Max De Marzi
 
PDF
Dremel Paper Review
Arinto Murdopo
 
PPTX
String matching algorithms
Ashikapokiya12345
 
PPTX
Bloom filters
Devesh Maru
 
PPTX
Backtracking
subhradeep mitra
 
Dremel: Interactive Analysis of Web-Scale Datasets
robertlz
 
Dremel interactive analysis of web scale datasets
Carl Lu
 
Query Compilation in Impala
Cloudera, Inc.
 
Introduction to Graph Databases
Max De Marzi
 
Dremel Paper Review
Arinto Murdopo
 
String matching algorithms
Ashikapokiya12345
 
Bloom filters
Devesh Maru
 
Backtracking
subhradeep mitra
 

What's hot (20)

PDF
Binary Search Tree
International Islamic University
 
ODP
Google's Dremel
Maria Stylianou
 
PPTX
Introduction to Scala
Mohammad Hossein Rimaz
 
PDF
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.
 
PDF
NoSQL databases
Marin Dimitrov
 
PPTX
SQL, Embedded SQL, Dynamic SQL and SQLJ
Dharita Chokshi
 
PPT
Red black tree
Rajendran
 
PPT
Satisfiability
Jim Kukula
 
PDF
Advanced data structures vol. 1
Christalin Nelson
 
PDF
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Databricks
 
ODP
Presto
Knoldus Inc.
 
PDF
Ericas-CWNA-Study-Guide
Erica StJohn
 
PDF
All pairs shortest path algorithm
Srikrishnan Suresh
 
PDF
Bloom filter
Hamid Feizabadi
 
PDF
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Databricks
 
PPTX
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
PPTX
Spark architecture
GauravBiswas9
 
PPT
Server Side Technologies
tawi123
 
PPTX
Apache hive introduction
Mahmood Reza Esmaili Zand
 
PDF
Introduction to Map-Reduce
Brendan Tierney
 
Google's Dremel
Maria Stylianou
 
Introduction to Scala
Mohammad Hossein Rimaz
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.
 
NoSQL databases
Marin Dimitrov
 
SQL, Embedded SQL, Dynamic SQL and SQLJ
Dharita Chokshi
 
Red black tree
Rajendran
 
Satisfiability
Jim Kukula
 
Advanced data structures vol. 1
Christalin Nelson
 
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Databricks
 
Presto
Knoldus Inc.
 
Ericas-CWNA-Study-Guide
Erica StJohn
 
All pairs shortest path algorithm
Srikrishnan Suresh
 
Bloom filter
Hamid Feizabadi
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Databricks
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
Spark architecture
GauravBiswas9
 
Server Side Technologies
tawi123
 
Apache hive introduction
Mahmood Reza Esmaili Zand
 
Introduction to Map-Reduce
Brendan Tierney
 
Ad

Similar to Dremel: interactive analysis of web-scale datasets (20)

PDF
Apache parquet - Apache big data North America 2017
techmaddy
 
PPTX
Understanding hdfs
Thirunavukkarasu Ps
 
PDF
Code as Data workshop: Using source{d} Engine to extract insights from git re...
source{d}
 
PDF
C Package 100 Knock 1 Hour Mastery Series 2024 Edition text version Tenko
ifwfkvr8684
 
PDF
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon
 
PDF
Seminar_Final
Cheng Zhang
 
PDF
Polygot persistence for Java Developers - August 2011 / @Oakjug
Chris Richardson
 
PDF
Programming Design Guidelines
intuitiv.de
 
PDF
Network and DNS Vulnerabilities
n|u - The Open Security Community
 
PDF
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 
PPTX
Using RDA for Archives and Manuscripts
Adrienne Pruitt
 
PDF
Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries
Shunsuke Kanda
 
PDF
Apache thrift-RPC service cross languages
Jimmy Lai
 
KEY
Keg.io
Heroku
 
PPT
Chado-XML
Chris Mungall
 
PDF
Polyglot persistence for Java developers - moving out of the relational comfo...
Chris Richardson
 
PPTX
Hadoop architecture meetup
vmoorthy
 
PDF
C++primer
leonlongli
 
PPTX
Avro
Eric Turcotte
 
PDF
2011 09-pdfjs
Julian Viereck
 
Apache parquet - Apache big data North America 2017
techmaddy
 
Understanding hdfs
Thirunavukkarasu Ps
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
source{d}
 
C Package 100 Knock 1 Hour Mastery Series 2024 Edition text version Tenko
ifwfkvr8684
 
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon
 
Seminar_Final
Cheng Zhang
 
Polygot persistence for Java Developers - August 2011 / @Oakjug
Chris Richardson
 
Programming Design Guidelines
intuitiv.de
 
Network and DNS Vulnerabilities
n|u - The Open Security Community
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 
Using RDA for Archives and Manuscripts
Adrienne Pruitt
 
Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries
Shunsuke Kanda
 
Apache thrift-RPC service cross languages
Jimmy Lai
 
Keg.io
Heroku
 
Chado-XML
Chris Mungall
 
Polyglot persistence for Java developers - moving out of the relational comfo...
Chris Richardson
 
Hadoop architecture meetup
vmoorthy
 
C++primer
leonlongli
 
2011 09-pdfjs
Julian Viereck
 
Ad

More from Hung-yu Lin (11)

PDF
2014 database - course 2 - php
Hung-yu Lin
 
PDF
2014 database - course 3 - PHP and MySQL
Hung-yu Lin
 
PDF
2014 database - course 1 - www introduction
Hung-yu Lin
 
PDF
OpenWebSchool - 11 - CodeIgniter
Hung-yu Lin
 
PDF
OpenWebSchool - 06 - PHP + MySQL
Hung-yu Lin
 
PDF
OpenWebSchool - 05 - MySQL
Hung-yu Lin
 
PDF
OpenWebSchool - 02 - PHP Part I
Hung-yu Lin
 
PDF
OpenWebSchool - 01 - WWW Intro
Hung-yu Lin
 
PDF
OpenWebSchool - 03 - PHP Part II
Hung-yu Lin
 
PDF
Google App Engine
Hung-yu Lin
 
PDF
Redis
Hung-yu Lin
 
2014 database - course 2 - php
Hung-yu Lin
 
2014 database - course 3 - PHP and MySQL
Hung-yu Lin
 
2014 database - course 1 - www introduction
Hung-yu Lin
 
OpenWebSchool - 11 - CodeIgniter
Hung-yu Lin
 
OpenWebSchool - 06 - PHP + MySQL
Hung-yu Lin
 
OpenWebSchool - 05 - MySQL
Hung-yu Lin
 
OpenWebSchool - 02 - PHP Part I
Hung-yu Lin
 
OpenWebSchool - 01 - WWW Intro
Hung-yu Lin
 
OpenWebSchool - 03 - PHP Part II
Hung-yu Lin
 
Google App Engine
Hung-yu Lin
 

Recently uploaded (20)

PDF
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
PDF
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PDF
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
PPTX
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
PDF
Automating ArcGIS Content Discovery with FME: A Real World Use Case
Safe Software
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PDF
The Future of Artificial Intelligence (AI)
Mukul
 
PPTX
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 
PDF
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
PDF
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
PDF
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
PDF
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PDF
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
PDF
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
PDF
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
PDF
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
PDF
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
Automating ArcGIS Content Discovery with FME: A Real World Use Case
Safe Software
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
The Future of Artificial Intelligence (AI)
Mukul
 
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 

Dremel: interactive analysis of web-scale datasets

  • 1. Dremel: Interactive Analysis of Web- Scale Datasets Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis Google, Inc. VLDB, 2010 Presentation by ensky ([email protected])
  • 2. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 2
  • 3. Introduction  Dremel is an query system For analysis of read-only nested data  Use case Interactive Trends Tools Detection Web Spam Network Dashboards Optimization 3
  • 4. DocId: 10 Links Features Forward: 20 Name Language Code: 'en-us' Country: 'us'  Multi-level structure Url: 'http://A' Name  SQL-like query language Url: 'http://B'  Fast: Trillion-row tables in second level  Big scale: thousands of CPUs and petabytes of data  Widely used by google: in production since 2006 SELECT A, COUNT(B) FROM T GROUP BY A T = {/gfs/1, /gfs/2, …, /gfs/100000} 4
  • 5. Contribution  This paper presented two major technique in Dremel:  Nested columnar storage ◦ store / split into columns ◦ Assembly to record  Query ◦ Language ◦ Execution 5
  • 6. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 6
  • 7. Why nested?  Flexible ◦ Data in web and scientific is often non- relational  Reduce cost ◦ Normalizing and recombining nested data is often prohibited in TB, PB scale of data. 7
  • 8. Why column? [Figure: records r1 and r2, with nested fields A–E, stored record-oriented vs. column-oriented.] Column-oriented storage lets a query read less data and makes decompression cheaper. Challenge: preserve the nested structure and reconstruct records from any subset of fields.
  • 9. Nested data model. Schema in Protocol Buffers syntax (http://code.google.com/apis/protocolbuffers); multiplicities are required [1,1], optional [0,1], repeated [0,*]:
      message Document {
        required int64 DocId;
        optional group Links {
          repeated int64 Backward;
          repeated int64 Forward;
        }
        repeated group Name {
          repeated group Language {
            required string Code;
            optional string Country;
          }
          optional string Url;
        }
      }
    Sample record r1: DocId: 10; Links { Forward: 20, Forward: 40, Forward: 60 }; Name { Language { Code: 'en-us', Country: 'us' }, Language { Code: 'en' }, Url: 'http://A' }; Name { Url: 'http://B' }; Name { Language { Code: 'en-gb', Country: 'gb' } }.
    Sample record r2: DocId: 20; Links { Backward: 10, Backward: 30, Forward: 80 }; Name { Url: 'http://C' }.
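The multiplicities on slide 9 fix the largest repetition and definition level any column can carry: the maximum r counts the repeated fields on a path, and the maximum d counts the optional or repeated ones. A minimal sketch, assuming a hypothetical dict encoding of the Document schema (SCHEMA and max_levels are illustrative names, not part of Dremel):

```python
from typing import Tuple

# Hypothetical encoding of the Document schema from slide 9:
# field name -> (multiplicity, child fields, or None for a leaf).
SCHEMA = {
    "DocId": ("required", None),
    "Links": ("optional", {
        "Backward": ("repeated", None),
        "Forward": ("repeated", None),
    }),
    "Name": ("repeated", {
        "Language": ("repeated", {
            "Code": ("required", None),
            "Country": ("optional", None),
        }),
        "Url": ("optional", None),
    }),
}


def max_levels(path: str, schema: dict = SCHEMA) -> Tuple[int, int]:
    """Return (max repetition level, max definition level) for a dotted field path."""
    max_rep = max_def = 0
    fields = schema
    for name in path.split("."):
        multiplicity, children = fields[name]
        if multiplicity == "repeated":
            max_rep += 1  # each repeated field adds a level at which values can repeat
        if multiplicity != "required":
            max_def += 1  # optional and repeated fields may be absent
        fields = children
    return max_rep, max_def


for p in ["DocId", "Links.Forward", "Name.Url", "Name.Language.Code", "Name.Language.Country"]:
    print(p, max_levels(p))
```

For example, max_levels("Name.Language.Country") returns (2, 3), which is why a present Country value carries d = 3 while every other column tops out at d = 2 or below.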
  • 10. Column-striped representation of r1 and r2; each entry is (value, repetition level r, definition level d):
      DocId:                  (10, 0, 0), (20, 0, 0)
      Name.Url:               ('http://A', 0, 2), ('http://B', 1, 2), (NULL, 1, 1), ('http://C', 0, 2)
      Links.Forward:          (20, 0, 2), (40, 1, 2), (60, 1, 2), (80, 0, 2)
      Links.Backward:         (NULL, 0, 1), (10, 0, 2), (30, 1, 2)
      Name.Language.Code:     ('en-us', 0, 2), ('en', 2, 2), (NULL, 1, 1), ('en-gb', 1, 2), (NULL, 0, 1)
      Name.Language.Country:  ('us', 0, 3), (NULL, 2, 2), (NULL, 1, 1), ('gb', 1, 3), (NULL, 0, 1)
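To see how the column stripes on slide 10 are read back one column at a time, here is a minimal sketch that groups a column's values by record. It assumes Python None stands in for NULL, and values_per_record is a hypothetical helper: r = 0 opens a new record, and an entry whose d is below the column's maximum is a NULL placeholder that records structure but carries no value.

```python
from typing import Any, List, Optional, Tuple

# One stripe entry: (value, repetition level r, definition level d); None plays NULL.
Entry = Tuple[Optional[Any], int, int]

# Columns copied from slide 10.
LINKS_FORWARD: List[Entry] = [(20, 0, 2), (40, 1, 2), (60, 1, 2), (80, 0, 2)]
LINKS_BACKWARD: List[Entry] = [(None, 0, 1), (10, 0, 2), (30, 1, 2)]
NAME_URL: List[Entry] = [("http://A", 0, 2), ("http://B", 1, 2), (None, 1, 1), ("http://C", 0, 2)]


def values_per_record(column: List[Entry], max_def: int) -> List[List[Any]]:
    """Split one column into a list of values per record."""
    records: List[List[Any]] = []
    for value, r, d in column:
        if r == 0:
            records.append([])  # r == 0: this entry starts the next record
        if d == max_def:
            records[-1].append(value)  # fully defined path: a real value
        # d < max_def: NULL placeholder, kept only to preserve structure
    return records


print(values_per_record(LINKS_FORWARD, 2))   # [[20, 40, 60], [80]]
print(values_per_record(LINKS_BACKWARD, 2))  # [[], [10, 30]]
print(values_per_record(NAME_URL, 2))        # [['http://A', 'http://B'], ['http://C']]
```

The empty list for r1's Links.Backward shows why the leading NULL with r = 0 is stored: without it, the reader could not tell that 10 and 30 belong to r2 rather than r1.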
  • 11. Column-oriented problem: when a query reads only a sub-tree, a column on its own does not tell you the structure of the whole record (not even where its own values sit in it). [Figure: values of nested fields A–E from records r1 and r2 asking "Where am I?"] To solve this problem, the paper introduces repetition and definition levels.
  • 12. Repetition levels, shown on the Name.Language.Code column for r1 and r2 (value r d): en-us 0 2; en 2 2; NULL 1 1; en-gb 1 2; NULL 0 1. r: at which repeated field in the field's path the value has repeated. The first value of a record gets r = 0 ('en-us'); 'en' follows a repeated Language, so r = 2; a new Name gives r = 1 (the NULL for the Name holding only Url 'http://B', and 'en-gb'); the NULL that starts r2 repeats nothing, so r = 0.
  • 13. Definition levels, shown on the Name.Language.Country column for r1 and r2 (value r d): us 0 3; NULL 2 2; NULL 1 1; gb 1 3; NULL 0 1. d: how many fields in the path that could be undefined (optional or repeated) are actually present. Name, Language and Country are all optional or repeated, so a present Country has d = 3; the Language holding only Code 'en' stops at d = 2; a Name without any Language stops at d = 1.
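The levels on slides 12 and 13 can be computed by walking a record along one field path. A minimal sketch, assuming records are Python dicts in which repeated fields are lists and absent fields are missing keys; shred and the path specs (CODE, COUNTRY, BACKWARD) are hypothetical names, and this per-path walk is a simplification, not the paper's column-striping algorithm, which handles all fields in one pass.

```python
from typing import Any, Dict, List, Optional, Tuple

Entry = Tuple[Optional[Any], int, int]  # (value, r, d); None plays NULL

# Records r1 and r2 from slide 9: repeated fields are lists, absent fields are missing keys.
R1: Dict[str, Any] = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}], "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
R2: Dict[str, Any] = {
    "DocId": 20,
    "Links": {"Backward": [10, 30], "Forward": [80]},
    "Name": [{"Url": "http://C"}],
}

# A path spec lists each field on the path with its multiplicity.
CODE = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
COUNTRY = [("Name", "repeated"), ("Language", "repeated"), ("Country", "optional")]
BACKWARD = [("Links", "optional"), ("Backward", "repeated")]


def shred(record: Dict[str, Any], spec: List[Tuple[str, str]]) -> List[Entry]:
    """Emit the (value, r, d) entries one record contributes to the column at `spec`."""
    out: List[Entry] = []

    def walk(node: Dict[str, Any], depth: int, r: int, d: int, rep_level: int) -> None:
        name, multiplicity = spec[depth]
        raw = node.get(name)
        if multiplicity == "repeated":
            items = raw or []
            level = rep_level + 1  # repeats of this field happen at this level
        else:
            items = [] if raw is None else [raw]
            level = rep_level
        if not items:
            # Path stops here: NULL, with d = optional/repeated fields present so far.
            out.append((None, r, d))
            return
        child_d = d if multiplicity == "required" else d + 1
        for i, item in enumerate(items):
            item_r = r if i == 0 else level  # first item inherits r; later ones repeat at `level`
            if depth == len(spec) - 1:
                out.append((item, item_r, child_d))
            else:
                walk(item, depth + 1, item_r, child_d, level)

    walk(record, 0, 0, 0, 0)
    return out


print(shred(R1, CODE) + shred(R2, CODE))
# [('en-us', 0, 2), ('en', 2, 2), (None, 1, 1), ('en-gb', 1, 2), (None, 0, 1)]
print(shred(R1, COUNTRY) + shred(R2, COUNTRY))
# [('us', 0, 3), (None, 2, 2), (None, 1, 1), ('gb', 1, 3), (None, 0, 1)]
```

Concatenating the per-record output reproduces the Name.Language.Code and Name.Language.Country stripes shown on slides 12 and 13.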
  • 14. Column-oriented storage works well, but what if I still need record-oriented processing (e.g., MapReduce)?
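  • 15. Record assembly FSM, for record-oriented data processing (e.g., MapReduce). Each state reads one field; transitions are labeled with the repetition level of the next value: DocId -0-> Links.Backward; Links.Backward -1-> Links.Backward, -0-> Links.Forward; Links.Forward -1-> Links.Forward, -0-> Name.Language.Code; Name.Language.Code -0,1,2-> Name.Language.Country; Name.Language.Country -2-> Name.Language.Code, -0,1-> Name.Url; Name.Url -1-> Name.Language.Code, -0-> end of record.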
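Slide 15's automaton can be run directly on the slide 10 columns. A minimal sketch, with the transition table transcribed from slide 15 and hypothetical names (COLUMNS, FSM, END, read_record): after each value, the reader peeks at the repetition level of the next value in the same column (treating an exhausted column as 0) and follows that transition. It only traces the order in which values are read; the paper's assembly also uses definition levels to open and close nested groups, which this sketch omits.

```python
from typing import Any, Dict, List, Optional, Tuple

Entry = Tuple[Optional[Any], int, int]  # (value, r, d); None plays NULL
END = "<end>"  # hypothetical marker for the final state

# Column stripes from slide 10.
COLUMNS: Dict[str, List[Entry]] = {
    "DocId": [(10, 0, 0), (20, 0, 0)],
    "Links.Backward": [(None, 0, 1), (10, 0, 2), (30, 1, 2)],
    "Links.Forward": [(20, 0, 2), (40, 1, 2), (60, 1, 2), (80, 0, 2)],
    "Name.Language.Code": [("en-us", 0, 2), ("en", 2, 2), (None, 1, 1), ("en-gb", 1, 2), (None, 0, 1)],
    "Name.Language.Country": [("us", 0, 3), (None, 2, 2), (None, 1, 1), ("gb", 1, 3), (None, 0, 1)],
    "Name.Url": [("http://A", 0, 2), ("http://B", 1, 2), (None, 1, 1), ("http://C", 0, 2)],
}

# (current field, repetition level of its next value) -> next field to read.
FSM: Dict[Tuple[str, int], str] = {
    ("DocId", 0): "Links.Backward",
    ("Links.Backward", 1): "Links.Backward",
    ("Links.Backward", 0): "Links.Forward",
    ("Links.Forward", 1): "Links.Forward",
    ("Links.Forward", 0): "Name.Language.Code",
    ("Name.Language.Code", 0): "Name.Language.Country",
    ("Name.Language.Code", 1): "Name.Language.Country",
    ("Name.Language.Code", 2): "Name.Language.Country",
    ("Name.Language.Country", 2): "Name.Language.Code",
    ("Name.Language.Country", 1): "Name.Url",
    ("Name.Language.Country", 0): "Name.Url",
    ("Name.Url", 1): "Name.Language.Code",
    ("Name.Url", 0): END,
}


def read_record(cursors: Dict[str, int]) -> List[Tuple[str, Any]]:
    """Walk the FSM once, consuming exactly one record's entries from COLUMNS."""
    trace: List[Tuple[str, Any]] = []
    field = "DocId"
    while field != END:
        column, pos = COLUMNS[field], cursors[field]
        value, _r, _d = column[pos]
        cursors[field] = pos + 1
        trace.append((field, value))
        # Repetition level of the next value in this column; 0 once the column is exhausted.
        next_r = column[pos + 1][1] if pos + 1 < len(column) else 0
        field = FSM[(field, next_r)]
    return trace


cursors = {name: 0 for name in COLUMNS}
r1_trace = read_record(cursors)
r2_trace = read_record(cursors)
print(len(r1_trace))  # 16
print(r2_trace)
```

The second call prints r2's reads in order: DocId 20, Links.Backward 10 and 30, Links.Forward 80, a NULL Code and a NULL Country, then Url 'http://C'. After two records every column cursor sits at the end of its column.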
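  • 16. Reading two fields: an assembly FSM for just DocId and Name.Language.Country (DocId -0-> Name.Language.Country; Name.Language.Country -1,2-> Name.Language.Country, -0-> end) produces record s1 (DocId: 10; Name { Language { Country: 'us' }, Language }; Name; Name { Language { Country: 'gb' } }) and record s2 (DocId: 20; Name). The structure of parent fields is preserved, which is useful for queries like /Name[3]/Language[1]/Country.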
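Slide 16's point, that the levels alone preserve the parent structure, can be checked on the Name.Language.Country column by itself. A minimal sketch with hypothetical helper names (positions, country_at): r says which ancestor repeated (0 = new record, 1 = new Name, 2 = new Language in the same Name), and d says how deep the path is defined. Together they give each entry a 1-based Name and Language index, enough to answer /Name[3]/Language[1]/Country without reading any other column.

```python
from typing import Any, List, Optional, Tuple

# Name.Language.Country column from slide 10: (value, r, d); None plays NULL.
COUNTRY = [("us", 0, 3), (None, 2, 2), (None, 1, 1), ("gb", 1, 3), (None, 0, 1)]

# (Name index, Language index, Country value); None where that level is undefined.
Position = Tuple[Optional[int], Optional[int], Optional[Any]]


def positions(column) -> List[List[Position]]:
    """Attach 1-based Name and Language indices to every entry, grouped by record."""
    records: List[List[Position]] = []
    name_idx = lang_idx = 0
    for value, r, d in column:
        if r == 0:    # new record: first Name, first Language
            records.append([])
            name_idx, lang_idx = 1, 1
        elif r == 1:  # Name repeated: next Name, first Language in it
            name_idx, lang_idx = name_idx + 1, 1
        else:         # r == 2: Language repeated inside the same Name
            lang_idx += 1
        records[-1].append((
            name_idx if d >= 1 else None,  # d >= 1: the Name exists
            lang_idx if d >= 2 else None,  # d >= 2: the Language exists
            value if d == 3 else None,     # d == 3: Country itself is present
        ))
    return records


def country_at(record: List[Position], name: int, language: int) -> Optional[Any]:
    """Evaluate /Name[name]/Language[language]/Country on one record's entries."""
    for n, lang, value in record:
        if n == name and lang == language:
            return value
    return None


recs = positions(COUNTRY)
print(country_at(recs[0], 3, 1))  # gb
```

For r1 the entries come out as (1, 1, 'us'), (1, 2, None), (2, None, None), (3, 1, 'gb'), matching the shape of s1 on the slide.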
  • 17. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 17
  • 18. Query language  Based on SQL  Performs ◦ Projection ◦ Selection ◦ Nested subqueries ◦ inner and intra-record aggregation ◦ Top-k ◦ Joins ◦ User-defined functions 18
  • 19. Example usage, over records r1 and r2:
      SELECT DocId AS Id,
        COUNT(Name.Language.Code) WITHIN Name AS Cnt,
        Name.Url + ',' + Name.Language.Code AS Str
      FROM t
      WHERE REGEXP(Name.Url, '^http') AND DocId < 20;
    Output table t1: Id: 10; Name { Cnt: 2; Language { Str: 'http://A,en-us'; Str: 'http://A,en' } }; Name { Cnt: 0 }.
    Output schema:
      message QueryResult {
        required int64 Id;
        repeated group Name {
          optional uint64 Cnt;
          repeated group Language {
            optional string Str;
          }
        }
      }
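To make the effect of WITHIN and the filter in slide 19's query concrete, here is a record-at-a-time sketch over r1 and r2 written as Python dicts. It is only an illustration, not how Dremel evaluates queries (Dremel works on columns). run_query is a hypothetical name, REGEXP is approximated with Python's re module, and Name groups whose Url is missing or does not match are dropped, an assumption that reproduces the slide's output. As a simplification, each Name's Str values are collected in a flat list instead of Language groups.

```python
import re
from typing import Any, Dict, List

# Records r1 and r2 from slide 9: repeated fields are lists, absent fields are missing keys.
R1: Dict[str, Any] = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}], "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
R2: Dict[str, Any] = {
    "DocId": 20,
    "Links": {"Backward": [10, 30], "Forward": [80]},
    "Name": [{"Url": "http://C"}],
}


def run_query(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Evaluate the slide's example query one record at a time."""
    results = []
    for rec in records:
        if not rec["DocId"] < 20:  # ... AND DocId < 20
            continue
        names = []
        for name in rec.get("Name", []):
            url = name.get("Url")
            if url is None or not re.match(r"^http", url):  # REGEXP(Name.Url, '^http')
                continue
            languages = name.get("Language", [])
            names.append({
                # COUNT(Name.Language.Code) WITHIN Name; Code is required in every Language
                "Cnt": len(languages),
                # Name.Url + ',' + Name.Language.Code, one value per Language
                "Str": [url + "," + lang["Code"] for lang in languages],
            })
        results.append({"Id": rec["DocId"], "Name": names})
    return results


print(run_query([R1, R2]))
# [{'Id': 10, 'Name': [{'Cnt': 2, 'Str': ['http://A,en-us', 'http://A,en']}, {'Cnt': 0, 'Str': []}]}]
```

r2 is removed by DocId < 20, and r1's third Name (no Url) is removed by the REGEXP filter, leaving the two Name groups shown in the output table.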
  • 20. Query execution: a serving tree. The client sends a query to the root server, which passes it through intermediate servers down to leaf servers (with local storage) that read from the storage layer (e.g., GFS). ◦ Parallelizes scheduling and aggregation ◦ Fault tolerance ◦ A query dispatcher assigns work to the servers
  • 21. Example: count(). The query SELECT A, COUNT(B) FROM T GROUP BY A, with T = {/gfs/1, /gfs/2, …, /gfs/100000}, is rewritten at each level of the serving tree:
      Level 0 (root): SELECT A, SUM(c) FROM (R1_1 UNION ALL … R1_10) GROUP BY A
      Level 1: R1_1 = SELECT A, COUNT(B) AS c FROM T1_1 GROUP BY A, with T1_1 = {/gfs/1, …, /gfs/10000}; R1_2 = SELECT A, COUNT(B) AS c FROM T1_2 GROUP BY A, with T1_2 = {/gfs/10001, …, /gfs/20000}; ...
      Level 3 (leaves, data access ops): SELECT A, COUNT(B) AS c FROM T3_1 GROUP BY A, with T3_1 = {/gfs/1}; ...
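The rewrite on slide 21 works because COUNT splits into partial counts that can be summed. A minimal sketch of that aggregation over a serving tree, with hypothetical names (leaf_count, merge_sum, serve), tablets modeled as in-memory lists of (A, B) rows, and an assumed fan_out parameter: leaves run COUNT(B) ... GROUP BY A on their tablets, and every server above them runs SUM(c) ... GROUP BY A over its children's results.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, Optional[int]]  # hypothetical (A, B) row; None plays NULL
Tablet = List[Row]


def leaf_count(tablets: Sequence[Tablet]) -> Counter:
    """Leaf server: SELECT A, COUNT(B) AS c FROM <its tablets> GROUP BY A."""
    counts: Counter = Counter()
    for tablet in tablets:
        for a, b in tablet:
            if b is not None:  # COUNT(B) ignores NULLs
                counts[a] += 1
    return counts


def merge_sum(partials: Sequence[Counter]) -> Counter:
    """Intermediate or root server: SELECT A, SUM(c) FROM (... UNION ALL ...) GROUP BY A."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total


def serve(tablets: Sequence[Tablet], fan_out: int) -> Counter:
    """Split tablets among up to `fan_out` children until a node holds at most `fan_out` tablets."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    if len(tablets) <= fan_out:
        return leaf_count(tablets)
    size = -(-len(tablets) // fan_out)  # ceiling division: tablets per child
    children = [serve(tablets[i:i + size], fan_out) for i in range(0, len(tablets), size)]
    return merge_sum(children)


tablets = [[("x", 1), ("y", 2)], [("x", None), ("x", 3)], [("y", 4)], [("z", 5)], [("x", 6)]]
print(serve(tablets, fan_out=2) == leaf_count(tablets))  # True
```

Whatever the tree depth, the root's SUM of partial counts equals a single COUNT over all tablets; the tree only changes how much work each server does, which is what slide 26 measures.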
  • 22. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 22
  • 23. Experiment data: 1 PB of real data (uncompressed, non-replicated); 100K-800K tablets per table; experiments run during business hours.
      Table  Number of    Size (unrepl.,  Number     Data    Repl.
      name   records      compressed)     of fields  center  factor
      T1     85 billion   87 TB           270        A       3×
      T2     24 billion   13 TB           530        A       3×
      T3     4 billion    70 TB           1200       A       3×
      T4     1+ trillion  105 TB          50         B       3×
      T5     1+ trillion  20 TB           30         B       2×
  • 24. Column vs. record. [Figure: "cold" time on local disk in seconds, averaged over 30 runs, vs. number of fields read. From columns: (a) read + decompress, (b) assemble records, (c) parse as C++ objects. From records: (d) read + decompress, (e) parse as C++ objects.] Annotations: 10x speedup using columnar storage; 2-4x overhead of using records. Table partition: 375 MB (compressed), 300K rows, 125 columns.
  • 25. MR and Dremel execution: average number of terms in txtField in the 85-billion-record table T1. [Figure: execution time (sec) on 3000 nodes; data read: 87 TB for MR over records, 0.5 TB for MR over columns, 0.5 TB for Dremel.] Sawzall program run on MR:
      num_recs: table sum of int;
      num_words: table sum of int;
      emit num_recs <- 1;
      emit num_words <- count_words(input.txtField);
    Q1: SELECT SUM(count_words(txtField)) / COUNT(*) FROM T1
    MR overheads: launching jobs, scheduling 0.5M tasks, assembling records.
  • 26. Impact of serving tree depth. [Figure: execution time (sec) for Q2 (returns 100s of records) and Q3 (returns 1M records) on T2, which has 40 billion nested items.]
      Q2: SELECT country, SUM(item.amount) FROM T2 GROUP BY country
      Q3: SELECT domain, SUM(item.amount) FROM T2 WHERE domain CONTAINS '.net' GROUP BY domain
  • 27. Scalability. [Figure: execution time (sec) vs. number of leaf servers.] Q5 on the trillion-row table T4: SELECT TOP(aid, 20), COUNT(*) FROM T4
  • 28. Outline  Introduction  Nested columnar storage  Query processing  Experiments  Conclusion 28
  • 29. Observation. [Figure: monthly query workload of one 3000-node Dremel instance; percentage of queries vs. execution time (sec).] Most queries complete in under 10 seconds.
  • 30. Conclusion  Dremel is an query system For analysis of read-only nested data  Main feature is fast(interactive response time), column-oriented, SQL-like Query language  Introduced two major method: ◦ Nested columnar storage – in order to solve partial query problem. ◦ Query processing – parallel processing & distributing, decompressing queries 30
  • 31. Comments  Google is awesome!  Pros ◦ Nested storage gives us more flexibility. ◦ repetition & definition is an novel idea and it can solve the locality problem easily. ◦ Distributed serving tree is awesome, faster than MR. 31
  • 32. Comments  Cons ◦ Read-only may not fit every requirement ◦ Dremel is not a database, so you’ll need to convert your real data into dremel when analyzing  Converting may cost lost of time and space  Google doesn’t care this problem, they have GFS and many servers. 32
  • 33. Thank you for listening.