© 2014 MapR Technologies
Using Apache Drill
Agenda
• About Apache Drill
• Query Execution
• Demonstration
• Q and A
About Apache Drill
Community
• Mentors
– MapR, Lucid Works, Elasticsearch, University members
• Notable Committers
– MapR, Microsoft, Hortonworks, Concurrent, Oracle, Ohm Data
Apache Drill
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
40+ contributors
150+ years of experience building databases and distributed systems
Rethink SQL for Big Data
Preserve
• ANSI SQL
– Ubiquitous
• Familiar
– No context switch for BI/Analytics
• One technology
– Painful to manage different technologies
• Enterprise ready
– System-of-record, HA, DR, Security, multi-tenancy, …
Invent
• Flexible data model
– Allow schemas to evolve rapidly
– Support semi-structured data types
• Agility
– Self-service possible when developer and DBA are the same
• Scalability
– In all dimensions: schemas, processes, management
Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
SQL
select * from A
where A.a in (
select B.b from B where B.b = A.c);
Did you know Apache Hive cannot compute this query?
– Applies to Hive and its variants, e.g. Impala, Spark SQL
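To make the semantics of the query above concrete, here is a minimal sketch that runs the same correlated IN subquery from Python against an in-memory SQLite database. SQLite is only a stand-in for an engine that accepts this form of SQL; the table contents and the function name `run_correlated_subquery` are made up for illustration, and nothing here reflects how Drill plans or executes the query.

```python
import sqlite3

def run_correlated_subquery():
    """Run the slide's query on two tiny, made-up tables and return the rows."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE A (a INTEGER, c INTEGER)")
    cur.execute("CREATE TABLE B (b INTEGER)")
    cur.executemany("INSERT INTO A VALUES (?, ?)", [(1, 1), (2, 3), (4, 4), (5, 5)])
    cur.executemany("INSERT INTO B VALUES (?)", [(1,), (3,), (4,)])
    # The inner query refers to A.c from the outer query, so it is
    # re-evaluated (logically) for every row of A.
    rows = cur.execute(
        "SELECT * FROM A "
        "WHERE A.a IN (SELECT B.b FROM B WHERE B.b = A.c) "
        "ORDER BY A.a"
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(run_correlated_subquery())
```

A row of A survives only when its `c` value exists in B and equals its own `a` value, which is why the subquery cannot be evaluated once up front and reused.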
YOU CAN’T HANDLE REAL SQL!
Semi-structured Data
select cf.month, cf.year
from hbase.table1;
• Of course, you know an RDBMS cannot handle this query.
– Nor can Hive and its variants, such as Impala and Spark SQL
• There is no metastore definition available
YOU CAN’T HANDLE AN HBASE API!
Interactive SQL-on-Hadoop options
Feature | Drill 1.0 | Hive 0.13 w/ Tez | Impala 1.x | Shark 0.9 | Presto 0.56
Latency | Low | Medium | Low | Medium | Low
Files | Yes (all Hive file formats, plus JSON, Text, …) | Yes (all Hive file formats) | Yes (Parquet, Sequence, …) | Yes (all Hive file formats) | Yes (RC, Sequence, Text)
HBase/MapR-DB | Yes | Yes | Various issues | Yes | No
Schema | Hive or schema-less | Hive | Hive | Hive | Hive
SQL support | ANSI SQL | HiveQL | HiveQL (subset) | HiveQL | ANSI SQL
Client support | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | JDBC
Hive compat | High | High | Low | High | High
Large joins | Yes | Yes | No | No | No
Nested data | Yes | Limited | No | Limited | Limited
Concurrency | High | Limited | Medium | Limited | Medium
Data is Stored in Many Forms
• Flat files in DFS
– Complex data (Thrift, Avro, protobuf)
– Columnar data (Parquet, ORC)
– Loosely defined (JSON)
– Traditional files (CSV, TSV)
• Data stored in NoSQL stores
– Relational-like (rows, columns)
– Sparse data (NoSQL maps)
– Embedded blobs (JSON)
– Document stores (nested objects)
{
  "name": {
    "first": "Michael",
    "last": "Smith"
  },
  "hobbies": ["skiing", "soccer"],
  "district": "Los Altos"
}
{
  "name": {
    "first": "Jennifer",
    "last": "Gates"
  },
  "hobbies": ["singing"],
  "preschool": "CCLC"
}
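The two records above do not share the same fields, which is the situation schema discovery on the fly is meant to handle. As a rough illustration only, the sketch below reads self-describing JSON records and builds up the set of column paths as it goes. The names `RECORDS`, `discover_columns` and `discover_schema` are hypothetical, and the approach (a union of dotted paths, with arrays treated as leaves) is a simplification, not Drill's actual mechanism.

```python
import json

# The two sample records from the slide, written as valid JSON lines.
RECORDS = [
    '{"name": {"first": "Michael", "last": "Smith"}, '
    '"hobbies": ["skiing", "soccer"], "district": "Los Altos"}',
    '{"name": {"first": "Jennifer", "last": "Gates"}, '
    '"hobbies": ["singing"], "preschool": "CCLC"}',
]

def discover_columns(value, prefix=""):
    """Return the set of dotted column paths found in one decoded JSON value."""
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            paths |= discover_columns(child, f"{prefix}{key}.")
        return paths
    # Scalars and arrays are treated as leaf columns in this sketch.
    return {prefix.rstrip(".")}

def discover_schema(lines):
    """Union the columns seen so far; no schema is declared in advance."""
    schema = set()
    for line in lines:
        schema |= discover_columns(json.loads(line))
    return sorted(schema)

if __name__ == "__main__":
    print(discover_schema(RECORDS))
```

The second record adds `preschool` and lacks `district`, and the discovered schema simply grows to cover both.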
Drill’s Data Model is Flexible
[Figure: HBase, JSON, BSON, CSV, TSV, Parquet and Avro placed along two axes of flexibility: fixed schema to schema-less, and flat to complex]
RDBMS/SQL-on-Hadoop table
Name      Gender  Age
Michael   M       6
Jennifer  F       3
Apache Drill table
{
  "name": {
    "first": "Michael",
    "last": "Smith"
  },
  "hobbies": ["skiing", "soccer"],
  "district": "Los Altos"
}
{
  "name": {
    "first": "Jennifer",
    "last": "Gates"
  },
  "hobbies": ["singing"],
  "preschool": "CCLC"
}
Query Execution
Data Source is in the Query
SELECT timestamp, message
FROM dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`
WHERE errorLevel > 2
A storage engine instance (dfs1 in the query)
- DFS
- HBase
- Hive Metastore/HCatalog
A workspace (logs in the query)
- Sub-directory
- Hive database
- HBase namespace
A table (the backtick-quoted path in the query)
- pathnames
- HBase table
- Hive table
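The three-part name in the FROM clause above can be pulled apart mechanically. The following is a small sketch, assuming the name always has exactly three parts and that backticks quote the table part; `split_data_source` is a hypothetical helper written for this illustration and is not Drill's parser.

```python
def split_data_source(name):
    """Split 'storage.workspace.`table`' on dots that are outside backticks."""
    parts, current, quoted = [], [], False
    for ch in name:
        if ch == "`":
            quoted = not quoted      # backticks delimit an identifier; drop them
        elif ch == "." and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    if quoted or len(parts) != 3:
        raise ValueError(f"expected storage.workspace.table, got {name!r}")
    return dict(zip(("storage", "workspace", "table"), parts))

if __name__ == "__main__":
    print(split_data_source("dfs1.logs.`AppServerLogs/2014/Jan/p001.parquet`"))
```

The dot inside `p001.parquet` is kept because it sits inside the backticks, so a file path can serve as the table name.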
Runtime Compilation is Faster
• JIT is smart, but more gains with runtime compilation
• Janino: Java-based Java compiler
From http://bit.ly/16Xk32x
Drill Compiler
• CodeModel generates code
• Janino compiles runtime byte-code
• Precompiled byte-code templates
• Merge byte-code of the two classes
• Loaded class
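The idea behind runtime compilation can be shown with a loose analogy in Python rather than Java: instead of interpreting a generic predicate for every row, generate source code specialised to one query and compile it once. This sketch does not use Janino or CodeModel and does not merge byte-code with templates; `interpret_filter`, `compile_filter` and `ROWS` are hypothetical names, and Python's built-in `compile` and `exec` stand in for the compiler step.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

def interpret_filter(rows, column, op, literal):
    """Generic path: look up the operator and the column for every row."""
    return [row for row in rows if OPS[op](row[column], literal)]

def compile_filter(column, op, literal):
    """Generate source specialised to one predicate and compile it once."""
    if op not in OPS:
        raise ValueError(f"unsupported operator: {op}")
    source = (
        "def generated_filter(rows):\n"
        f"    return [row for row in rows if row[{column!r}] {op} {literal!r}]\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["generated_filter"]

# Made-up rows shaped like the log query shown earlier in the deck.
ROWS = [
    {"message": "disk full", "errorLevel": 3},
    {"message": "started", "errorLevel": 1},
    {"message": "timeout", "errorLevel": 5},
]

if __name__ == "__main__":
    error_filter = compile_filter("errorLevel", ">", 2)
    print(error_filter(ROWS))
```

The generated function has the column name, operator and literal baked in, so the per-row work no longer involves any dispatch on the shape of the query.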
Basic query flow
[Diagram: a query arriving at a cluster of three Drillbits, each with a distributed cache and DFS/HBase storage, plus Zookeeper]
1. Query comes to any Drillbit (JDBC, ODBC, CLI)
2. Drillbit generates execution plan based on query optimization & locality
3. Fragments are farmed to individual nodes
4. Data is returned to driving node
*Curator/Zookeeper for ephemeral cluster membership info
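The four steps above follow a scatter-gather pattern, which can be sketched in a few lines. This is a toy model under stated assumptions: threads stand in for cluster nodes, `NODE_DATA` is made-up data keyed by hypothetical node names, and `run_fragment` and `run_query` are illustrative functions rather than anything from Drill's code base. Cluster membership via Zookeeper is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "Drillbit" holds the rows local to its node.
NODE_DATA = {
    "drillbit-1": [{"message": "disk full", "errorLevel": 3}],
    "drillbit-2": [{"message": "started", "errorLevel": 1}],
    "drillbit-3": [{"message": "timeout", "errorLevel": 5}],
}

def run_fragment(node, min_level):
    """Fragment executed on one node: scan and filter the local rows only."""
    return [r["message"] for r in NODE_DATA[node] if r["errorLevel"] > min_level]

def run_query(min_level):
    """Driving node: plan one fragment per node, farm them out, gather results."""
    nodes = sorted(NODE_DATA)  # step 2: one fragment per node that holds data
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # step 3: fragments run in parallel, one per node
        partials = pool.map(run_fragment, nodes, [min_level] * len(nodes))
        # step 4: partial results flow back to the driving node
        return [message for part in partials for message in part]

if __name__ == "__main__":
    print(run_query(2))
```

Because every node filters its own data, only matching rows travel back to the driving node.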
Demonstration
Download and try Drill!
http://incubator.apache.org/drill/
Q&A
Engage with us!
@mapr
maprtech
mapr-technologies
jscott@mapr.com



Editor's Notes

  • #5: Modeled after Dremel, based on the white paper from Google, with the additional flexibility required to support a broader range of data formats and data sources. The design goal is to scale to 10,000+ servers and to process petabytes of data and trillions of records in seconds.
  • #6: Hortonworks has used code from Drill in Tez.
  • #7: These are not people who can only create an abstract syntax tree. They have worked on Oracle, DB2, ParAccel, Teradata, SQL Server and Vertica. You don’t use a QWERTY-like keyboard, so do you really want to use another SQL-like syntax? Contributors come from Facebook, Visa, Mesosphere, many universities and others, even Oracle.
  • #8: So many tools and applications. Great performance. One technology: a standard across multiple databases. As applications evolve, schemas change rapidly.
  • #10: Why do we tolerate applications that support only the parts and pieces of SQL they choose?
  • #21: A Drillbit is simply a Drill process running on any particular node in the cluster. Have I mentioned JDBC and ODBC drivers? This means you can use standard database interfaces.