Splice Machine
Open Source
RDBMS
September 26, 2016
Daniel Gómez Ferro
John Leach
Open Source Stack: Spark, Hadoop and Derby
Apache Derby
▪ ANSI SQL-99 RDBMS
▪ Java-based
▪ ODBC/JDBC Compliant
Apache HBase/Hadoop
▪ Auto-sharding
▪ High availability
▪ Scalability to 100s of PBs
Apache Spark
▪ Analytical engine
▪ Fast, in-memory technology
▪ Memory resilient to node failure
Splice Machine: Query Execution
1. Parse SQL
• Generate Abstract Syntax Tree (AST)
• Bind AST to Transactional Dictionary
2. Optimize query plan
• Determine join order and storage structure (e.g., base table, index) using table statistics (e.g., cardinality estimates)
• Push predicates
• Unroll nested subqueries
3. Generate optimal byte code
OLTP Execution on HBase
4a. Execute OLTP query from byte code
5a. Use block cache and bloom filters to optimize data access
6a. Return results
OLAP Execution on Spark
4b. Generate Spark execution plan
5b. Submit Spark plan with byte code
6b. Fair scheduling of distributed tasks
7b. Generate RDD from HFiles and Memstore
8b. Execute query and return results
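Step 2 above chooses a join order from table statistics. The following is a toy sketch of that idea only, not the optimizer Splice Machine actually uses: it sorts tables by estimated cardinality after pushed-down predicates. The row counts are the TPCH 100 figures from this deck; `TableStats`, `greedy_join_order` and the `selectivity` value are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    name: str
    row_count: int            # table cardinality taken from statistics
    selectivity: float = 1.0  # hypothetical fraction of rows left after pushed-down predicates

    @property
    def estimated_rows(self) -> float:
        return self.row_count * self.selectivity


def greedy_join_order(tables):
    """Return table names ordered by ascending estimated cardinality."""
    return [t.name for t in sorted(tables, key=lambda t: t.estimated_rows)]


stats = [
    TableStats("LINEITEM", 600_037_902),
    TableStats("ORDERS", 150_000_000, selectivity=0.01),  # assumed selective predicate
    TableStats("CUSTOMERS", 15_000_000),
]
order = greedy_join_order(stats)
print(order)
```

A real cost-based optimizer also weighs join strategies and access paths (base table versus index); the sketch only shows why pushing a selective predicate down can move a large table to the front of the join order.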
Architectural Differences:
Don’t we already have SQL on HBase?
| | Phoenix | Trafodion | Splice Machine |
|---|---|---|---|
| Transactional System | Tephra Centralized SI | Two Phase Commit | Hierarchical Distributed SI |
| Analytical Engine | HBase Coprocessors, JDBC Client | HBase Coprocessors, Executor Services Processes | Spark on Yarn |
| Import Process | Python or MapReduce | MapReduce via Hive | JDBC Command Spark job |
| Scanning Data | Coprocessor Internal Scans, HBase Scans | Coprocessor Internal Scans, HBase Scans | File Oriented Hybrid Scanner |
| Compaction | HBase Compaction | HBase Compaction | Spark Compaction |
| Resource Management | HBase Call Queues | Workload Management System | Spark Job Scheduling (FAIR) |
TPCH 100 Load Times
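| Table | Row Count | Phoenix (h:mm:ss) | Trafodion (h:mm:ss) | Splice Machine (h:mm:ss) |
|---|---|---|---|---|
| LINEITEM | 600037902 | 5:19:27 | 1:25:46 | 0:22:34 |
| ORDERS | 150000000 | 0:51:28 | 0:15:29 | 0:09:58 |
| PARTSUPP | 80000000 | 0:18:41 | 0:08:52 | 0:06:28 |
| PART | 20000000 | 0:07:26 | 0:02:27 | 0:02:14 |
| CUSTOMERS | 15000000 | 0:05:37 | 0:02:03 | 0:01:42 |
| SUPPLIER | 1000000 | 0:01:48 | 0:00:26 | 0:00:18 |
| NATION | 25 | 0:00:41 | 0:00:07 | 0:00:01 |
| REGION | 5 | 0:00:43 | 0:00:05 | 0:00:01 |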
TPCH 100 Load Throughput
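Throughput follows from the load-time table above: rows loaded divided by elapsed seconds. A minimal sketch of that arithmetic for the LINEITEM row, with the three load times taken in the table's column order; the helper names `to_seconds` and `rows_per_second` are illustrative.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def rows_per_second(row_count: int, hms: str) -> float:
    """Average load throughput for a table loaded in the given time."""
    return row_count / to_seconds(hms)


# LINEITEM row: row count, then the three load times in column order.
lineitem_rows = 600_037_902
lineitem_times = ["5:19:27", "1:25:46", "0:22:34"]

for hms in lineitem_times:
    print(f"{hms} -> {rows_per_second(lineitem_rows, hms):,.0f} rows/s")
```

The same two functions apply to every row of the table; for the tiny NATION and REGION tables the result mostly reflects fixed startup cost rather than sustained throughput.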
Write Pipeline
▪ Features
  ▪ Batched writes per region server
  ▪ Congestion control, retries
  ▪ Asynchronous writes
  ▪ Constraint checking (PK, FK…)
  ▪ Index updates
▪ One-for-all pipeline
  ▪ OLTP queries
  ▪ Batch data ingestion (Imports, Hadoop OutputFormat, OLAP query inserts...)
  ▪ Streaming data ingestion (Kafka, Spark streaming…)
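Three of the write pipeline features listed above (batching per region server, retries, asynchronous sends) can be sketched in a few lines. This is an illustration under stated assumptions, not Splice Machine's code: `locate`, `flaky_send` and the server names `rs1` and `rs2` are hypothetical stand-ins, and congestion control, constraint checking and index updates are left out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def batch_by_server(writes, locate):
    """Group (row_key, value) writes by the server that hosts each row."""
    batches = defaultdict(list)
    for row_key, value in writes:
        batches[locate(row_key)].append((row_key, value))
    return dict(batches)


def send_with_retries(send, server, batch, max_attempts=3):
    """Send one batch, retrying on ConnectionError; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(server, batch)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise


def write_pipeline(writes, locate, send, max_attempts=3):
    """Send one batch per server concurrently; return attempts used per server."""
    batches = batch_by_server(writes, locate)
    with ThreadPoolExecutor() as pool:
        futures = {
            server: pool.submit(send_with_retries, send, server, batch, max_attempts)
            for server, batch in batches.items()
        }
        return {server: future.result() for server, future in futures.items()}


# Hypothetical two-server cluster: keys before "m" live on rs1, the rest on rs2.
def locate(row_key):
    return "rs1" if row_key < "m" else "rs2"


received = {}
failed_once = set()


def flaky_send(server, batch):
    """Stand-in for a network call; rs2 rejects its first batch."""
    if server == "rs2" and server not in failed_once:
        failed_once.add(server)
        raise ConnectionError("rs2 busy")
    received[server] = list(batch)


attempts = write_pipeline([("a", 1), ("z", 2), ("b", 3)], locate, flaky_send)
print(attempts, received)
```

Grouping by server means each server sees one request per flush instead of one per row, and a failed batch is retried as a unit without resending the others.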
Spark Compactions
Spark UI
▪ Out of process compactions
▪ Minor and Major
▪ Decrease Regionserver load
▪ Increase stability
▪ Remote compactions
▪ Prioritized by Spark’s fair scheduler
TPCH 100 Query Times (seconds)
| Query | Phoenix | Trafodion | Splice Machine |
|---|---|---|---|
| 1 | 395 | TRAFODION-2237 | 99 |
| 2 | PHOENIX-3322 | 516 | 44 |
| 3 | PHOENIX-3322 | TRAFODION-2237 | 126 |
| 4 | PHOENIX-3322 | TBD | 133 |
| 5 | PHOENIX-3322 | TBD | 192 |
| 6 | 74 | 3178 | 38 |
| 7 | PHOENIX-3322 | 4442 | 220 |
| 8 | PHOENIX-3322 | TRAFODION-2239 | 620 |
| 9 | PHOENIX-3322 | 941 | 273 |
| 10 | PHOENIX-3322 | TRAFODION-2241 | 101 |
| 11 | PHOENIX-3317 | 463 | 56 |
| 12 | 379 | TBD | 85 |
| 13 | PHOENIX-3318 | TBD | 71 |
| 14 | PHOENIX-3322 | TBD | 50 |
| 15 | PHOENIX-3319 | TBD | 102 |
| 16 | PHOENIX-3322 | TBD | 33 |
| 17 | PHOENIX-3322 | TBD | 929 |
| 18 | PHOENIX-3322 | TBD | SPLICE-34 |
| 19 | PHOENIX-3322 | TBD | 57 |
| 20 | PHOENIX-3320 | TBD | SPLICE-410 |
| 21 | PHOENIX-3321 | TBD | 479 |
| 22 | PHOENIX-3322 | TBD | 219 |
Splice Machine: Advanced Spark Integration
Innovative, High-Performance RDD Creation
▪ Fast access to HFiles in HDFS
▪ Merged with deltas from Memstore
▪ Avoids slower HBase API
▪ Reduces load in HBase
Universal Execution Plan and Byte Code
▪ Optimizer, plan and code shared across Spark or HBase execution
[Diagram: each physical node runs an HBase Region Server (Region 1 ... Region N, each with a Memstore) and a Spark Worker (RDD 1 ... RDD N), both over HFiles stored in HDFS]
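The RDD creation described on this slide reads HFiles directly and merges in the deltas still held in the Memstore. A minimal sketch of that merge, assuming plain Python lists stand in for HFiles and the Memstore and that a per-entry sequence number decides which version is newest; `merged_scan` and the tuple layout are hypothetical, not Splice Machine's scanner.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merged_scan(hfiles, memstore):
    """Yield (key, value) in key order, keeping the newest version of each key.

    hfiles: list of key-sorted lists of (key, sequence, value) entries.
    memstore: key-sorted list of (key, sequence, value) recent deltas.
    """
    # Merge all sorted inputs by key without loading them into one big list.
    merged = heapq.merge(*hfiles, memstore, key=itemgetter(0))
    for key, versions in groupby(merged, key=itemgetter(0)):
        newest = max(versions, key=itemgetter(1))  # highest sequence wins
        yield key, newest[2]


hfile_a = [("k1", 1, "a"), ("k3", 1, "c")]
hfile_b = [("k2", 2, "b"), ("k3", 3, "c2")]
memstore = [("k1", 5, "a-new"), ("k4", 6, "d")]

rows = list(merged_scan([hfile_a, hfile_b], memstore))
print(rows)
```

Because every input is already sorted by key, the merge streams through the files once, which is the property that lets a file-oriented scan bypass the regular HBase read path while still seeing unflushed writes.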
Resources
▪ Do you trust us? Nah...
  ▪ Give it a shot yourself and let us know what you find...
  ▪ https://github.com/splicemachine/benchmarks
▪ Want to get involved?
  ▪ http://community.splicemachine.com/
▪ Want to code? Yeah, me too...
  ▪ https://github.com/splicemachine/spliceengine

HBaseConEast2016: Splice machine open source rdbms