Powering OLTP Apps on Hadoop
Monte Zweben
Co-Founder and CEO
January 24, 2015
Who are We?
THE ONLY HADOOP RDBMS
Replace your old RDBMS with a scale-out SQL database
Affordable, Scale-Out
ACID Transactions
No Application Rewrites
10x Better Price/Perf
Campaign Management: Harte-Hanks
Overview
Digital marketing services provider
Unified Customer Profile
Real-time campaign management
Complex OLTP and OLAP environment
Challenges
Oracle RAC too expensive to scale
Queries too slow – even up to ½ hour
Getting worse – expect 30-50% data growth
Looked for 9 months for a cost-effective solution
Solution Diagram
[Diagram: Cross-Channel Campaigns; Real-Time Personalization; Real-Time Actions]
Initial Results
¼ cost with commodity scale-out
3-7x faster through parallelized queries
10-20x price/perf with no application, BI, or ETL rewrites
Reference Architecture: Operational Apps
Provide affordable scale-out for applications with a high concurrency of real-time reads/writes
[Diagram: 3rd Party Data Sources; Operational App (e.g., CRM, Supply Chain, eCommerce, Unica Campaign Mgmt); Customers; Operational Employees; Operational Reports & Analytics]
Reference Architecture: Operational Data Lake
Offload real-time reporting and analytics from expensive OLTP and DW systems
[Diagram: OLTP Systems; ERP; CRM; Supply Chain; HR; …; ETL; Data Warehouse; Datamart; Stream or Batch Updates; Operational Data Lake; Ad Hoc Analytics; Executive Business Reports; Operational Reports & Analytics; Real-Time, Event-Driven Apps]
Reference Architecture: Unified Customer Profile
Improve marketing ROI with deeper customer intelligence and better cross-channel coordination
[Diagram: Unified Customer Profile (aka DMP); Operational Reports for Campaign Performance; Social Feeds; Web/eCommerce Clickstreams; Website; Datamart; Stream or Batch Updates; BI Tools; Demand Side Platform (DSP); Ad Exchange; 1st Party/CRM Data; 3rd Party Data (e.g., Acxiom); Ad Perf. Data (e.g., DoubleClick); Email Mktg Data; Call Center Data; POS Data; Email Marketing App; Ad Hoc Audience Segmentation; BI Tools]
Proven Building Blocks: Hadoop and Derby
APACHE DERBY
• ANSI SQL-99 RDBMS
• Java-based
• ODBC/JDBC Compliant
APACHE HBASE/HDFS
• Auto-sharding
• Real-time updates
• Fault-tolerance
• Scalability to 100s of PBs
• Data replication
Derby
• 100% Java ANSI SQL RDBMS – CLI, JDBC, embedded
• Modular, Lightweight, Unicode
• Authentication and Authorization
• Concurrency
• Project History
  • Started as Cloudscape in 1996
  • Acquired by Informix… then IBM…
  • IBM contributed the code to the Apache project in 2004
  • An active Apache project with conservative development
• DB2 influence: many of the same limits and features
• Has Oracle’s stamp of approval: distributed as Java DB and included in JDK 6
Derby Advanced Features
• Java Stored Procedures
• Triggers
• Two-phase commit (XA Support)
• Updatable SQL Views
• Full Transaction Isolation Support
• Encryption
• Custom Functions
Splice SQL Processing
• PreparedStatement ps = conn.prepareStatement("SELECT * FROM T WHERE ID = ?");
1. Look up in cache using exact text match (skip to 6 if plan found in cache)
2. Parse using JavaCC generated parser
3. Bind to dictionary, acquire types
4. Optimize Plan
5. Generate code for plan
6. Create instance of plan
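The cache lookup in step 1 can be illustrated with a minimal sketch. It assumes nothing about the real Derby or Splice classes: `CompiledPlan`, `PlanCacheSketch`, and `compile` are hypothetical names, and `compile` is only a placeholder for steps 2 to 5. The point is that the cache key is the exact statement text, so two statements that differ only in letter case are compiled separately.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a compiled plan; not a Derby or Splice class. */
final class CompiledPlan {
    final String sql;
    CompiledPlan(String sql) { this.sql = sql; }
}

/** Sketch of a statement cache keyed on the exact SQL text. */
final class PlanCacheSketch {
    private final Map<String, CompiledPlan> cache = new HashMap<>();
    private int compilations = 0;

    /** Step 1: exact text lookup. Steps 2-5 run only on a cache miss. */
    CompiledPlan prepare(String sql) {
        CompiledPlan plan = cache.get(sql);
        if (plan == null) {
            plan = compile(sql);
            cache.put(sql, plan);
        }
        return plan;
    }

    /** Placeholder for parse, bind, optimize, and generate. */
    private CompiledPlan compile(String sql) {
        compilations++;
        return new CompiledPlan(sql);
    }

    int compilations() { return compilations; }

    public static void main(String[] args) {
        PlanCacheSketch cache = new PlanCacheSketch();
        cache.prepare("SELECT * FROM T WHERE ID = ?");
        cache.prepare("SELECT * FROM T WHERE ID = ?");   // same text: cache hit
        cache.prepare("select * from T where ID = ?");   // different text: cache miss
        System.out.println("compilations = " + cache.compilations());   // prints 2
    }
}
```

Because the statement uses a `?` parameter marker, the same text, and therefore the same cached plan, can serve every value of ID.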
Splice Details
• Parse Phase
  • Forms explicit tree of query nodes representing statement
• Generate Phase
  • Generate Java byte code (an Activation) directly into an in-memory byte array
  • Loaded with special ClassLoader that loads from the byte array
  • Binds arguments to proper types
• Optimize Phase
  • Determine feasible join strategies
  • Optimize based on cost estimates
• Execute Phase
  • Instantiates arguments to represent specific statement state
  • Expressions are methods on Activation
  • Trees of ResultSets generated that represent the state of the query
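The execute phase can be pictured as a tree of row sources that pull rows from their children. The sketch below is only an illustration under that assumption: `RowSource`, `ScanSource`, and `ProjectRestrictSource` are hypothetical names, not Derby's ResultSet classes, and the restriction and projection are plain lambdas rather than generated Activation methods.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical operator interface: each node yields rows on demand. */
interface RowSource {
    /** Returns the next row, or null when the source is exhausted. */
    Object[] next();
}

/** Leaf node: scans an in-memory table. */
final class ScanSource implements RowSource {
    private final Iterator<Object[]> rows;
    ScanSource(List<Object[]> table) { this.rows = table.iterator(); }
    public Object[] next() { return rows.hasNext() ? rows.next() : null; }
}

/** Inner node: applies a restriction (WHERE), then a projection (SELECT list). */
final class ProjectRestrictSource implements RowSource {
    private final RowSource child;
    private final Predicate<Object[]> restriction;
    private final Function<Object[], Object[]> projection;

    ProjectRestrictSource(RowSource child, Predicate<Object[]> restriction,
                          Function<Object[], Object[]> projection) {
        this.child = child;
        this.restriction = restriction;
        this.projection = projection;
    }

    public Object[] next() {
        Object[] row;
        while ((row = child.next()) != null) {
            if (restriction.test(row)) {
                return projection.apply(row);
            }
        }
        return null;
    }
}

final class ResultSetTreeSketch {
    /** Pulls every row from the root of the tree into a list. */
    static List<Object[]> drain(RowSource root) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row = root.next(); row != null; row = root.next()) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> table = new ArrayList<>();
        table.add(new Object[] {1, "a"});
        table.add(new Object[] {2, "b"});
        table.add(new Object[] {3, "c"});
        // Roughly: SELECT name FROM T WHERE id >= 2
        RowSource plan = new ProjectRestrictSource(
                new ScanSource(table),
                row -> (Integer) row[0] >= 2,
                row -> new Object[] {row[1]});
        for (Object[] row : drain(plan)) {
            System.out.println(row[0]);
        }
    }
}
```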
Splice Modifications to Derby
| Derby Component | Derby | Splice Version |
|---|---|---|
| Store | Block file-based | HBase tables |
| Indexes | B-Tree | Dense index in HBase table |
| Concurrency | Lock-based, ARIES | MVCC - Snapshot Isolation |
| Project-Restrict Plan | Predicates on centralized file scanner | Predicates pushed to shards and locally applied |
| Aggregation Plan | Aggregation serially computed | Aggregations pushed to shards and spliced together |
| Join Plan | Centralized Hash and NLJ chosen by optimizer | Distributed Broadcast, Sort-Merge, Merge, NLJ, and Batch NLJ chosen by optimizer |
| Resource Management | Number of Connections and Memory Limitations | Task Resource Queues and Write Governor |
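The "pushed to shards and spliced together" idea for aggregations can be sketched for AVG. This is an illustration under assumptions, not Splice Machine's code: `PartialAvg` and `ShardAggregationSketch` are hypothetical names, shards are plain in-memory lists, and the pushed-down predicate is a simple lower bound. Each shard returns a partial (sum, count) pair and the coordinator merges the pairs.

```java
import java.util.Arrays;
import java.util.List;

/** Partial aggregate computed on one shard: enough state to merge an AVG. */
final class PartialAvg {
    final long sum;
    final long count;

    PartialAvg(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    /** Runs on the shard: aggregates only rows passing the pushed-down predicate. */
    static PartialAvg ofShard(List<Integer> shardRows, int minValue) {
        long sum = 0;
        long count = 0;
        for (int value : shardRows) {
            if (value >= minValue) {
                sum += value;
                count++;
            }
        }
        return new PartialAvg(sum, count);
    }

    /** Runs on the coordinator: splices two partial results together. */
    PartialAvg merge(PartialAvg other) {
        return new PartialAvg(sum + other.sum, count + other.count);
    }

    double value() {
        return count == 0 ? Double.NaN : (double) sum / count;
    }
}

final class ShardAggregationSketch {
    /** Roughly: SELECT AVG(v) FROM T WHERE v >= minValue, evaluated shard by shard. */
    static double average(List<List<Integer>> shards, int minValue) {
        PartialAvg total = new PartialAvg(0, 0);
        for (List<Integer> shard : shards) {
            total = total.merge(PartialAvg.ofShard(shard, minValue));
        }
        return total.value();
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = Arrays.asList(
                Arrays.asList(1, 5, 9), Arrays.asList(4, 10), Arrays.asList(2));
        System.out.println(average(shards, 4));   // (5 + 9 + 4 + 10) / 4 = 7.0
    }
}
```

Shipping (sum, count) rather than a per-shard average matters: averaging the per-shard averages would weight small shards too heavily.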
HBase: Proven Scale-Out
• Auto-sharding
• Scales with commodity hardware
• Cost-effective from GBs to PBs
• High availability through failover and replication
• LSM-trees
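Auto-sharding in HBase rests on splitting a table into regions, each covering a contiguous range of row keys. The sketch below shows only that routing idea, under simplifying assumptions: `RegionRouterSketch` and the server names are hypothetical, keys are Java strings rather than byte arrays, and splits are triggered by hand instead of by region size.

```java
import java.util.TreeMap;

/** Sketch of range-based routing: each region is identified by its start key. */
final class RegionRouterSketch {
    // Maps a region's start key to a hypothetical server name.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionRouterSketch() {
        // A new table starts as one region covering every key.
        regionsByStartKey.put("", "server-1");
    }

    /** Splits the covering region so that keys >= splitKey are served by newServer. */
    void split(String splitKey, String newServer) {
        regionsByStartKey.put(splitKey, newServer);
    }

    /** Finds the region whose start key is the greatest one <= rowKey. */
    String locate(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionRouterSketch router = new RegionRouterSketch();
        router.split("m", "server-2");
        System.out.println(router.locate("apple"));   // server-1
        System.out.println(router.locate("zebra"));   // server-2
    }
}
```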
Distributed, Parallelized Query Execution
Parallelized computation across cluster
Moves computation to the data
Utilizes HBase co-processors
No MapReduce
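"Moves computation to the data" can be illustrated by filtering each shard where it lives and returning only the matching rows. In the sketch below, threads in a local pool stand in for HBase co-processors running on region servers. That substitution is an assumption made for illustration, and `ParallelScanSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class ParallelScanSketch {
    /** Applies the predicate on every shard in parallel and gathers the matches. */
    static List<Integer> scan(List<List<Integer>> shards, Predicate<Integer> predicate) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Integer>>> pending = new ArrayList<>();
            for (List<Integer> shard : shards) {
                // The task runs "next to" the shard and returns only matching rows.
                Callable<List<Integer>> task = () -> {
                    List<Integer> matches = new ArrayList<>();
                    for (Integer row : shard) {
                        if (predicate.test(row)) {
                            matches.add(row);
                        }
                    }
                    return matches;
                };
                pending.add(pool.submit(task));
            }
            List<Integer> result = new ArrayList<>();
            for (Future<List<Integer>> future : pending) {
                try {
                    result.addAll(future.get());   // gathered in shard order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a shard", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("shard task failed", e.getCause());
                }
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = Arrays.asList(
                Arrays.asList(1, 8), Arrays.asList(3, 12), Arrays.asList(20));
        System.out.println(scan(shards, v -> v > 5));   // [8, 12, 20]
    }
}
```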
Splice HBase Extensions
• Asynchronous Write Pipeline
  • Non-blocking, flushable writes
  • Writes data, indexes, and constraints (index) concurrently
  • Batches writes in chunks for bulk WAL Edits vs. single WAL Edits
  • Synchronization-free internal scanner vs. synchronized external scanner
• Linux Scheduler Modeled Resource Manager
  • Resource Queues that handle DDL, DML, Dictionary, and Maintenance Operations
• Sparse Data Support
  • Efficiently store sparse data
  • Does not store nulls
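The batching part of the write pipeline can be sketched as a buffer that is flushed in chunks, so that many rows share one bulk edit. The sketch covers only the batching. The asynchronous, non-blocking submission is left out, and `BatchingWriterSketch` and its in-memory list of flushed batches (standing in for bulk WAL edits) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of chunked writes: rows accumulate and are flushed one batch at a time. */
final class BatchingWriterSketch {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    // Each inner list stands in for one bulk WAL edit.
    private final List<List<String>> flushedBatches = new ArrayList<>();

    BatchingWriterSketch(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /** Buffers a row and flushes automatically once the batch is full. */
    void write(String row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Flushes whatever is buffered as a single batch; a no-op when empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushedBatches.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    List<List<String>> flushedBatches() {
        return flushedBatches;
    }

    public static void main(String[] args) {
        BatchingWriterSketch writer = new BatchingWriterSketch(4);
        for (int i = 0; i < 10; i++) {
            writer.write("row-" + i);
        }
        writer.flush();   // push the final partial batch
        System.out.println(writer.flushedBatches().size() + " batches for 10 rows");   // 3
    }
}
```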
Schema Advantages
• Non-Blocking Schema Changes
  • Add columns in a DDL transaction
  • No read/write locks while adding columns
• Sparse Data Support
  • Efficiently store sparse data
  • Does not store nulls
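Both points on this slide follow naturally from a storage layout that keeps only the cells that have values. The sketch below assumes such a layout in plain Java maps. `SparseTableSketch` is a hypothetical name and says nothing about how Splice Machine encodes rows in HBase. Adding a column only changes metadata, and a null is represented by the absence of a cell.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of sparse rows: only non-null cells are stored. */
final class SparseTableSketch {
    private final List<String> columns = new ArrayList<>();
    private final List<Map<String, Object>> rows = new ArrayList<>();

    /** Metadata-only change: existing rows are not rewritten. */
    void addColumn(String name) {
        columns.add(name);
    }

    /** Stores a row, skipping null values, and returns its row id. */
    int insert(Map<String, Object> values) {
        Map<String, Object> stored = new HashMap<>();
        for (Map.Entry<String, Object> cell : values.entrySet()) {
            if (!columns.contains(cell.getKey())) {
                throw new IllegalArgumentException("unknown column: " + cell.getKey());
            }
            if (cell.getValue() != null) {
                stored.put(cell.getKey(), cell.getValue());
            }
        }
        rows.add(stored);
        return rows.size() - 1;
    }

    /** Returns the cell value, or null when no cell is stored. */
    Object get(int rowId, String column) {
        return rows.get(rowId).get(column);
    }

    int storedCells(int rowId) {
        return rows.get(rowId).size();
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.addColumn("id");
        table.addColumn("name");
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        row.put("name", null);
        int rowId = table.insert(row);
        table.addColumn("email");   // no existing row is touched
        System.out.println(table.storedCells(rowId) + " cell stored");   // 1
    }
}
```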
ANSI SQL-99 Coverage
• Data types – e.g., INTEGER, REAL, CHARACTER, DATE, BOOLEAN, BIGINT
• DDL – e.g., CREATE TABLE, CREATE SCHEMA, ALTER TABLE
• Predicates – e.g., IN, BETWEEN, LIKE, EXISTS
• DML – e.g., INSERT, DELETE, UPDATE, SELECT
• Query specification – e.g., SELECT DISTINCT, GROUP BY, HAVING
• SET functions – e.g., UNION, ABS, MOD, ALL, CHECK
• Aggregation functions – e.g., AVG, MAX, COUNT
• String functions – e.g., SUBSTRING, concatenation, UPPER, LOWER, POSITION, TRIM, LENGTH
• Conditional functions – e.g., CASE, searched CASE
• Privileges – e.g., privileges for SELECT, DELETE, INSERT, EXECUTE
• Cursors – e.g., updatable, read-only, positioned DELETE/UPDATE
• Joins – e.g., INNER JOIN, LEFT OUTER JOIN
• Transactions – e.g., COMMIT, ROLLBACK, READ COMMITTED, REPEATABLE READ, READ UNCOMMITTED, Snapshot Isolation
• Sub-queries
• Triggers
• User-defined functions (UDFs)
• Views – including grouped views
Lockless, ACID transactions
State-of-the-Art Snapshot Isolation
Adds multi-row, multi-table transactions to HBase with rollback
Fast, lockless, high concurrency
ZooKeeper coordination
Extends research from Google Percolator, Yahoo Labs, U of Waterloo
[Diagram: Transaction A; Transaction B; Transaction C; markers Ts and Tc]
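Snapshot isolation can be sketched with a small multi-version store, reading Ts and Tc in the diagram as a transaction's start and commit timestamps (an assumption). Everything here is hypothetical and single-threaded: `SnapshotStoreSketch`, its one in-process logical clock, and the first-committer-wins conflict rule are the textbook formulation, not Splice Machine's HBase and ZooKeeper based implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Single-threaded sketch of snapshot isolation over a multi-version store. */
final class SnapshotStoreSketch {
    /** One committed version of a key. */
    private static final class Version {
        final long commitTs;
        final String value;
        Version(long commitTs, String value) {
            this.commitTs = commitTs;
            this.value = value;
        }
    }

    /** A transaction reads from its snapshot and buffers writes until commit. */
    final class Txn {
        final long startTs;
        private final Map<String, String> writes = new HashMap<>();

        private Txn(long startTs) {
            this.startTs = startTs;
        }

        String read(String key) {
            if (writes.containsKey(key)) {
                return writes.get(key);   // a transaction sees its own writes
            }
            return latestVisible(key, startTs);
        }

        void write(String key, String value) {
            writes.put(key, value);
        }

        /** Returns true on commit, false when rolled back by a write-write conflict. */
        boolean commit() {
            return tryCommit(this);
        }
    }

    private final Map<String, List<Version>> versions = new HashMap<>();
    private long clock = 0;   // one logical clock issues both Ts and Tc

    Txn begin() {
        return new Txn(++clock);
    }

    private String latestVisible(String key, long snapshotTs) {
        String visible = null;
        // Versions are appended in commit order, so the last match wins.
        for (Version v : versions.getOrDefault(key, Collections.emptyList())) {
            if (v.commitTs <= snapshotTs) {
                visible = v.value;
            }
        }
        return visible;
    }

    private boolean tryCommit(Txn txn) {
        for (String key : txn.writes.keySet()) {
            List<Version> history = versions.get(key);
            if (history != null && history.get(history.size() - 1).commitTs > txn.startTs) {
                return false;   // another transaction committed this key after our snapshot
            }
        }
        long commitTs = ++clock;
        for (Map.Entry<String, String> w : txn.writes.entrySet()) {
            versions.computeIfAbsent(w.getKey(), k -> new ArrayList<>())
                    .add(new Version(commitTs, w.getValue()));
        }
        return true;
    }

    public static void main(String[] args) {
        SnapshotStoreSketch store = new SnapshotStoreSketch();
        SnapshotStoreSketch.Txn a = store.begin();
        SnapshotStoreSketch.Txn b = store.begin();
        a.write("x", "from A");
        b.write("x", "from B");
        System.out.println("A commits: " + a.commit());   // true
        System.out.println("B commits: " + b.commit());   // false: write-write conflict
    }
}
```

Readers never block in this scheme: a transaction that started before a commit keeps reading the older version, and only conflicting writers are rolled back.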
BI and SQL tool support via ODBC
No application rewrites needed
SQL Database Ecosystem
[Diagram: axis labels Ad-hoc Analytics, Operational (OLTP + OLAP), Lower Cost, Higher Cost; category labels New SQL, In-Memory, RDBMS, MPP, Proprietary HW, Hadoop RDBMS, SQL-on-Hadoop, Phoenix, SQL-on-HBase]
What People are Saying…
Recognized as a key innovator in databases
Quotes
"Scaling out on Splice Machine presented some major benefits over Oracle ...automatic balancing between clusters...avoiding the costly licensing issues."
"An alternative to today’s RDBMSes, Splice Machine effectively combines traditional relational database technology with the scale-out capabilities of Hadoop."
"The unique claim of … Splice Machine is that it can run transactional applications as well as support analytics on top of Hadoop."
Awards
Summary
THE ONLY HADOOP RDBMS
Replace your old RDBMS with a scale-out SQL database
Affordable, Scale-Out
ACID Transactions
No Application Rewrites
10x Better Price/Perf
Questions?
Monte Zweben
mzweben@splicemachine.com

January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transactional RDBMS
