Scaling Cloud-Scale Translytics Workloads
with Omid and Phoenix
Ohad Shacham
Yahoo Research
Edward Bortnikov
Yahoo Research
Yonatan Gottesman
Yahoo Research
Agenda
Translytics = Transactions + Analytics
Cloud-Scale Use Cases
Doing it in the HBase-Omid-Phoenix World
Omid and Phoenix Deep Dive
Real-Time Data Processing on the Rise
The Applications Perspective
Event-to-action/insight latency becomes king
Stream processing, asynchronous execution
Data consistency becomes nontrivial
Complex processing patterns (online reporting to AI)
Data integration across multiple feeds and schemas
OLTP World
Analytics World
Translytics Platforms Vision
The best of all worlds: OLTP and Analytics all-in-one
Enable complex, consistent, real-time data processing
Simple APIs with strong guarantees
Built to scale on top of NoSQL data platforms
OLTP Coming to NoSQL
Traditional NoSQL guarantees row-level atomicity
Translytics applications often bundle reads and writes
Asynchronous design patterns drive concurrency
Without ACID guarantees, chaos rules!
ACID transactions
Multiple data accesses in a single logical operation
Atomic
“All or nothing” – no partial effect observable
Consistent
The DB transitions from one valid state to another
Isolated
Appear to execute in isolation
Durable
Committed data cannot disappear
Use Case: Audience Targeting for Ads
Advertisers optimize campaigns to reach the right user audiences
Ad-tech platforms build and sell audience segments (identity sets)
Segmentation is based on user features (demographics, behavior, …)
Algorithms vary from rule-based heuristics to AI classification
Timeliness directly affects revenue
Real-Time Targeting Platform
Storm for Compute
Audience segmentation algorithms embedded in bolts
HBase for Storage
User Profiles (U), Segments (S), and U ↔ S relationships
Kafka for Messaging
Scale: trillions of touchpoints/month
Challenge: Keeping the Data Consistent
Shared data is accessed in parallel by multiple bolts
Access patterns are complex
User profile update: read+compute+write
User↔Segment mapping update: two writes
Segment query (scan): read multiple rows
HBase read/write API does not provide atomic guarantees
Omid Comes to Help
Transaction Processing layer for Apache HBase
Apache Incubation (started 2015, graduation planned 2019)
Easy-to-use API (good old NoSQL)
Popular consistency model (snapshot isolation)
Battle tested (in prod @Yahoo since 2015, new customers onboarding)
Omid Programming
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.omid.transaction.HBaseTransactionManager;
import org.apache.omid.transaction.TTable;
import org.apache.omid.transaction.Transaction;
import org.apache.omid.transaction.TransactionManager;

TransactionManager tm = HBaseTransactionManager.newInstance();
TTable txTable = new TTable("MY_TX_TABLE");
Transaction tx = tm.begin(); // Control path
Put row1 = new Put(Bytes.toBytes("EXAMPLE_ROW1"));
row1.add(family, qualifier, Bytes.toBytes("val1"));
txTable.put(tx, row1); // Data path
Put row2 = new Put(Bytes.toBytes("EXAMPLE_ROW2"));
row2.add(family, qualifier, Bytes.toBytes("val2"));
txTable.put(tx, row2); // Data path
tm.commit(tx); // Control path; throws RollbackException on conflict
SQL Coming to NoSQL
NoSQL API is simple but crude and non-standardized
Hard to manage complex schemas (low-level data abstraction)
Hard to implement analytics queries (low-level access primitives)
Hard to optimize for speed (server-side programming required)
Hard to integrate with relational data sources
Use Case: Real-Time Ad Inventory Ingestion
Advertisers deploy campaign content & metadata in the marketplace
SQL-speaking external client
Complex schema (many campaign types and optimization goals)
High scalability (growing market)
Campaign operations run multidimensional inventory analytics
Aggregate queries by advertiser, product, time, etc.
ML pipeline learns recommendation models for new campaigns
NoSQL-style access to data
Phoenix Comes to Help
OLTP and Real-Time Analytics for HBase
Query optimizer transforms SQL to native HBase API calls
Standard SQL interface with JDBC APIs
High level data abstractions (e.g., secondary indexes)
High performance (leverages server-side coprocessors)
Phoenix/Omid Integration
Phoenix is designed for public-cloud scale (>10K query servers)
Omid is extremely scalable (>600k tps), low-latency (<5ms), and HA
New Omid release (1.0.1): SQL features, improved performance
Supports secondary indexes, extended snapshot isolation, and downstream filters
Phoenix releases 4.15 and 5.1 include Omid as a transaction processing service (TPS)
Phoenix refactored to support multiple TP backends (Omid is the default)
Phoenix/Omid Integration performance
1M initial inserts, 1 KB per row
Omid in sync post-commit mode (charts omitted)
Why do we care?
SQL transactions
SELECT * FROM my_table; -- This will start a transaction
UPSERT INTO my_table VALUES (1,'A');
SELECT count(*) FROM my_table WHERE k=1;
DELETE FROM my_other_table WHERE k=2;
!commit -- Other transactions will now see your updates and you will see theirs
Why do we care?
A non-transactional secondary index update can break consistency
(Diagram) A write of (k1, [v1,v2,v3]) to the table must also produce the index entry (v1, k1).
Why do we care?
Updating the secondary index can fail (e.g., the region server runs out of handlers); many JIRAs discuss this issue
(Diagram) The write of (k1, [v1,v2,v3]) reaches the table, but the index update is lost.
Transactions and snapshot isolation
Aborts only on write-write conflicts
Timeline (diagram): begin (read point), read(x), write(y), write(x), read(y), commit (write point)
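The write-write conflict rule above can be sketched in a few lines of self-contained Java (a toy model, not Omid's actual implementation): a central detector tracks the last commit timestamp per key, and a transaction commits only if no key in its write set was committed after its read point.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy snapshot-isolation conflict detector: begin() hands out a read point,
// tryCommit() aborts iff a concurrent committed transaction wrote an
// overlapping key after this transaction's read point.
class ToyConflictDetector {
    private long clock = 0;                            // toy timestamp oracle
    private final Map<String, Long> lastCommit = new HashMap<>();

    long begin() { return ++clock; }                   // read point

    // Returns a commit timestamp, or -1 on a write-write conflict.
    long tryCommit(long readTs, Set<String> writeSet) {
        for (String key : writeSet) {
            Long committed = lastCommit.get(key);
            if (committed != null && committed > readTs) {
                return -1;                             // conflict: abort
            }
        }
        long commitTs = ++clock;                       // write point
        for (String key : writeSet) {
            lastCommit.put(key, commitTs);
        }
        return commitTs;
    }
}
```

Two concurrent transactions writing the same key race: the first to commit wins, the second aborts; transactions with disjoint write sets never conflict.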
Omid architecture
(Diagram) The client sends Begin/Commit requests to a centralized Transaction Manager and gets back results/timestamps; the Transaction Manager performs conflict detection and persists commit records to a Commit Table; the client reads/writes the data nodes directly and verifies commits against the Commit Table.
Omid low latency (LL) architecture
(Diagram) As above, but the client persists the commit record to the Commit Table itself, taking the Transaction Manager off the commit critical path.
Execution example (diagram, step 1): the client begins a transaction and receives read timestamp tr = t1 from the Transaction Manager; it writes (k1, v1, t1) and (k2, v2, t1) to the data nodes; a read of any key k' returns the last committed version with t' < t1.
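The snapshot read in step 1 ("last committed t' < t1") can be sketched with a per-key map of timestamped versions (a toy model; the names are ours, not Omid's):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy multi-versioned store: each key maps to timestamped versions,
// and a snapshot read returns the newest version strictly below the
// reader's read point.
class VersionedStore {
    private final Map<String, TreeMap<Long, String>> data = new HashMap<>();

    void write(String key, String value, long ts) {
        data.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
    }

    // Snapshot read: newest version with timestamp t' < tr.
    String read(String key, long tr) {
        TreeMap<Long, String> versions = data.get(key);
        if (versions == null) return null;
        Map.Entry<Long, String> e = versions.lowerEntry(tr);
        return e == null ? null : e.getValue();
    }
}
```

A reader at t3 sees the version written at t1 but never a version written at t5, matching the diagram's "last committed t' < t1" rule.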
Execution example (diagram, step 2): to commit, the client sends Commit: t1, {k1, k2} to the Transaction Manager, which assigns commit timestamp tc = t2 and persists the mapping (t1, t2) to the Commit Table.
Execution example (diagram, step 3): a later reader with tr = t3 finds (k1, v1, t1) in the data and must look up t1 in the Commit Table to learn its commit timestamp. Every such read hits the Commit Table, which makes it a bottleneck! (The Transaction Manager is labeled TSO, for timestamp oracle, in the diagram.)
Post-commit (diagram): the client updates the commit cells in the data itself, yielding (k1, v1, t1, t2) and (k2, v2, t1, t2), and then deletes the (t1, t2) entry from the Commit Table.
Using Commit Cells
(Diagram) A reader with tr = t3 now finds the commit timestamp t2 directly in the data cells (k1, v1, t1, t2), so the read no longer needs the Commit Table.
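The two read paths (commit cell present vs. Commit Table lookup) can be sketched as follows (an illustrative toy model; DataCell and ReadPath are our names, not Omid's):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// A tentative data version: value, its write timestamp, and an optional
// commit cell (null means the commit timestamp is not yet in the data).
class DataCell {
    final String value; final long writeTs; final Long commitTs;
    DataCell(String value, long writeTs, Long commitTs) {
        this.value = value; this.writeTs = writeTs; this.commitTs = commitTs;
    }
}

class ReadPath {
    private final Map<Long, Long> commitTable; // writeTs -> commitTs
    ReadPath(Map<Long, Long> commitTable) { this.commitTable = commitTable; }

    // A version is visible to a reader at readerTs iff it committed before readerTs.
    Optional<String> readVisible(DataCell cell, long readerTs) {
        Long commitTs = (cell.commitTs != null)
                ? cell.commitTs                      // fast path: commit cell in the data
                : commitTable.get(cell.writeTs);     // slow path: Commit Table lookup
        if (commitTs == null || commitTs >= readerTs) {
            return Optional.empty();                 // in-flight, or committed after the read point
        }
        return Optional.of(cell.value);
    }
}
```

Once the post-commit step fills in the commit cell, the fast path applies and the Commit Table entry can be deleted.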
Durability
(Diagram) The Commit Table is itself an HBase table, so commit records are as durable as the data.
What about high availability?
(Diagram) The centralized Transaction Manager is a single point of failure.
High availability
(Diagram) Two Transaction Manager (TSO) instances run side by side; on failover, the new primary recovers state and force-aborts the in-flight transactions of the old one before taking over persisting commits.
Benchmark: single-write transaction workload
Easily scales beyond 500K tps
Latency problem solved: the TSO is no longer the latency bottleneck (charts omitted)
New scenarios for Omid
Secondary Indexes
Atomic Updates
How can we update the metadata?
On-the-Fly Index Creation
What should we do with in-flight transactions?
Extended Snapshot Isolation
Read-Your-Own-Writes Queries
Do not fit plain snapshot isolation
Secondary index: creation and maintenance
Timeline (diagram): transactions T1, T2, T3 are in flight when CREATE INDEX starts; T4 runs while the index is being built; T5 and T6 start after it completes.
Secondary index: creation and maintenance
Timeline (diagram), annotated: the index is populated by a bulk-insert into the index (a stored procedure); transactions that were in flight when CREATE INDEX started are aborted, enforced upon commit; for transactions starting afterwards, index updates are added by a coprocessor.
Extended snapshot isolation
CREATE TABLE T (ID INT);
...
BEGIN;
INSERT INTO T SELECT ID+10 FROM T;
INSERT INTO T SELECT ID+100 FROM T;
COMMIT;
Moving snapshot implementation
(Diagram) Each statement gets its own checkpoint; the writes by statement 1 land between the checkpoint for statement 1 and the checkpoint for statement 2.
Timestamps allocated by TM in blocks.
Client promotes the checkpoint.
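A minimal sketch of this checkpointing scheme (our toy model; Omid's actual mechanism differs in detail): the transaction receives a block of timestamps starting at blockStart, statement i writes at blockStart + i, and promoting the checkpoint makes the previous statement's writes visible to the next one.

```java
// Toy moving-snapshot transaction: each statement writes at
// blockStart + statementIndex; its reads see only versions at or below
// the previous checkpoint, so a statement never reads its own
// in-progress writes, but a later statement (after promote()) does.
class MovingSnapshot {
    private final long blockStart;  // first timestamp of the block from the TM
    private int statement = 0;      // index of the current statement

    MovingSnapshot(long blockStart) { this.blockStart = blockStart; }

    long writeTs() { return blockStart + statement; }
    long checkpoint() { return blockStart + statement - 1; }

    void promote() { statement++; } // client advances the checkpoint between statements

    boolean visible(long versionTs) { return versionTs <= checkpoint(); }
}
```

This is exactly what the INSERT ... SELECT example needs: statement 2 must see the rows statement 1 inserted, while neither statement reads the rows it is inserting itself.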
Summary
Apache Phoenix is a relational database layer for HBase
Apache Phoenix needs a scalable, highly available transaction processing service (TPS)
Omid is a battle-tested, highly scalable, low-latency TPS
The Phoenix/Omid integration provides efficient OLTP for Hadoop
Cloud-scale use cases at Yahoo
