© 2020 Scalar, inc.
Scalar DB: Universal Transaction Manager
20 Jan, 2022 at Big Data System class in Keio University
Hiroyuki Yamada
CTO&CEO at Scalar, Inc.
What is Scalar DB
• A universal transaction manager
– Provides database-agnostic ACID transactions
– The architecture is inspired by Deuteronomy [CIDR’09,11]
https://github.com/scalar-labs/scalardb
Motivation / Use Cases
• Database abstraction
• Transaction manager for non-transactional databases (NoSQLs)
• Transaction manager for heterogeneous databases
[Figure: three deployment patterns]
– Database abstraction: an App on Scalar DB over MySQL or Amazon DynamoDB. Enables database migration without modifying the App.
– Transaction manager for NoSQLs: an App on Scalar DB over Apache Cassandra. Adds transaction capability to non-transactional databases.
– Transaction manager for heterogeneous databases: an App on Scalar DB over PostgreSQL and Azure Cosmos DB. Achieves transactions over multiple different databases.
Pros and Cons of Scalar DB Approach
• Pros
– Universal: can work on various database systems
– Non-invasive: no modifications to the underlying databases are required
– Flexible scalability: the transaction layer and the storage layer can be scaled independently
• Cons
– Slower than distributed SQL databases: more abstraction layers and a storage-oblivious transaction manager
– Hard to optimize: the transaction manager has little information about the storage
– No SQL support: a transaction has to be written procedurally in a programming language (an SQL I/F is being worked on)
System Architecture
• Scalar DB can be used in two ways:
– As a library: the application program embeds the Scalar DB transaction library (transaction logic), which talks to the databases over each database-specific protocol (command execution / HTTP)
– As a server: the application program uses the Scalar DB Client, which talks over gRPC (HTTP/2) to the Scalar DB Server (transaction logic), which talks to the databases over each database-specific protocol (command execution / HTTP)
Programming Interface
• CRUD interface
– put, get, scan (partition-level), delete
• Begin and commit semantics
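– An arbitrary number of operations can be handled in one transaction
// Obtain a transaction manager (construction is elided in the slide)
DistributedTransactionManager manager = …;
// Begin a transaction
DistributedTransaction transaction = manager.start();
// Read a record, then write a new value derived from it
Get get = createGet();
Optional<Result> result = transaction.get(get);
Put put = createPut(result);
transaction.put(put);
// Commit all operations atomically
transaction.commit();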
Data Model
• Multi-dimensional map [OSDI’06]
– (partition-key, clustering-key, value-name) -> value-content
– Assumed to be hash partitioned
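The data model above can be pictured as nested maps. The following is a minimal in-memory sketch, assuming string keys and values; the class `MultiDimensionalMap` and its methods are illustrative names and are not part of the Scalar DB API. Hash partitioning is modeled by a `HashMap` on the partition key, and records inside a partition are kept sorted by clustering key so that a partition-level scan is possible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Illustrative only: (partition-key, clustering-key, value-name) -> value-content
final class MultiDimensionalMap {
    // partition key -> (clustering key -> (value name -> value content))
    private final Map<String, TreeMap<String, Map<String, String>>> partitions = new HashMap<>();

    void put(String partitionKey, String clusteringKey, String valueName, String content) {
        partitions
            .computeIfAbsent(partitionKey, k -> new TreeMap<>())
            .computeIfAbsent(clusteringKey, k -> new HashMap<>())
            .put(valueName, content);
    }

    Optional<String> get(String partitionKey, String clusteringKey, String valueName) {
        TreeMap<String, Map<String, String>> partition = partitions.get(partitionKey);
        if (partition == null) {
            return Optional.empty();
        }
        Map<String, String> record = partition.get(clusteringKey);
        if (record == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(record.get(valueName));
    }

    // Partition-level scan: the clustering keys of one partition, in sorted order.
    List<String> scan(String partitionKey) {
        TreeMap<String, Map<String, String>> partition = partitions.get(partitionKey);
        return partition == null ? new ArrayList<>() : new ArrayList<>(partition.keySet());
    }
}
```

A scan never crosses partitions in this sketch, which mirrors the "scan (partition-level)" restriction of the CRUD interface.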
Transaction Management - Overview
• Based on Cherry Garcia [ICDE’15]
– Two-phase commit with linearizable operations (for Atomicity); the protocol correction is our extended work
– Distributed WAL records (for Atomicity and Durability)
– Single-version optimistic concurrency control (for Isolation); serializability support is our extended work
• Requirements for the underlying databases/storages
– Linearizable read and linearizable conditional/CAS write
– An ability to store metadata for each record
Transaction Commit Protocol (for Atomicity)
• Two-phase commit protocol (2PC) with linearizable operations
– Similar to Paxos Commit [TODS’06]
– Two-phase commit on distributed records
• The protocol
– Prepare phase: prepare records
– Commit phase 1: commit the status record; this is where a transaction is regarded as committed or aborted
– Commit phase 2: commit records
• Lazy recovery
– Uncommitted records are rolled forward or rolled back, based on the status of the transaction, when the records are read
Distributed WAL (for Atomicity and Durability)
• WAL (Write-Ahead Logging) is distributed into records
[Figure: record layouts]
– User/application record in user tables: application data (managed by users) plus transaction metadata (managed by Scalar DB)
   After image: Application data | Status | Version | TxID
   Before image: Application data (before) | Status (before) | Version (before) | TxID (before)
– Status record in the coordinator table: TxID | Status | Other metadata
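The user record layout can be sketched as a plain class. This is an illustration under assumptions: the class `WalRecord`, its field names, and the methods `prepare` and `rollback` are hypothetical, a single integer `balance` stands in for the application data, and the actual metadata columns used by Scalar DB may differ. The point it shows is that the before image travels with the record, so a rollback needs no separate log.

```java
// Illustrative layout of a user record (all names are hypothetical).
// The after image and the before image are stored in the same record,
// so the record carries its own log entry.
final class WalRecord {
    // After image: application data plus transaction metadata
    final int balance;
    final char status;   // 'P' = prepared, 'C' = committed
    final int version;
    final String txId;
    // Before image: null when the record has no pending change
    final Integer beforeBalance;
    final Character beforeStatus;
    final Integer beforeVersion;
    final String beforeTxId;

    WalRecord(int balance, char status, int version, String txId,
              Integer beforeBalance, Character beforeStatus,
              Integer beforeVersion, String beforeTxId) {
        this.balance = balance;
        this.status = status;
        this.version = version;
        this.txId = txId;
        this.beforeBalance = beforeBalance;
        this.beforeStatus = beforeStatus;
        this.beforeVersion = beforeVersion;
        this.beforeTxId = beforeTxId;
    }

    static WalRecord committed(int balance, int version, String txId) {
        return new WalRecord(balance, 'C', version, txId, null, null, null, null);
    }

    // Prepare: the current image is saved as the before image.
    WalRecord prepare(int newBalance, String newTxId) {
        return new WalRecord(newBalance, 'P', version + 1, newTxId,
                balance, status, version, txId);
    }

    // Rollback: restore the before image and drop the pending change.
    WalRecord rollback() {
        if (beforeTxId == null) {
            throw new IllegalStateException("no before image to restore");
        }
        return new WalRecord(beforeBalance, beforeStatus, beforeVersion, beforeTxId,
                null, null, null, null);
    }
}
```

The sketch does not cover a record that is newly inserted by the transaction, where there is no before image to restore.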
Concurrency Control (for Isolation)
• Single-version OCC
– Simple implementation of Snapshot Isolation
– Conflicts are detected by linearizable conditional writes
– No clock dependency, no use of HLC (Hybrid Logical Clock)
• Supported isolation levels
– Read-committed Snapshot Isolation (RCSI): read-skew, write-skew, read-only, and phantom anomalies could happen
– Serializable: no anomalies (Strict Serializability); RCSI-based, but non-serializable schedules are aborted
Transaction with Example – Before Prepare
• Tx1: Transfer 20 from 1 to 2
• Database, initial state (Status C: committed, P: prepared)
UserID  Balance  Status  Version  TxID
1       100      C       5        XXX
2       100      C       4        YYY
• Tx1 reads both records into its memory space and updates them there
UserID  Balance  Status  Version  TxID
1       80       P       6        Tx1
2       120      P       5        Tx1
• The database is still unchanged at this point
Transaction with Example – Prepare Phase
• Tx1 prepares its records with linearizable conditional writes
– Update only if the versions and the TxIDs are the same as the ones it read
• Database after Tx1's prepare
UserID  Balance  Status  Version  TxID
1       80       P       6        Tx1
2       120      P       5        Tx1
• Tx2 (Transfer 10 from 1 to 2) has read the same initial records (versions 5 and 4) and holds these updates in its memory space
UserID  Balance  Status  Version  TxID
1       90       P       6        Tx2
2       110      P       5        Tx2
• Tx2's conditional writes fail due to the condition mismatch
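The conflict between Tx1 and Tx2 in this example can be replayed with a small in-memory sketch. It is an illustration only: `PrepareSketch`, `Rec`, `load`, `read`, and `prepare` are hypothetical names, and a `synchronized` method stands in for the linearizable conditional write that the underlying database is required to provide.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory table keyed by UserID (all names are hypothetical).
final class PrepareSketch {
    static final class Rec {
        final int balance;
        final char status;   // 'P' = prepared, 'C' = committed
        final int version;
        final String txId;

        Rec(int balance, char status, int version, String txId) {
            this.balance = balance;
            this.status = status;
            this.version = version;
            this.txId = txId;
        }
    }

    private final Map<Integer, Rec> table = new HashMap<>();

    synchronized void load(int userId, Rec rec) {
        table.put(userId, rec);
    }

    synchronized Rec read(int userId) {
        return table.get(userId);
    }

    // Conditional write: succeeds only if the stored version and TxID
    // still match the image the transaction read earlier.
    synchronized boolean prepare(int userId, Rec readImage, int newBalance, String txId) {
        Rec current = table.get(userId);
        if (current == null
                || current.version != readImage.version
                || !current.txId.equals(readImage.txId)) {
            return false;   // condition mismatch: another transaction got there first
        }
        table.put(userId, new Rec(newBalance, 'P', readImage.version + 1, txId));
        return true;
    }
}
```

With the initial records (100, C, 5, XXX) and (100, C, 4, YYY) loaded and read by both transactions, Tx1's `prepare` calls succeed, and Tx2's calls with the same read images then return `false`, because the stored version and TxID are now 6/Tx1 and 5/Tx1.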
Transaction with Example – Commit Phase 1
• User records after the prepare phase
UserID  Balance  Status  Version  TxID
1       80       P       6        Tx1
2       120      P       5        Tx1
• Tx1 writes its status record to the coordinator table with a linearizable conditional write
– Update only if the status record for Tx1 does not exist
• Coordinator table after the write (Status C: committed, A: aborted)
TxID  Status
XXX   C
YYY   C
ZZZ   A
Tx1   C
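The write-if-absent rule for the status record can be sketched with an atomic put-if-absent. This is a sketch under assumptions: `CoordinatorSketch` and `tryPut` are hypothetical names, and `ConcurrentHashMap.putIfAbsent` stands in for the linearizable conditional write of the coordinator table. It also assumes that some other party may try to record an abort for the same TxID, in which case whichever write lands first decides the outcome.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative coordinator table: TxID -> Status (all names are hypothetical).
final class CoordinatorSketch {
    enum Status { COMMITTED, ABORTED }

    private final ConcurrentMap<String, Status> states = new ConcurrentHashMap<>();

    // Writes the desired status only if no status record exists for txId.
    // Returns the status that is actually recorded after the call.
    Status tryPut(String txId, Status desired) {
        Status existing = states.putIfAbsent(txId, desired);
        return existing == null ? desired : existing;
    }

    // Returns null when no status record exists yet.
    Status get(String txId) {
        return states.get(txId);
    }
}
```

Because the first write wins, a transaction has exactly one outcome even when several writers race on the same TxID.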
Transaction with Example – Commit Phase 2
• Tx1 commits its records with linearizable conditional writes
– Update the status only if the record is prepared by Tx1
• User records after the commit
UserID  Balance  Status  Version  TxID
1       80       C       6        Tx1
2       120      C       5        Tx1
• The coordinator table is unchanged (XXX: C, YYY: C, ZZZ: A, Tx1: C)
Recovery
• Recovery is lazily done when a record is read
• Recovery process, depending on where TX1 crashes
– Before the prepare phase: nothing is needed (local memory space needs to be cleared)
– After the prepare phase starts but before commit phase 1 completes: rolled back by another TX lazily using the before image
– After commit phase 1 but before commit phase 2 completes: rolled forward by another TX lazily updating the status to C
– After commit phase 2: no need for recovery
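The decision a reader makes when it meets a record can be written as a small function. This is a hedged sketch: `LazyRecoverySketch`, `TxState`, `Action`, and `onRead` are hypothetical names. For the case where no status record exists yet, the sketch assumes the reader first records an abort so that the crashed transaction cannot commit later; the slide does not spell this step out, and a real implementation may also wait for a timeout before doing so.

```java
// Illustrative decision table for lazy recovery (all names are hypothetical).
final class LazyRecoverySketch {
    // State of the writer's status record in the coordinator table.
    enum TxState { COMMITTED, ABORTED, UNKNOWN }

    enum Action { NONE, ROLL_FORWARD, ROLL_BACK, ABORT_THEN_ROLL_BACK }

    // recordStatus: 'C' = committed, 'P' = prepared
    static Action onRead(char recordStatus, TxState coordinatorState) {
        if (recordStatus == 'C') {
            return Action.NONE;                      // nothing to recover
        }
        switch (coordinatorState) {
            case COMMITTED:
                return Action.ROLL_FORWARD;          // set the record status to C
            case ABORTED:
                return Action.ROLL_BACK;             // restore the before image
            default:
                // Assumption: no status record yet, so record an abort first.
                return Action.ABORT_THEN_ROLL_BACK;
        }
    }
}
```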
Performance Optimization – Parallel Commit and Deferred Commit
• Parallel Commit
– Parallelize prepare-records and commit-records
• Deferred Commit
– Return to a caller without committing records; commit-records (commit phase 2) is executed after the TX returns
[Figure: timelines of W(X), W(Y), P(X), P(Y), C, C(X), C(Y) across the prepare phase, commit phase 1, and commit phase 2, shown without optimization, with Parallel Commit, and with Deferred Commit]
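Parallel prepare can be sketched with `CompletableFuture` from the Java standard library. The class `ParallelPrepareSketch` and the `prepareOne` callback are hypothetical; the callback stands in for one conditional write against the database. The sketch waits for every prepare before reporting the result, since commit phase 1 may only start after all records are prepared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

// Illustrative parallel prepare (all names are hypothetical).
final class ParallelPrepareSketch {
    // Runs prepareOne for every key concurrently; true only if all of them succeed.
    static boolean prepareAll(List<String> keys, Predicate<String> prepareOne) {
        List<CompletableFuture<Boolean>> futures = new ArrayList<>();
        for (String key : keys) {
            futures.add(CompletableFuture.supplyAsync(() -> prepareOne.test(key)));
        }
        boolean allPrepared = true;
        for (CompletableFuture<Boolean> future : futures) {
            // join() blocks until this prepare has finished
            if (!future.join()) {
                allPrepared = false;
            }
        }
        return allPrepared;
    }
}
```

The same pattern applies to commit-records. For Deferred Commit, one option is to start the commit-records futures and return to the caller without joining them; this is safe in principle because records left in the prepared state are handled by lazy recovery.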
Serializable Strategy
• RCSI causes some anomalies
– Read-skew, write-skew, read-only, and phantom anomalies
• Basic strategy to make RCSI serializable
– Avoid the anti-dependency (rw-dependency) dangerous structure [TODS’05]
– No use of SSI [SIGMOD’08] or its variant [EuroSys’12], because they require many linearizable operations for managing in/outConflicts, or a correct clock
– Two implementations: Extra-write and Extra-read
Serializable Strategy – Extra-write and Extra-read
• Extra-write
– Convert a read into a write (write the same record). Extra care is taken if the record doesn’t exist.
– The schedule R(X) W(Y) P(Y) C C(Y) gains P(X) and C(X)
• Extra-read
– Check the read set after the prepare phase to see that it has not been updated by other transactions
– Re-read (validate) the record and abort if it is changed
– The schedule R(X) W(Y) P(Y) C C(Y) gains a validation step V(X)
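The Extra-read validation step can be sketched as a comparison between what the transaction read and what the database holds now. The names `ExtraReadSketch`, `Stamp`, and `validate` are hypothetical, and plain maps stand in for the read set and the database; in a real system the re-read would be a linearizable read against the store.

```java
import java.util.Map;

// Illustrative read-set validation for Extra-read (all names are hypothetical).
final class ExtraReadSketch {
    // Version and TxID of a record as seen at read time.
    static final class Stamp {
        final int version;
        final String txId;

        Stamp(int version, String txId) {
            this.version = version;
            this.txId = txId;
        }

        boolean sameAs(Stamp other) {
            return other != null && version == other.version && txId.equals(other.txId);
        }
    }

    // Returns true when every record in the read set is unchanged in the database.
    // A false result means the transaction should abort.
    static boolean validate(Map<String, Stamp> readSet, Map<String, Stamp> database) {
        for (Map.Entry<String, Stamp> entry : readSet.entrySet()) {
            Stamp current = database.get(entry.getKey());
            if (!entry.getValue().sameAs(current)) {
                return false;
            }
        }
        return true;
    }
}
```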
Transactions on Heterogeneous Databases
• Scalar DB achieves ACID transactions spanning multiple different databases
• Two types of interfaces:
– One-phase: a single Application uses one Scalar DB instance over MySQL and Cassandra
– Two-phase: Microservice1 and Microservice2 each use their own Scalar DB instance, one over MySQL and one over Cassandra, and share a TxID
Benchmark Results with Scalar DB on Cassandra
• Achieved 90% scalability in a 100-node cluster (compared to the ideal TPS based on the performance of a 3-node cluster)
• Each node: i3.4xlarge (16 vCPUs, 122 GB RAM, 1900 GB NVMe SSD × 2), RF: 3
[Figure: throughput plots for Workload1 (Payment) and Workload2 (Evidence)]
Verification Results for Scalar DB
• Scalar DB has been heavily tested with Jepsen and Elle [VLDB’21]
– Jepsen tests are created and conducted by Scalar
– See https://github.com/scalar-labs/scalar-jepsen for more detail
• The transaction commit protocol is verified with TLA+
– See https://github.com/scalar-labs/scalardb/tree/master/tla%2B/consensus-commit
• Results: Jepsen passed, TLA+ passed
Summary
• Scalar DB is a universal transaction manager
– Provides database-agnostic transactions on various databases: Cassandra, HBase, Amazon DynamoDB, Azure Cosmos DB, MySQL, PostgreSQL, Oracle Database, SQL Server, Amazon RDS, Amazon Aurora, ScyllaDB
– Achieves transactions spanning heterogeneous databases
– Enhanced to guarantee strict serializability
– Transaction consistency and scalability are verified extensively
• Future work
– GraphQL I/F, SQL I/F, more adaptors (MongoDB, Kafka, …)
