Databases
Sargun Dhillon
@Sargun
What is a database?
A database is an organized collection of data
What are databases
for?
Applications
Internet Applications
Experiencing exploding growth
Internet Traffic vs. Penetration
[Chart: IP Traffic (PB/mo) and Global Penetration (%) by year, 2000-2012]
Number of Internet Users in 2012
Average Distance to Every Human
Extrapolating
We have not yet reached Peak “Web” and we won’t see
it for some time
Applications
How are they built?
Basic Application
Useful Application
Add Persistence
Scale Out
Scale Out with Correctness
What is a Transaction?
A Unit of Work
Transaction Scheduling
Concurrent Operations
Non-Conflicting Concurrency
Parallel Execution
ACID
ACID = Atomicity
A transaction executes or it does not
ACID = Consistency
Correctness: require the database to follow a set of
invariants
ACID = Isolation
Prevent inter-actor visibility during concurrent operations
ACID = Durability
Once you write, it will survive
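A minimal sketch of atomicity, assuming Python's built-in sqlite3 module as a stand-in database. The accounts table, the names, and the amounts are hypothetical and not from the talk; the point is only that a failed unit of work leaves no partial writes behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two hypothetical accounts as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Invariant (the C in ACID): no balance may go negative.
            (lowest,) = conn.execute("SELECT MIN(balance) FROM accounts").fetchone()
            if lowest < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

Calling transfer(conn, "alice", "bob", 500) should return False and leave both balances untouched, because both updates are rolled back together.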
Lifecycle of a Transaction
Vertical Scalability
Moore’s Law can take us places
Biggest AWS Database
• vCPUs: 32
• Memory: 244 GiB
• Storage: 3TB
• IOPS: 30,000
• Networking: 10 Gigabit
• Resiliency: Multi-AZ
• SLA: 99.95%
• Backend: PostgreSQL
$141,052.66/yr
Scaling Beyond
Sharding?
Intro to Databases
Do we have a natural
sharding key?
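A minimal sketch of what a sharding key does, assuming simple hash-based placement. The function name shard_for and the shard count are hypothetical. Hashing spreads keys evenly, but a single very popular key still lands on one shard, and changing the shard count remaps most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards
```

With a natural key such as user_name, every row for one user resolves to the same shard, so single-user operations never need a coordinator.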
Add a Coordinator?
Two-phase commit?
Three-phase commit?
Paxos?
Enhanced Three-phase commit?
Wat?
Egalitarian Paxos?
Do we really want to
run NxM databases?
Partial Availability
Failure detectors are
hard
Database Failure
Cascading App Failure
Recovery
Hotspots?
(The “Bieber” problem)
Scaling SSI databases
is a hard problem
What if we want
multi-datacenter?
No latency win for
mutable data
Must sacrifice recency
for latency win
Complex Routing
Semantics
Multi-master requires
at least 1 RTT
-F1: A Distributed SQL Database That Scales, Google
“Because the data is synchronously replicated
across multiple datacenters, and because
we’ve chosen widely distributed datacenters,
the commit latencies are relatively high (50-150
ms).”
-Kohavi and Longbotham 2007
“Every 100 ms increase in load time of
Amazon.com decreased sales by 1%.”
(~$120M of losses per 100 ms)
“Average partition duration ranged from 6 minutes for
software-related failures to more than 8.2 hours for
hardware-related failures (median 2.7 and 32 minutes;
95th percentile of 19.9 minutes and 3.7 days,
respectively).”
-The Network is Reliable
WANs Fail
Is there another way?
Eventually
Consistent
Systems
-F1: A Distributed SQL Database That Scales, Google
“We also have a lot of experience with eventual
consistency systems at Google. In all such
systems, we find developers spend a
significant fraction of their time building
extremely complex and error-prone
mechanisms to cope with eventual consistency
and handle data that may be out of date. We
think this is an unacceptable burden to place
on developers and that consistency problems
should be solved at the database level. ”
CAP Theorem
“A shared-data system can have at most
two of the three following properties:
Consistency, Availability, and tolerance to
network Partitions.”
-Dr. Eric Brewer
On Consistency
• ACID Consistency: Any transaction, or operation
will bring the database from one valid state to
another
• CAP Consistency: All nodes see the same data at
the same time (synchrony)
On Partition Tolerance
• The network will be allowed to lose arbitrarily many
messages sent from one node to another.
• Database systems, in order to be useful, must
communicate over the network
• Clients count
There is no such thing as
a 100% reliable network:
Can’t choose CA
http://codahale.com/you-cant-sacrifice-partition-tolerance
We Can Have Both*
(*Just not at the same time)
PNUTS
• Paper released by Yahoo! Research in 2008
• Operations:
• Read-Any
• Read-Critical(Required-Version)*
• Read-Latest
• Write
• Test-and-set-write(Required-Version)
* Will fall back to CP operation
Weak Consistency
Weak Consistency
“This is a specific form of weak
consistency; the storage system
guarantees that if no new
updates are made to the object,
eventually all accesses will
return the last updated value.”
Definition of “Eventual Consistency” from “Eventually
Consistent - Revisited” - Werner Vogels
Eventual Consistency
in the LAN
Less Relevant Today
Good at Building
LANs at Scale
Facebook Fabric
Microsoft VL2
Google Jupiter
Less Interesting
Eventual Consistency
in the WAN
Low-latency
everywhere
Write Anywhere
Beat the speed of light
Build for WAN locality
Typical Pattern
with
COTS EC Store
System Model
Use Case:
Social Network
Models:
Users, Posts, Friends
Schema
CREATE TABLE test.users (
    user_name text PRIMARY KEY,
    friends set<text>,
    posts set<text>
);
State
*****:test> SELECT * FROM users;
user_name | friends | posts
-----------+----------+-------
sargun | {'BOSS'} | null
Let’s Post!
(But First)
Remove Boss
*****:test> UPDATE users SET
friends = friends - {'BOSS'}
WHERE user_name = 'sargun' ;
Hidden Failure
Dropped Unfriending
State at DC2 & DC3
*****:test> SELECT * FROM users;
user_name | friends | posts
-----------+----------+-------
sargun | {'BOSS'} | null
Post Message
*****:test> UPDATE users SET
posts = posts + {'PARTY'} WHERE
user_name = 'sargun' ;
State at DC2 & DC3
*****:test> SELECT * FROM users;
user_name | friends | posts
-----------+----------+-----------
sargun | {'BOSS'} | {'PARTY'}
Worse Than Banking
Unbounded Financial Loss
No
Happens-Before (h.b.)
Relationship
Solution: Wait For Acks
Very Little Benefit
Over
CP system
Quorum Systems
RYOW at an
Incredible Cost
Why not just do
Paxos*?
* Single-Decree Paxos, or a variant such as EPaxos, Cheap Paxos, or
Multi-Paxos
Quorum
Quorum
Participating Quorums
Must Overlap
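The overlap requirement can be checked by brute force. This is a minimal sketch with a hypothetical helper name, quorums_overlap; for simple majority-style quorums it agrees with the familiar rule R + W > N.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum of size r shares a replica with every write quorum of size w."""
    replicas = range(n)
    return all(
        set(read_quorum) & set(write_quorum)
        for read_quorum in combinations(replicas, r)
        for write_quorum in combinations(replicas, w)
    )
```

For three replicas, quorums_overlap(3, 2, 2) holds while quorums_overlap(3, 1, 1) does not, which is why single-replica reads and writes cannot give read-your-own-writes.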
Just Perform
Paxos Reconfiguration
to
Recover from Failure
Is there an alternative?
Strong
Eventual
Consistency
Strong Eventual Consistency
“Any set of nodes that have received
the same (unordered) set of updates
will be in the same state.”
How do you even use this?
Vector Clocks
Vector Clocks
• Extension of Lamport Clocks
• Used to detect cause and effect in distributed
systems
• Can determine concurrency of events, and
causality violations
• Preserves h.b. relationships
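A minimal sketch of vector clocks as plain dictionaries mapping node name to counter. The function names and the node names dc1 and dc2 are hypothetical; production systems add pruning and compact encodings that are left out here.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a: dict, b: dict) -> dict:
    """Pointwise maximum: the smallest clock that dominates both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"  # a happens-before b
    if b_le_a:
        return "after"   # b happens-before a
    return "concurrent"  # neither dominates: no h.b. relationship
```

Two writes made independently at dc1 and dc2 from the same starting clock compare as "concurrent", which is exactly the case a datastore must resolve or merge.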
CRDTs
• CRDTs:
• Convergent Replicated Data Types
• Commutative Replicated Data Types
• Enable data structures to be always writable on both sides of a partition,
and to replay updates after the partition heals
• Enable distributed computation across monotonic functions
• Two Types:
• CvRDTs
• CmRDTs
CRDTs
CvRDTs
• State / value based CRDTs
• Minimal state
• Don’t require active garbage collection
Set CvRDT
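A minimal sketch of a state-based set, assuming the simplest case: a grow-only set whose merge is set union. The class name GSet is illustrative. Because union is commutative, associative, and idempotent, replicas can exchange full states in any order, any number of times, and still converge.

```python
class GSet:
    """Grow-only set CvRDT: the state is the set itself, merge is union."""

    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, item):
        """Return a new state that also contains `item`."""
        return GSet(self.items | {item})

    def merge(self, other):
        """Join two replica states; order and repetition do not matter."""
        return GSet(self.items | other.items)

    def value(self):
        """Return the current contents as a plain set."""
        return set(self.items)
```

Supporting removal needs extra metadata (for example the observe-remove set listed later), which is where garbage collection starts to matter.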
CmRDTs
• Op / method based CRDTs
• Size grows monotonically
• Uses version vectors to determine order of
operations
Counter CmRDT
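A minimal sketch of an operation-based counter. As a simplification it deduplicates operations by a random unique ID rather than by the version vectors mentioned above; the class name OpCounter is hypothetical. Increments commute, so replicas may apply them in any order, and the set of seen IDs shows why op-based state tends to grow monotonically.

```python
import uuid


class OpCounter:
    """Op-based counter: replicas exchange operations instead of full state."""

    def __init__(self):
        self.total = 0
        self.seen = set()  # IDs of operations already applied (grows forever)

    def increment(self, amount=1):
        """Apply a new increment locally and return the op to broadcast."""
        op = (uuid.uuid4().hex, amount)
        self.apply(op)
        return op

    def apply(self, op):
        """Apply a local or remote op exactly once, ignoring redelivery."""
        op_id, amount = op
        if op_id in self.seen:
            return
        self.seen.add(op_id)
        self.total += amount
```

Two replicas that eventually receive the same set of operations end with the same total, even if some operations are delivered twice.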
CRDTs in the Wild
• Sets
• Observe-remove set
• Grow-only sets
• Counters
• Grow-only counters
• PN-Counters
• Flags
• Maps
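A minimal sketch of one entry from the list above, the PN-Counter, in its state-based form. It is built from two grow-only counters, one for increments and one for decrements, each tracked per node; the class and node names are illustrative.

```python
class PNCounter:
    """State-based PN-Counter: per-node increment and decrement tallies."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.p = {}  # node -> total increments made at that node
        self.n = {}  # node -> total decrements made at that node

    def increment(self, amount=1):
        self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

    def decrement(self, amount=1):
        self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        """Pointwise maximum of both tallies; safe to repeat and reorder."""
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for node, count in theirs.items():
                mine[node] = max(mine.get(node, 0), count)
```

Each node only ever raises its own slots, so taking the maximum per slot never loses an update.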
Data structures that are
CRDTs
• Probabilistic, convergent data structures
• HyperLogLog
• Bloom filter
• Co-recursive folding functions
• Maximum-counter
• Running Average
• Operational Transform
CRDTs
• Incredibly powerful primitive
• Not only useful for in-database manipulation but
client-database interaction
• You can compose them, and build your own
• Garbage collection is tricky
Riak
In Action
Model
curl -s http://localhost:8098/types/test/buckets/test/datatypes/sargun | python -m json.tool
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
"type": "map",
"value": {
"friends_set": [
"Boss"
],
"posts_set": []
}
}
“Primary Key”
curl -s http://localhost:8098/types/test/buckets/test/datatypes/sargun | python -m json.tool
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
"type": "map",
"value": {
"friends_set": [
"Boss"
],
"posts_set": []
}
}
Causal Context
curl -s http://localhost:8098/types/test/buckets/test/datatypes/sargun | python -m json.tool
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
"type": "map",
"value": {
"friends_set": [
"Boss"
],
"posts_set": []
}
}
Update
curl -XPOST http://localhost:8098/types/test/buckets/test/datatypes/sargun \
  -H "Content-Type: application/json" \
  -H "X-Riak-Vclock: g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq" \
  -d '
{
"update": {
"friends_set": {
"remove": "Boss"
}
}
}'
Updated Entries
(during partition)
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
"type": "map",
"value": {
"friends_set": [
"Boss"
],
"posts_set": []
}
}
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQtq",
"type": "map",
"value": {
"friends_set": [],
"posts_set": []
}
}
Update
curl -XPOST http://localhost:8098/types/test/buckets/test/datatypes/sargun \
  -H "Content-Type: application/json" \
  -H "X-Riak-Vclock: g2wAAAABaAJtAAAACBjtDYuvG6A4YQtq" \
  -d '
{
"update": {
"posts_set": {
"add": "Party"
}
}
}'
Updated Entries
(After Healing)
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQ5q",
"type": "map",
"value": {
"friends_set": [],
"posts_set": [
"Party"
]
}
}
{
"context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQ5q",
"type": "map",
"value": {
"friends_set": [],
"posts_set": [
"Party"
]
}
}
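The convergence shown in these entries can be imitated with a toy model. This is a sketch under stated assumptions, not Riak's implementation: ORSet and Profile are hypothetical classes, the set is a naive observed-remove set that keeps tombstones forever, and dc1 and dc2 stand in for the two sides of the partition.

```python
import uuid


class ORSet:
    """Toy observed-remove set: every add gets a unique tag."""

    def __init__(self):
        self.adds = set()     # (element, tag) pairs ever added
        self.removed = set()  # (element, tag) pairs that were observed, then removed

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Only tags this replica has observed are tombstoned.
        self.removed |= {pair for pair in self.adds if pair[0] == element}

    def merge(self, other):
        self.adds |= other.adds
        self.removed |= other.removed

    def value(self):
        return {element for element, _tag in self.adds - self.removed}


class Profile:
    """Toy map with the two set fields from the example above."""

    def __init__(self):
        self.friends = ORSet()
        self.posts = ORSet()

    def merge(self, other):
        self.friends.merge(other.friends)
        self.posts.merge(other.posts)

    def value(self):
        return {
            "friends_set": sorted(self.friends.value()),
            "posts_set": sorted(self.posts.value()),
        }


dc1, dc2 = Profile(), Profile()
dc1.friends.add("Boss")
dc2.merge(dc1)               # both replicas start with Boss as a friend

dc1.friends.remove("Boss")   # during the partition: unfriend at one side
dc2.posts.add("Party")       # during the partition: post at the other side

dc1.merge(dc2)               # after healing: exchange state in both directions
dc2.merge(dc1)
```

After the two merges both replicas report an empty friends_set and a posts_set containing only "Party", regardless of which merge runs first.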
Currently:
Replicates entire value
Future Work:
δ-CRDT
Ship only Deltas
Eventual Consistency
In Summary
SEC Enables
Distributed
Scalable
Scalability
Processors
Fault-Tolerant
Applications
Eventual Consistency (CAP)
Without Consistency (ACID)
Gives EC a Bad Name
Operations Requiring Weak Consistency vs. Strong Consistency
 Invariant         | Operation | AP / CP
-------------------+-----------+---------
 Specify unique ID | Any       | CP
 Generate unique ID| Any       | AP
 >                 | INCREMENT | AP
 >                 | DECREMENT | CP
 <                 | INCREMENT | CP
 <                 | DECREMENT | AP
 Secondary Index   | Any       | AP
 Materialized View | Any       | AP
 AUTO_INCREMENT    | INSERT    | CP
 Linearizability   | CAS       | CP
BASE not ACID
•Basically Available: There will be a response
per request (failure, or success)
•Soft State: Any two reads against the system
may yield different data (when measured
against time)
•Eventually Consistent: The system will
eventually become consistent when all
failures have healed, and time goes to infinity
Brand New Technology
Still being invented
Technology Timeline
• 1996 - Log structured merge tree
• 2000 - CAP Theorem
• 2007 - Amazon Dynamo Paper
• 2011 - INRIA CRDT Technical Report
• 2014 - Riak DT map: a composable, convergent
replicated dictionary
Further Reading
• Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area
Storage with COPS
• PNUTS: Yahoo!’s Hosted Data Serving Platform
• F1: A Distributed SQL Database That Scales
• Spanner: Google's Globally-Distributed Database
• The Network is Reliable: An informal survey of real-world communications
failures
• A comprehensive study of Convergent and Commutative Replicated Data
Types
• Riak DT Map: A Composable, Convergent Replicated Dictionary
Get in Touch
• If you’re interested in cheating the speed of light
• Come use our software
• If you’re interested in solving today’s computer science
problems
• Come work for us
• If you’d like to learn more about distributed systems at
scale
• Maybe you have a better idea
Sargun Dhillon
@Sargun
sdhillon@basho.com
The Case
for
Eventual Consistency
