Constraints of
Highly Scalable
Databases
1.
Traditional Databases
Recap of the ACID constraints
“Traditional” databases operate with the Transaction
paradigm that guarantees certain properties
(A) Atomicity
(C) Consistency
(I) Isolation
(D) Durability
The ACID Guarantees
1. Atomicity
Each transaction must be “all
or nothing” - if any part fails,
the whole transaction must be
rolled back as if it never
happened.
2. Consistency
The end-state of a transaction
must follow all the rules defined
in the database: data
constraints, cascades, triggers
etc.
3. Isolation
The result of concurrent
transactions should be the same
as if they had run one after
another in some sequential order.
4. Durability
A transaction, once committed, will survive permanently even if the system fails.
This includes disk crashes, power outages, etc.
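A minimal sketch of atomicity and consistency working together, using Python's built-in sqlite3 module. The accounts table, the names and the amounts are assumptions made up for illustration. The CHECK constraint is the database rule; when the second UPDATE breaks it, the whole transfer is rolled back, including the credit that had already been applied.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
```

After both calls only the first transfer is visible: the failed one leaves no trace, "as if it never happened".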
How do they do this?
Locking
● Read / write / range locks
Concurrency Control
● 2-phase commit (2PC), 3PC
protocols
● Distributed locks
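The idea behind 2-phase commit can be sketched in a few lines. This is a toy, in-memory illustration only: the Participant class and its can_commit flag are hypothetical, and a real protocol also has to handle timeouts, crashes and durable logging, which are left out here.

```python
class Participant:
    """A hypothetical resource manager taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work could be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): every participant is asked to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit only on a unanimous yes, otherwise roll back all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"


happy = [Participant("db1"), Participant("db2")]
mixed = [Participant("db1"), Participant("db2", can_commit=False)]
happy_result = two_phase_commit(happy)
mixed_result = two_phase_commit(mixed)
```

Note that the coordinator has to hear from every participant before anyone commits, which is exactly the kind of coordination that gets expensive at scale.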
But then came the 2000s
And Scale Happened
Traditional RDBMSs were not designed for
the needs of modern web applications
Global Scale
Netflix knows which movies you
watched, when, at what
point(s) you paused and for
how long, etc.
It then replicates that data
across 3 global data centers.
Volume
In 2008, Facebook had only
100 million users and needed
8,000 shards of MySQL.
As of early 2017 it has ~1.86 billion users.
Speed
In 2013 Twitter recorded a peak of
about 143,000 new tweets/second;
a typical day averaged roughly
5,700 tweets/second.
What to do?
Scale up! (?)
- Increase memory, cores, CPU
- Cache reads with memcached
- Master-slave replication
- Sharding
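As a rough sketch of what sharding means in practice, the function below routes a key to one of a fixed number of shards with a stable hash. The key format and the use of SHA-256 are assumptions chosen for illustration, not a description of any particular site's scheme.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count


shard = shard_for("user:42", 8000)
```

With plain modulo routing like this, changing shard_count remaps most keys, which is one reason resharding a live system is painful.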
NOT
ENOUGH
2.
Redefining Constraints
Replacing ACID with BASE
“DBMS research is about ACID
(mostly). But we forfeit “C” and “I”
for availability, graceful
degradation, and performance.
This tradeoff is fundamental.”
- Eric Brewer, 2000
Eric Brewer proposed a new set of properties: BASE
Basically Available
System is always available
for clients (but may not be
consistent)
Soft State
Database is no longer in
charge of “valid” data state.
The app is now responsible.
Eventual consistency
If all goes well, all clients will
eventually see the same
thing. Probably.
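One simple way to picture eventual consistency is two replicas that accept writes independently and reconcile later. The sketch below assumes a last-write-wins rule keyed on a caller-supplied timestamp; the Replica class and the "cart" data are hypothetical, and real systems use more careful conflict resolution.

```python
class Replica:
    """A toy replica that resolves conflicts by last-write-wins."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        # Keep the incoming value only if it is newer than what we hold.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Anti-entropy: replay the other replica's entries through write().
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("cart", ["book"], timestamp=1)
b.write("cart", ["book", "pen"], timestamp=2)

stale = a.read("cart")  # replica a has not seen the newer write yet

a.merge_from(b)
b.merge_from(a)
converged = (a.read("cart"), b.read("cart"))
```

Between the second write and the merge, a client reading from replica a sees stale data. After the merge both replicas agree.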
In the world of BASE parameters,
a different set of priorities rules:
Availability is most important
Weak consistency (i.e. stale data) is okay
Approximate answers are okay
Aggressive (optimistic) algorithms are okay
Simple, fast, easy evolution of the schema is important
A new set of constraints:
the CAP Theorem
It is impossible for a distributed computer system to simultaneously
provide more than 2 of these 3 guarantees:
Consistency
Availability
Partition tolerance
(Eric Brewer, 1998-2000)
The CAP Parameters
1. Consistency*
All clients get the same view of
the data, or they get an error
(i.e. every read receives the
most recent write)
2. Availability
All clients can always read and
always write
(i.e. every request receives a
non-error response)
3. Partition tolerance
The system functions even if
some nodes are unavailable
(i.e. system operates despite
an arbitrary number of
messages being dropped by
the network between nodes)
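The choice a system faces during a partition can be shown with a toy pair of nodes. Everything here (the Node class, the "CP" and "AP" mode strings, the partition helper) is a made-up illustration of the trade-off, not how any real database is implemented.

```python
class Node:
    """A toy node that replicates every write to a single peer."""

    def __init__(self, mode):
        self.mode = mode  # "CP" refuses writes when cut off; "AP" accepts them
        self.value = None
        self.peer = None
        self.partitioned = False

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by giving up availability.
                raise RuntimeError("unavailable: cannot reach peer")
            # Stay available by giving up consistency: the peer is now stale.
            self.value = value
            return
        self.value = value
        self.peer.value = value


def make_pair(mode):
    left, right = Node(mode), Node(mode)
    left.peer, right.peer = right, left
    return left, right


def partition(left, right):
    left.partitioned = right.partitioned = True


cp_left, cp_right = make_pair("CP")
cp_left.write(1)
partition(cp_left, cp_right)
try:
    cp_left.write(2)
    cp_rejected = False
except RuntimeError:
    cp_rejected = True

ap_left, ap_right = make_pair("AP")
ap_left.write(1)
partition(ap_left, ap_right)
ap_left.write(2)
```

The CP pair returns an error but both nodes still agree on 1. The AP pair keeps answering, but the two nodes now hold different values.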
Lightning talk: highly scalable databases and the PACELC theorem
All NoSQL databases live somewhere on this
spectrum, based on how they’re tuned
ACID <-------> BASE
● What levels of availability do you choose to provide?
● What levels of consistency do you choose to provide?
● What do you do when a partition is detected?
● How do you recover from a partition event?
But wait…
we’re not
through yet
2010: Daniel Abadi (Yale) says CAP is misleading
The trade-offs defined by CAP’s “pick any 2” are misleading:
● The only time you need to make a trade-off is when there is
a partition event (P)
● Systems that sacrifice C must do so all the time
● But systems that sacrifice A only need to do so when
there’s a partition
Most importantly, you don’t give up C to gain A
You give up C to get another missing ingredient: L
LATENCY
Latency = how long must a client request wait for your response?
Imagine replicating data across global data centers
[Diagram: Data Center 1, Data Center 2, Data Center 3, Data Center 4, Data Center 5, ... Data Center n]
“A high availability requirement implies
that the system must replicate data.
But as soon as a distributed system
replicates data, a tradeoff between
consistency and latency arises.”
- Abadi, 2010
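A back-of-the-envelope sketch of that trade-off: the more replicas a write waits for, the more consistent the replicas are when the client gets its answer, and the longer the client waits. The function name and the round-trip times below are assumptions picked for illustration, and the model assumes replicas are contacted in parallel.

```python
def write_latency_ms(local_ms, replica_rtts_ms, acks_required):
    """Time until `acks_required` remote replicas have acknowledged a write."""
    if acks_required < 0 or acks_required > len(replica_rtts_ms):
        raise ValueError("acks_required must be between 0 and the replica count")
    if acks_required == 0:
        # Fully asynchronous replication: answer after the local write only.
        return local_ms
    # Replicas are contacted in parallel, so we wait for the slowest of the
    # fastest `acks_required` responses.
    needed = sorted(replica_rtts_ms)[:acks_required]
    return local_ms + needed[-1]


rtts = [12, 80, 150]  # hypothetical round trips to three remote data centers
async_ms = write_latency_ms(2, rtts, acks_required=0)
one_ack_ms = write_latency_ms(2, rtts, acks_required=1)
all_acks_ms = write_latency_ms(2, rtts, acks_required=3)
```

With these made-up numbers the client waits 2 ms, 14 ms or 152 ms depending on how much consistency is bought, and no partition is involved at all.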
The PACELC theorem (Abadi, 2010)
In a system that replicates data:
If a partition (P) is detected, how does the system trade off
○ (A) Availability or
○ (C) Consistency
Else (E) how does the system trade off
○ (L) Latency or
○ (C) Consistency
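The two questions above can be captured as a small data structure. This is only a sketch: the PacelcProfile class and its method names are hypothetical, and the letters follow the abbreviations used on this slide.

```python
from dataclasses import dataclass

NAMES = {"A": "availability", "C": "consistency", "L": "latency"}


@dataclass(frozen=True)
class PacelcProfile:
    on_partition: str  # "A" or "C": what the system favors during a partition
    otherwise: str     # "L" or "C": what it favors in normal operation

    def label(self) -> str:
        return f"P{self.on_partition}/E{self.otherwise}"

    def favors(self, partitioned: bool) -> str:
        choice = self.on_partition if partitioned else self.otherwise
        return NAMES[choice]


profile = PacelcProfile(on_partition="A", otherwise="L")
```

A PA/EL system like this one favors availability during a partition and low latency the rest of the time, giving up consistency in both cases.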
Comparing NoSQL databases using PACELC
DDBS                      P+A   P+C   E+L   E+C
Dynamo, Cassandra, Riak    x           x
Mongo                      x                 x
H-Store, VoltDb                  x           x
Yahoo! PNUTS                     x     x
References
Images and title ideas from:
○ http://blog.nahurst.com/visual-guide-to-nosql-systems
○ http://digbigdata.com/know-thy-cap-theorem-for-nosql/
Detailed references at:
○ http://www.bardoloi.com/blog/2017/03/06/pacelc-theorem/
thanks!
Any questions?
You can find me at
@bardoloi
