Building a Fault Tolerant Distributed Architecture
Rodrigo Toste Gomes
Software Engineer
A MemSQL Cluster
- Master Aggregator
- Leaves
Leaves store shards of data.
Master Aggregator is the source of truth for cluster state:
- Location of data shards;
Fault Tolerance and High Availability
Maintain replicas of shards on different nodes.
The system can tolerate a single node failure.
Master Aggregator is also responsible for:
- Location of data shard replicas;
- Replication states;
- Moving shards around.
Example Cluster
Nodes:
- Master Aggregator
- Leaf A
- Leaf B
Cluster State:
Node State:
- Leaf A is online
- Leaf B is online
Databases:
- data
Shards:
- data:0 - primary on A
- data:0 - replica on B
- data:1 - primary on B
- data:1 - replica on A
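A minimal Python sketch of how the example cluster state could be represented. The class and field names (`ClusterState`, `nodes`, `primaries`, `replicas`) are illustrative assumptions, not MemSQL's actual metadata format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClusterState:
    """Hypothetical in-memory view of what the Master Aggregator tracks."""
    nodes: Dict[str, bool] = field(default_factory=dict)     # leaf -> online?
    primaries: Dict[str, str] = field(default_factory=dict)  # shard -> leaf holding the primary
    replicas: Dict[str, str] = field(default_factory=dict)   # shard -> leaf holding the replica


# The example cluster: two leaves, one database ("data"), two shards.
state = ClusterState(
    nodes={"A": True, "B": True},
    primaries={"data:0": "A", "data:1": "B"},
    replicas={"data:0": "B", "data:1": "A"},
)

# Each shard's replica sits on a different leaf than its primary,
# which is what lets the cluster survive a single leaf failure.
for shard, primary_leaf in state.primaries.items():
    assert state.replicas[shard] != primary_leaf
```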
Maintaining the Cluster State
Cluster State:
- Leaf A is online
Leaf A:
Offline
[Diagram: MA, Leaf B, Leaf A; links labeled Heartbeats]
Cluster State:
- Leaf A is offline
Leaf A:
Offline
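The heartbeat check on this slide might look roughly like the sketch below. The class name, the `timeout_s` threshold, and the idea of passing timestamps in explicitly are all assumptions made for illustration; the slides do not say how MemSQL implements heartbeats.

```python
class HeartbeatMonitor:
    """Hypothetical Master Aggregator helper that tracks leaf liveness."""

    def __init__(self, timeout_s: float = 3.0):
        self.timeout_s = timeout_s  # assumed threshold, not a MemSQL default
        self.last_seen = {}         # leaf -> time of last successful heartbeat
        self.online = {}            # leaf -> status recorded in the cluster state

    def record_heartbeat(self, leaf: str, now: float) -> None:
        self.last_seen[leaf] = now
        self.online[leaf] = True

    def check(self, now: float) -> list:
        """Mark leaves with stale heartbeats offline; return the newly offline ones."""
        newly_offline = []
        for leaf, seen in self.last_seen.items():
            if self.online[leaf] and now - seen > self.timeout_s:
                self.online[leaf] = False
                newly_offline.append(leaf)
        return newly_offline


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record_heartbeat("A", now=0.0)
print(monitor.check(now=2.0))  # Leaf A answered recently, so nothing changes
print(monitor.check(now=5.0))  # Leaf A has been silent longer than the timeout
```

Between the two checks the cluster state still says "Leaf A is online" even though the leaf is already down, which is the gap the slide illustrates.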
Maintaining the Cluster State
Cluster State:
- Leaf A is online
- data:0 - primary on A
- data:0 - replica on B
Leaf A:
Offline
Leaf B:
Online; replica of data:0
Cluster State:
- Leaf A is offline
- data:0 - primary on B
Leaf A:
Offline
Leaf B:
Online; replica of data:0
Leaf B:
Online; primary of data:0
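A minimal sketch of the failover step shown above, in which the Master Aggregator rewrites the cluster state after a leaf goes offline. The function name `fail_over` and the plain dictionaries are assumptions for illustration only.

```python
def fail_over(nodes: dict, primaries: dict, replicas: dict, failed_leaf: str) -> list:
    """Mark a leaf offline and promote replicas of the primaries it hosted.

    Returns the shards whose primary moved. Shards with no healthy replica are
    left untouched here (an assumption; the slides do not cover that case).
    """
    nodes[failed_leaf] = False
    promoted = []
    for shard, leaf in list(primaries.items()):
        if leaf != failed_leaf:
            continue
        replica_leaf = replicas.get(shard)
        if replica_leaf is not None and nodes.get(replica_leaf, False):
            primaries[shard] = replica_leaf  # the replica becomes the new primary
            del replicas[shard]              # the shard has no replica for now
            promoted.append(shard)
    return promoted


nodes = {"A": True, "B": True}
primaries = {"data:0": "A"}
replicas = {"data:0": "B"}
print(fail_over(nodes, primaries, replicas, failed_leaf="A"))
print(primaries)
```

The cluster state changes first; Leaf B only becomes the primary locally once it notices the change and reconciles.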
MemSQL Distributed Architecture
Master Aggregator:
- Maintains primary copy of cluster state;
- Monitors node state via heartbeats.
Leaves:
- Monitor for changes in local and cluster state;
- Reconcile local state with cluster state.
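The leaf-side reconcile step can be sketched as a comparison between the roles the cluster state assigns to a leaf and the roles the leaf currently holds. The function and parameter names below are hypothetical, and a real leaf would do actual work (for example, start serving writes) where this sketch just updates a dictionary.

```python
def reconcile(leaf: str, local_roles: dict, primaries: dict, replicas: dict) -> list:
    """Bring a leaf's local shard roles in line with the cluster state.

    Returns the (shard, new_role) changes that were applied.
    """
    # Roles the cluster state assigns to this leaf.
    desired = {}
    for shard, owner in replicas.items():
        if owner == leaf:
            desired[shard] = "replica"
    for shard, owner in primaries.items():
        if owner == leaf:
            desired[shard] = "primary"

    changes = []
    for shard, role in desired.items():
        if local_roles.get(shard) != role:
            local_roles[shard] = role
            changes.append((shard, role))
    return changes


# Leaf B is still a replica locally, but the cluster state now names it primary.
local_b = {"data:0": "replica"}
print(reconcile("B", local_b, primaries={"data:0": "B"}, replicas={}))
print(local_b)
```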
What happens when a node is unable to reconcile its local state with the cluster state?
Shard Can’t Replicate
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Primary data:0
Replicating data:0 to B
Leaf B:
Replica of data:0
[Diagram: MA, Leaf B, Leaf A; links labeled Heartbeats and data:0]
Shard Can’t Replicate
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Primary data:0
Leaf B:
Replica of data:0
Network Partition A | B
[Diagram: MA, Leaf B, Leaf A; links labeled Heartbeats and data:0]
Strategy 1: Don’t block writes
[Diagram: MA, Leaf B, Leaf A; links labeled Heartbeats and data:0]
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Primary data:0
Leaf B:
Out of Date Replica of data:0
Network Partition A | B
Strategy 1: Don’t block writes (Leaf A crashes)
[Diagram: MA, Leaf B, Leaf A; links labeled Heartbeats and data:0]
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Offline
Leaf B:
Out of Date Replica of data:0
Network Partition A | B
Cluster State:
- data:0 primary on B
Leaf B:
Primary of data:0
Not a MemSQL approach, given the potential for data loss.
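To make the data-loss risk concrete, here is a toy model of Strategy 1. The class `UnblockedShard` and its fields are invented for this illustration: writes are acknowledged even when the replica cannot be reached, so promoting the replica after the primary crashes drops every write made during the partition.

```python
class UnblockedShard:
    """Toy model of Strategy 1: acknowledge writes even if replication fails."""

    def __init__(self):
        self.primary_log = []
        self.replica_log = []
        self.partitioned = False  # True while A cannot reach B

    def write(self, row: str) -> bool:
        self.primary_log.append(row)
        if not self.partitioned:
            self.replica_log.append(row)
        return True  # acknowledged either way: this is the risky part

    def lost_if_replica_promoted(self) -> list:
        """Acknowledged rows that exist only on the primary."""
        return [row for row in self.primary_log if row not in self.replica_log]


shard = UnblockedShard()
shard.write("row-1")       # replicated normally
shard.partitioned = True   # network partition A | B
shard.write("row-2")       # acknowledged, but never reaches Leaf B
print(shard.lost_if_replica_promoted())
```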
Strategy 2: Do nothing - block writes indefinitely
[Diagram: MA, Leaf B, Leaf A; links labeled Heartbeats and data:0]
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A:
Primary data:0
Leaf B:
Replica of data:0
Network Partition A | B
Write workload stalls
Strategy 3: Leaf Notify
[Diagram: MA, Leaf B, Leaf A; links labeled Heartbeats and data:0]
Cluster State:
- data:0 primary on A
- data:0 synchronous replica on B
Leaf A:
Primary data:0
Leaf B:
Replica of data:0
Network Partition A | B
Write workload is stalled
Strategy 3: Leaf Notify
[Diagram: MA, Leaf B, Leaf A; links labeled Heartbeats and data:0]
Cluster State:
- data:0 primary on A
- data:0 synchronous replica on B
Leaf A:
Primary data:0
Leaf Notifies MA - no replication
Leaf B:
Replica of data:0
Network Partition A | B
Strategy 3: Leaf Notify
[Diagram: MA, Leaf B, Leaf A; links labeled Heartbeats and data:0]
Cluster State:
- data:0 primary on A
- data:0 asynchronous replica on B
Leaf A:
Primary data:0
Leaf B:
Out of Date Replica of data:0
Network Partition A | B
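The three Leaf Notify slides can be read as the following sequence, sketched here in Python. The class and method names (`MasterAggregator`, `PrimaryLeaf`, `notify_cannot_replicate`) are hypothetical, and the sketch collapses the stall, the notification, and the resumed write into a single call for brevity.

```python
class MasterAggregator:
    """Holds the cluster state; here only the replication mode per shard."""

    def __init__(self):
        self.replica_mode = {"data:0": "synchronous"}

    def notify_cannot_replicate(self, shard: str) -> None:
        # The MA records that the replica may now fall behind.
        self.replica_mode[shard] = "asynchronous"


class PrimaryLeaf:
    """Leaf A: holds the primary of data:0 and replicates to Leaf B."""

    def __init__(self, ma: MasterAggregator):
        self.ma = ma
        self.log = []
        self.replica_reachable = True

    def write(self, shard: str, row: str) -> bool:
        if self.ma.replica_mode[shard] == "synchronous" and not self.replica_reachable:
            # The write would stall here. Instead of blocking forever, the
            # leaf asks the MA to change the cluster state.
            self.ma.notify_cannot_replicate(shard)
        self.log.append(row)
        return True


ma = MasterAggregator()
leaf_a = PrimaryLeaf(ma)
leaf_a.write("data:0", "row-1")
leaf_a.replica_reachable = False  # network partition A | B
leaf_a.write("data:0", "row-2")
print(ma.replica_mode)
```

Unlike Strategy 1, the cluster state itself now records that Leaf B is an asynchronous, possibly out-of-date replica.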
MemSQL Distributed Architecture
Master Aggregator:
- Maintains primary copy of cluster state;
- Monitors node state via heartbeats.
Leaves:
- Monitor for changes in local and cluster state;
- Reconcile local state with cluster state;
- Notify MA to change cluster state when reconciling is impossible.
Questions?
Thank you
memsql.com
