Building a Fault Tolerant Distributed Architecture

The document outlines a fault-tolerant distributed architecture for a MemSQL cluster managed by a master aggregator, which maintains the state of data shards and their replicas across nodes. It discusses how the system handles node failures, replication states, and strategies for keeping data consistent when network partitions occur. It also describes the roles of the master aggregator and the leaf nodes in maintaining cluster state and monitoring changes.

Building a Fault Tolerant Distributed Architecture
Rodrigo Toste Gomes
Software Engineer

A MemSQL Cluster
2
- Master Aggregator
- Leaves
Leaves store shards of data.
Master Aggregator is the source of truth for cluster state:
- Location of data shards;

Fault Tolerance and High Availability
3
Maintain replicas of shards in different nodes.
Because every shard has a replica on a different node, the system can tolerate the failure of one node.
Master Aggregator is also responsible for:
- Location of data shard replicas;
- Replication states;
- Moving shards around.

Example Cluster
4
Nodes:
- Master Aggregator
- Leaf A
- Leaf B
Cluster State:
Node State:
- Leaf A is online
- Leaf B is online
Databases:
- data
Shards:
- data:0 - primary on A
- data:0 - replica on B
- data:1 - primary on B
- data:1 - replica on A
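The cluster state on this slide is essentially two maps: node status, and, for each shard, which node holds which copy. A minimal Python sketch, assuming that model; `ClusterState`, `Role`, `primary_of`, and `copies_on` are hypothetical names, not MemSQL's internal representation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Role(Enum):
    PRIMARY = "primary"
    REPLICA = "replica"


@dataclass
class ClusterState:
    # Node name -> True while the node is online.
    nodes: Dict[str, bool] = field(default_factory=dict)
    # Shard name ("database:index") -> {node name: role of the copy on that node}.
    shards: Dict[str, Dict[str, Role]] = field(default_factory=dict)

    def primary_of(self, shard: str) -> Optional[str]:
        """Return the node that holds the primary copy of `shard`, if any."""
        for node, role in self.shards.get(shard, {}).items():
            if role is Role.PRIMARY:
                return node
        return None

    def copies_on(self, node: str) -> Dict[str, Role]:
        """Return every shard copy placed on `node`, with its role."""
        return {s: p[node] for s, p in self.shards.items() if node in p}


# The example cluster: database "data" split into two shards, each with a
# primary on one leaf and a replica on the other.
example = ClusterState(
    nodes={"A": True, "B": True},
    shards={
        "data:0": {"A": Role.PRIMARY, "B": Role.REPLICA},
        "data:1": {"B": Role.PRIMARY, "A": Role.REPLICA},
    },
)

print(example.primary_of("data:0"))  # A
print(example.copies_on("B"))
```

Each leaf holds the primary of one shard and the replica of the other, so losing either leaf still leaves a full copy of every shard on the survivor.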

Maintaining the Cluster State
5
(Diagram: the MA exchanges heartbeats with Leaf A and Leaf B.)
Leaf A goes offline.
Cluster State before:
- Leaf A is online
Cluster State after the MA detects the missed heartbeats:
- Leaf A is offline
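The slides say only that the MA monitors node state via heartbeats. The sketch below assumes a simple timeout rule: a leaf is marked offline once no heartbeat has arrived from it for more than `timeout` seconds. `HeartbeatMonitor` and its methods are hypothetical, and the injectable clock exists only to make the example deterministic.

```python
import time
from typing import Callable, Iterable, List


class HeartbeatMonitor:
    """Hypothetical MA-side failure detector (assumed policy): a leaf is marked
    offline once no heartbeat has arrived from it for more than `timeout` seconds."""

    def __init__(self, leaves: Iterable[str], timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        now = clock()
        self.last_seen = {leaf: now for leaf in leaves}
        self.online = {leaf: True for leaf in self.last_seen}

    def record_heartbeat(self, leaf: str) -> None:
        """Called whenever a heartbeat from `leaf` reaches the MA."""
        self.last_seen[leaf] = self.clock()

    def check(self) -> List[str]:
        """Mark leaves with stale heartbeats offline; return the newly offline ones."""
        now = self.clock()
        newly_offline = []
        for leaf, seen in self.last_seen.items():
            if self.online[leaf] and now - seen > self.timeout:
                self.online[leaf] = False
                newly_offline.append(leaf)
        return newly_offline


# A fake clock keeps the example deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(["A", "B"], timeout=3.0, clock=lambda: fake_now[0])

fake_now[0] = 2.0
monitor.record_heartbeat("B")  # Leaf B answers; Leaf A has gone silent
fake_now[0] = 4.0
print(monitor.check())   # ['A']
print(monitor.online)    # {'A': False, 'B': True}
```

Bringing a leaf back online when its heartbeats resume is left out of this sketch.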

Maintaining the Cluster State
6
Leaf A goes offline; Leaf B is online and holds the replica of data:0.
Cluster State before:
- Leaf A is online
- data:0 - primary on A
- data:0 - replica on B
Cluster State after the MA handles the failure:
- Leaf A is offline
- data:0 - primary on B
Leaf B (not yet reconciled): Online; replica of data:0
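This slide changes only the cluster state: the MA marks Leaf A offline and reassigns the primary of data:0 to B. A minimal sketch of that MA-side step, assuming plain dictionaries for the state and assuming the replica being promoted is up to date (the sketch does not check this). `fail_over` is a hypothetical name.

```python
from typing import Dict

Placement = Dict[str, str]  # node name -> "primary" or "replica"


def fail_over(nodes: Dict[str, bool], shards: Dict[str, Placement], failed: str) -> None:
    """Hypothetical MA reaction to a leaf going offline.

    Marks the leaf offline and, for each shard whose primary lived on it,
    removes that copy and promotes a replica on an online node. Only the
    cluster state changes here; the leaves act on it later.
    """
    nodes[failed] = False
    for placement in shards.values():
        if placement.get(failed) != "primary":
            continue
        del placement[failed]
        for node, role in placement.items():
            if role == "replica" and nodes.get(node, False):
                placement[node] = "primary"
                break


nodes = {"A": True, "B": True}
shards = {
    "data:0": {"A": "primary", "B": "replica"},
    "data:1": {"B": "primary", "A": "replica"},
}
fail_over(nodes, shards, "A")
print(nodes)             # {'A': False, 'B': True}
print(shards["data:0"])  # {'B': 'primary'}
```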

Maintaining the Cluster State
7
Leaf A goes offline; Leaf B is online and holds the replica of data:0.
Cluster State before:
- Leaf A is online
- data:0 - primary on A
- data:0 - replica on B
Cluster State after the MA handles the failure:
- Leaf A is offline
- data:0 - primary on B
Leaf B before reconciling: Online; replica of data:0
Leaf B after reconciling: Online; primary of data:0
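Leaf B then notices that the cluster state disagrees with what it serves locally and brings itself in line. A minimal sketch of one reconciliation pass, assuming each leaf keeps a shard-to-role map and the cluster state is a shard-to-placement map; `reconcile` is a hypothetical name, and a real promotion involves more than updating a role.

```python
from typing import Dict, List


def reconcile(leaf: str, local: Dict[str, str], cluster: Dict[str, Dict[str, str]]) -> List[str]:
    """Hypothetical leaf-side reconciliation pass.

    `local` maps shard -> role this leaf currently serves; `cluster` maps
    shard -> {node: role} as published by the MA. Any difference is resolved
    in favor of the cluster state, and each change is logged.
    """
    actions = []
    for shard in sorted(set(local) | set(cluster)):
        have = local.get(shard)
        wanted = cluster.get(shard, {}).get(leaf)
        if have == wanted:
            continue
        actions.append(f"{shard}: {have} -> {wanted}")
        if wanted is None:
            del local[shard]       # the cluster no longer places a copy here
        else:
            local[shard] = wanted  # e.g. promote a replica to primary
    return actions


leaf_b = {"data:0": "replica", "data:1": "primary"}
cluster = {"data:0": {"B": "primary"}, "data:1": {"B": "primary", "A": "replica"}}
print(reconcile("B", leaf_b, cluster))  # ['data:0: replica -> primary']
print(reconcile("B", leaf_b, cluster))  # []
```

The second call returns nothing: once local state matches the cluster state, a reconciliation pass is a no-op, so leaves can run it repeatedly.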

MemSQL Distributed Architecture
8
Master Aggregator:
- Maintains primary copy of cluster state;
- Monitors node state via heartbeats.
Leaves:
- Monitor for changes in local and cluster state;
- Reconcile local state with cluster state.
What happens when a node is unable to reconcile its local state with the cluster state?

Shard Can’t Replicate
9
(Diagram: the MA exchanges heartbeats with Leaf A and Leaf B; Leaf A replicates data:0 to Leaf B.)
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A: Primary of data:0; replicating data:0 to B
Leaf B: Replica of data:0
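The slides do not describe the replication protocol itself. To make "replicating data:0 to B" and the later "out of date" replicas concrete, here is a sketch assuming each copy is an append-only log and the replica catches up by copying whatever lies past its own log position. `ShardCopy`, `catch_up_from`, and `lag_behind` are hypothetical names.

```python
from typing import List


class ShardCopy:
    """One copy of a shard, modeled (by assumption) as an append-only log."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def catch_up_from(self, primary: "ShardCopy") -> int:
        """Append whatever the primary has beyond this copy's log position.

        Returns the number of entries copied."""
        missing = primary.log[len(self.log):]
        self.log.extend(missing)
        return len(missing)

    def lag_behind(self, primary: "ShardCopy") -> int:
        """How many entries this copy is missing; 0 means up to date."""
        return len(primary.log) - len(self.log)


leaf_a, leaf_b = ShardCopy(), ShardCopy()  # data:0 on A (primary) and B (replica)
leaf_a.log += ["w1", "w2"]
print(leaf_b.catch_up_from(leaf_a))  # 2
leaf_a.log.append("w3")
print(leaf_b.lag_behind(leaf_a))     # 1
print(leaf_b.catch_up_from(leaf_a))  # 1
print(leaf_b.log == leaf_a.log)      # True
```

In this model a network partition simply means `catch_up_from` cannot run, so the replica's lag grows with every write the primary accepts.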

Shard Can’t Replicate
10
(Diagram: a network partition A | B cuts the replication link between Leaf A and Leaf B, while the MA still exchanges heartbeats with both.)
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A: Primary of data:0
Leaf B: Replica of data:0

Strategy 1: Don’t block writes
11
(Diagram: the MA exchanges heartbeats with both leaves; network partition A | B.)
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A: Primary of data:0; keeps accepting writes
Leaf B: Out-of-date replica of data:0

Strategy 1: Don’t block writes (Leaf A crashes)
12
(Diagram: network partition A | B; Leaf A then crashes.)
Cluster State before:
- data:0 primary on A
- data:0 replica on B
Leaf A: Offline
Leaf B: Out-of-date replica of data:0
Cluster State after failover:
- data:0 primary on B
Leaf B: Primary of data:0
Not the MemSQL approach, given the potential for data loss: writes that Leaf A accepted during the partition never reached Leaf B, so they are lost when B becomes primary.
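The data loss in Strategy 1 can be replayed in a few lines. This is a toy model, assuming each copy of data:0 is a list of writes; `strategy1_lost_writes` is a hypothetical name, and the point is only which acknowledged writes survive the failover.

```python
from typing import List


def strategy1_lost_writes() -> List[str]:
    """Replay the Strategy 1 scenario with data:0 modeled as lists of writes."""
    primary_a = ["w1"]
    replica_b = ["w1"]        # in sync before the partition
    acknowledged = ["w1"]     # writes the client was told succeeded

    # Network partition A | B: under Strategy 1, A keeps accepting and
    # acknowledging writes even though it cannot replicate them to B.
    for write in ["w2", "w3"]:
        primary_a.append(write)
        acknowledged.append(write)

    # Leaf A crashes; the MA promotes B's out-of-date replica to primary.
    new_primary = replica_b
    return [w for w in acknowledged if w not in new_primary]


print(strategy1_lost_writes())  # ['w2', 'w3']
```

Every write acknowledged during the partition is gone, even though the client was told it succeeded.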

Strategy 2: Do nothing - block writes indefinitely
13
(Diagram: the MA exchanges heartbeats with both leaves; network partition A | B.)
Cluster State:
- data:0 primary on A
- data:0 replica on B
Leaf A: Primary of data:0
Leaf B: Replica of data:0
Writes to data:0 stall for as long as the partition lasts.
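Strategy 2 amounts to synchronous replication with no escape hatch: the primary acknowledges a write only after the replica confirms it. A sketch using Python's `queue` and `threading` modules, assuming a message sent across the partition is silently dropped. `start_replica` and `sync_write` are hypothetical names, and the `wait` timeout exists only so the demo terminates; under Strategy 2 the primary would keep waiting.

```python
import queue
import threading


def start_replica(inbox: queue.Queue, acks: queue.Queue) -> threading.Thread:
    """Replica B: apply each entry it receives and acknowledge it."""
    def run() -> None:
        while (entry := inbox.get()) is not None:
            acks.put(entry)
    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def sync_write(entry: str, partitioned: bool, inbox: queue.Queue,
               acks: queue.Queue, wait: float) -> bool:
    """Primary A: ship the entry to B, then wait for B's acknowledgement
    before acknowledging the client. Returns False if no ack arrived within
    `wait` seconds (a bound Strategy 2 itself does not have)."""
    if not partitioned:
        inbox.put(entry)  # a partition silently drops the message
    try:
        return acks.get(timeout=wait) == entry
    except queue.Empty:
        return False


inbox, acks = queue.Queue(), queue.Queue()
replica = start_replica(inbox, acks)
print(sync_write("w1", partitioned=False, inbox=inbox, acks=acks, wait=1.0))  # True
print(sync_write("w2", partitioned=True, inbox=inbox, acks=acks, wait=0.2))   # False
inbox.put(None)  # stop the replica thread
replica.join()
```

No data is lost, because nothing is acknowledged without a second copy, but availability is gone for the duration of the partition.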

Strategy 3: Leaf Notify
14
(Diagram: the MA exchanges heartbeats with both leaves; network partition A | B.)
Cluster State:
- data:0 primary on A
- data:0 synchronous replica on B
Leaf A: Primary of data:0
Leaf B: Replica of data:0
Writes stall: Leaf A cannot replicate to its synchronous replica on B.

Strategy 3: Leaf Notify
15
(Diagram: the MA exchanges heartbeats with both leaves; network partition A | B.)
Cluster State:
- data:0 primary on A
- data:0 synchronous replica on B
Leaf A: Primary of data:0; notifies the MA that it cannot replicate data:0 to B
Leaf B: Replica of data:0

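Strategy 3: Leaf Notify
16
(Diagram: the MA exchanges heartbeats with both leaves; network partition A | B.)
Cluster State (updated by the MA):
- data:0 primary on A
- data:0 asynchronous replica on B
Leaf A: Primary of data:0
Leaf B: Out-of-date replica of data:0
Leaf A no longer waits for B, so writes resume, and the cluster state now records that B's copy may be behind.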
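The slides show the Leaf Notify sequence but not its mechanics. A minimal sketch under stated assumptions: the MA keeps a per-shard replication mode ("sync" or "async") in the cluster state, and a primary that cannot reach its synchronous replica asks the MA to downgrade it instead of blocking. `MasterAggregator`, `PrimaryLeaf`, `notify_cannot_replicate`, and `can_promote_replica` are hypothetical names. `can_promote_replica` encodes a further assumption not stated in the slides: that the MA uses the recorded mode to avoid promoting an out-of-date replica, the outcome Strategy 1 warned against.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MasterAggregator:
    # Shard -> replication mode of its replica: "sync" or "async".
    replica_mode: Dict[str, str] = field(default_factory=dict)

    def notify_cannot_replicate(self, shard: str) -> None:
        """Called by a primary leaf that cannot reach its synchronous replica."""
        self.replica_mode[shard] = "async"

    def can_promote_replica(self, shard: str) -> bool:
        # Assumption: only a synchronous (known up-to-date) replica is a safe
        # failover target; promoting an async one risks losing writes.
        return self.replica_mode.get(shard) == "sync"


@dataclass
class PrimaryLeaf:
    ma: MasterAggregator
    shard: str
    log: List[str] = field(default_factory=list)

    def write(self, entry: str, replica_reachable: bool) -> str:
        # Shipping the entry to the replica is omitted from this sketch.
        mode = self.ma.replica_mode[self.shard]
        if mode == "sync" and not replica_reachable:
            # Instead of blocking indefinitely, ask the MA to change the
            # cluster state, then proceed under the new state.
            self.ma.notify_cannot_replicate(self.shard)
            mode = self.ma.replica_mode[self.shard]
        self.log.append(entry)
        return f"committed {entry} ({mode} replication)"


ma = MasterAggregator(replica_mode={"data:0": "sync"})
leaf_a = PrimaryLeaf(ma, "data:0")
print(leaf_a.write("w1", replica_reachable=True))   # committed w1 (sync replication)
print(leaf_a.write("w2", replica_reachable=False))  # committed w2 (async replication)
print(ma.can_promote_replica("data:0"))             # False
```

The design choice is that the leaf never changes the cluster state on its own: it reports the problem, and the MA, as the source of truth, decides the new state that every node then reconciles against.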

MemSQL Distributed Architecture
17
Master Aggregator:
- Maintains primary copy of cluster state;
- Monitors node state via heartbeats.
Leaves:
- Monitor for changes in local and cluster state;
- Reconcile local state with cluster state;
- Notify MA to change cluster state when reconciling is impossible.

Questions?
18

Thank you
19
memsql.com