COMMON CLUSTER CONFIGURATION PITFALLS
Andrew Young, Technical Services Engineer
OVERVIEW
• Replication Review
• Replication Pitfalls
• Sharding Review
• Sharding Pitfalls
• Takeaways
• Questions
REPLICATION REVIEW
REPLICATION IMPROVES AVAILABILITY BY DUPLICATING DATA BETWEEN NODES
[Diagram: Primary, Secondary, and Secondary nodes, each holding a copy of {x:1, y:2}]
NODES VOTE TO SELECT A PRIMARY
VOTING REQUIRES A STRICT MAJORITY
[Diagram: Server 1 votes for Server 1, Server 2 votes for Server 1, Server 3 votes for Server 3. Winner: Server 1]
ARBITERS VOTE BUT DON’T STORE DATA
[Diagram: the Primary and Secondary each hold {x:1, y:2}; the Arbiter holds no data]
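For reference, an arbiter is added with the rs.addArb() shell helper; the hostname below is a placeholder:

// Run against the current primary. The arbiter gets a vote but holds no data.
rs.addArb("arbiter.example.net:27017")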
WRITE CONCERN
[Diagram: Primary, Secondary, Secondary; a write is acknowledged by one node at w: 1, two nodes at w: 2, all three at w: 3, or a majority of nodes at w: majority]
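As a minimal sketch (the collection name is hypothetical), a client can override the default write concern of w: 1 on a single operation:

// Wait until a majority of data-bearing members acknowledge the write;
// fail with an error if that takes longer than five seconds.
db.orders.insertOne(
  { x: 1, y: 2 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)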
REPLICATION PITFALLS
BUY N LARGE
• Online retailer, USA only
• Requires high availability
• Report generation should not impact production performance
REPLICA SET DEPLOYMENT
[Diagram: DC1 hosts the Primary and an Arbiter; DC2 hosts a Secondary]
BUY N LARGE’S EXPECTED BEHAVIOR
[Diagram: DC1 (Primary, Arbiter) is down; the node in DC2 thinks: “I’m the only one left, so I should be primary!”]
ACTUAL BEHAVIOR
[Diagram: DC1 (Primary, Arbiter) is down; the node in DC2 remains a secondary: “I only received one vote out of three, so I should be a secondary!”]
BUY N LARGE’S APPLICATION ARCHITECTURE
[Diagram: a Load Balancer sends traffic to App Server 1 in DC1 and App Server 2 in DC2; DC1 hosts the Primary and Arbiter, DC2 hosts the Secondary]
NETWORK SPLIT
[Diagram: the link between DC1 and DC2 is severed; each app server can reach only the MongoDB node in its own data center, while the load balancer still reaches both app servers]
EXPECTED BEHAVIOR LEADS TO POSSIBLE DATA CORRUPTION
[Diagram: if both data centers had a primary, App Server 1’s write X = 1 and App Server 2’s conflicting write X = 2 would both be acknowledged]
ACTUAL BEHAVIOR PREVENTS POSSIBLE DATA CORRUPTION
[Diagram: the DC2 node stays a secondary, so App Server 1’s write X = 1 succeeds while App Server 2’s write X = 2 is rejected]
REPLICA SET DEPLOYMENT V2
[Diagram: Primary in DC1, Secondary in DC2, and the Arbiter moved to a small cloud instance that can see both data centers]
REPLICA SET DEPLOYMENT V2
[Diagram: the DC1 node is down for maintenance; with the cloud Arbiter’s vote, the node in DC2 has been elected Primary]
REPLICA SET DEPLOYMENT V3
[Diagram: Primary in DC1, Secondary in DC2, Secondary in DC3; the arbiter has been replaced with a data-bearing node]
REPLICA SET DEPLOYMENT V3
[Diagram: Primary in DC1; the Secondaries in DC2 and DC3 are under-provisioned and lag the primary by one hour]
REPLICA SET DEPLOYMENT V3
[Diagram: the DC1 node is down for maintenance; the under-provisioned Secondary in DC3, still an hour behind, has become Primary]
REPLICA SET DEPLOYMENT V3
[Diagram: the DC1 node rejoins as a Secondary and rolls back one hour of writes; DC2 is a Secondary and DC3 remains Primary]
REPLICA SET DEPLOYMENT V4
[Diagram: identically provisioned nodes: Primary in DC1, Secondary in DC2, Secondary in DC3]
REPLICA SET DEPLOYMENT V5
[Diagram: Primary in DC1 (operational traffic), Secondary in DC2 (high availability), Secondary in DC3 (HA), plus a hidden Secondary for reporting]
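A minimal sketch of what the V5 configuration might look like, assuming hypothetical hostnames. The reporting member is hidden and has priority 0 so it can never be elected primary; votes: 0 keeps the number of voting members odd:

// Sketch only; run against a fresh deployment (or adapt for rs.reconfig()).
rs.initiate({
  _id: "bnl",
  members: [
    { _id: 0, host: "dc1.example.net:27017" },   // operational primary
    { _id: 1, host: "dc2.example.net:27017" },   // HA secondary
    { _id: 2, host: "dc3.example.net:27017" },   // HA secondary
    { _id: 3, host: "dc3-reporting.example.net:27017",
      hidden: true, priority: 0, votes: 0 }      // reporting only, never primary
  ]
})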
LESSONS LEARNED - REPLICATION
• Be sure that the application’s write concern setting can still be fulfilled if one of the nodes crashes.
• Use arbiters very sparingly, if at all.
• The standard “hot spare” disaster recovery model requires manual intervention. For true HA, a network split or the loss of a data center should not leave the replica set in read-only mode.
• Design replica sets for high availability first, then add specialized nodes as needed.
SHARDING REVIEW
BUY N LARGE V2
• BnL has gone global.
• Traffic has increased, and the current replica set is starting to experience performance issues from the increased load.
REPLICATION DUPLICATES DATA
[Diagram: Primary, Secondary, and Secondary each hold the full key range A-Z]
SHARDING DIVIDES UP THE DATA
[Diagram: shard S0 holds ranges A-E and N-Q; S1 holds F-J and R-T; S2 holds K-M and U-Z]
Collections are sharded, not databases.
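As a quick sketch (database, collection, and key are hypothetical), sharding is enabled per database and then applied per collection:

// Enable sharding for the database, then shard a single collection by a key.
sh.enableSharding("bnl")
sh.shardCollection("bnl.products", { name: 1 })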
HIGH AVAILABILITY VS SCALABILITY
[Diagram: a replica set (Primary, Secondary, Secondary, each holding A-Z) next to a sharded cluster (S0: A-E, N-Q; S1: F-J, R-T; S2: K-M, U-Z)]
BUT WHAT ABOUT SECONDARY READS?
• Can improve performance in certain cases.
• May return stale or duplicate documents.
• Use with caution!
https://xkcd.com/386/
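For illustration only (collection and field names are hypothetical), a read preference routes a query to a secondary; remember the result may be stale:

// "secondaryPreferred" reads from a secondary when one is available.
db.products.find({ category: "toys" }).readPref("secondaryPreferred")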
SHARDING PITFALLS
SHARDED CLUSTER CONFIGURATION V1
• Shard key: { _id: "hashed" }
[Diagram: a config server replica set (CFG: primary plus two secondaries) and three shards (S0, S1, S2), each a replica set with a primary, two secondaries, and a hidden member]
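A sketch of the V1 setup, assuming a hypothetical namespace; a hashed key spreads writes evenly, but range queries on _id can no longer be targeted at a single shard:

sh.shardCollection("bnl.products", { _id: "hashed" })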
SHARDED QUERIES
[Diagram: a mongos routes a range query containing the shard key to a single shard; a query without the shard key is scattered to all shards (A-F, G-M, N-Z) and the results are gathered]
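To make the difference concrete (names are hypothetical, assuming a { category: 1 } shard key), explain() on each query shows which shards the mongos consults:

// Targeted: the filter includes the shard key, so one shard answers.
db.products.find({ category: "toys" }).explain()

// Scatter/gather: no shard key in the filter, so every shard is queried
// and the mongos merges the results.
db.products.find({ price: { $lt: 10 } }).explain()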
SHARDED CLUSTER CONFIGURATION V2
• Shard key: { category: 1 }
[Diagram: config server replica set (CFG) and three shards (S0, S1, S2), each with a primary, two secondaries, and a hidden member]
HOW DOES CHUNKING ACTUALLY WORK?
• Chunks are metadata.
• Chunks represent a key range.
[Diagram: S0 holds chunks A-E and N-Q; S1 holds F-J and R-T; S2 holds K-M and U-Z]
HOW DOES CHUNKING ACTUALLY WORK?
• The mongos instance controls chunk creation and splitting.
• After writing 1/5th of the maximum chunk size to a chunk, the mongos requests a chunk split.
[Diagram: on S1, the R-T chunk has been split into R-S and T]
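Not shown in the talk, but related: the maximum chunk size (64 MB by default) is stored in the config database and can be changed through a mongos; a sketch:

// Lower the maximum chunk size to 32 MB cluster-wide.
db.getSiblingDB("config").settings.updateOne(
  { _id: "chunksize" },
  { $set: { value: 32 } },
  { upsert: true }
)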
SHARD KEY SELECTION AND CARDINALITY
• If a key range can’t be split, it lacks cardinality.
[Diagram: on S1, the chunk holding the single key value S can’t be split any further]
SHARDED CLUSTER CONFIGURATION V3
• Shard key: { category: 1, sku: 1, _id: 1 }
[Diagram: config server replica set (CFG) and three shards (S0, S1, S2), each with a primary, two secondaries, and a hidden member]
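A sketch of V3 (namespace hypothetical); appending sku and _id restores cardinality, so even a single busy category can still be split into smaller chunks:

sh.shardCollection("bnl.products", { category: 1, sku: 1, _id: 1 })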
HOW DOES BALANCING ACTUALLY WORK?
• When one shard has too many chunks, the balancer executes a chunk move operation.
• Balancing is based on the number of chunks, not the size of the data.
[Diagram: S1 holds F-J, R, S, and T, more chunks than its peers, so a chunk will be moved off of it]
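For reference, the shell exposes helpers to inspect and control the balancer:

sh.getBalancerState()        // true if the balancer is enabled
sh.isBalancerRunning()       // true if a balancing round is in progress
sh.setBalancerState(false)   // disable balancing, e.g. for a maintenance window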
CHUNK MOVE PROCESS
1. Documents in the shard key range of the chunk are copied to the destination shard.
2. The chunk metadata is updated.
3. A delete command is queued on the source shard.
[Diagram: one of S1’s chunks is being moved to another shard]
WHAT ARE ORPHANED DOCUMENTS?
• Sometimes the same document exists in two places.
• Only primaries and mongos instances know about chunks.
• Reading from secondary nodes in a sharded system will return both copies of the document.
[Diagram: after a chunk move, the source shard’s queued delete hasn’t run yet, so both shards hold copies of the moved documents]
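Not on the slide, but the standard cleanup in this era is the cleanupOrphaned command, run directly against each shard's primary (namespace hypothetical); each call cleans one contiguous orphaned range, so it is typically repeated until no range remains:

// Run on a shard's primary, not through a mongos.
db.adminCommand({ cleanupOrphaned: "bnl.products" })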
EMPTY CHUNKS
• If documents are deleted from a chunk and not replaced, chunks can become empty.
• Empty chunks can cause data size imbalances between shards.
[Diagram: year-based chunks spread across shards: S0 holds 2012 and 2015, S1 holds 2013 and 2016, S2 holds 2014 and 2017]
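The speaker notes point at the remedy: adjacent chunks on the same shard can be combined with mergeChunks. A sketch, assuming a hypothetical year-based shard key and bounds:

// bounds span the full range of the chunks being merged;
// the chunks must be contiguous and live on the same shard.
db.adminCommand({
  mergeChunks: "bnl.sales",
  bounds: [ { year: 2012 }, { year: 2014 } ]
})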
SHARDED CLUSTER CONFIGURATION V4
• Shard key: { category: 1, sku: 1, _id: 1 }
[Diagram: config server replica set (CFG) and three shards (S0, S1, S2), each with a primary and two secondaries; the hidden members have been removed]
TAKEAWAYS
LESSONS LEARNED - REPLICATION
• Be sure that the application’s write concern setting can still be fulfilled if one of the nodes crashes.
• Use arbiters very sparingly, if at all.
• The standard “hot spare” disaster recovery model requires manual intervention. For true HA, a network split or the loss of a data center should not leave the replica set in read-only mode.
• Design replica sets for high availability first, then add specialized nodes as needed.
LESSONS LEARNED - SHARDING
• Replication is for HA; sharding is for scaling.
• Shard key selection is extremely important.
‒ Affects query performance
‒ Affects chunk balancing
‒ Cannot easily be changed later on
• Secondary reads on sharded clusters are highly discouraged.
‒ Orphaned documents will cause multiple versions of the same document to be returned.
• MongoDB 3.4 greatly improved both replication and sharding.
YOUR MILEAGE MAY VARY
https://xkcd.com/722/
QUESTIONS
Editor's Notes
  • #5: If the primary becomes unavailable, whether that is due to an outage or just regular maintenance, one of the secondaries can take over as primary.
  • #6: Here we see an example election. Since two of the three servers have voted for server 1, server 1 will become primary.
  • #8: Default write concern is w:1. The client can override the write concern, either by decreasing it to w:0 or by increasing it to another value.
  • #11: Buy N Large is accustomed to standard enterprise software deployments. As such, they are using industry-standard disaster recovery methodologies. They have two data centers: their primary data center (DC1) and a “hot spare” disaster recovery data center (DC2). They also set their write concern to “majority” based on our suggested practices.
  • #12: Their primary data center has now failed, leaving only the “hot spare” disaster recovery data center operational.
  • #13: Buy N Large expects that because there is only one node left, that one node will constitute a majority and will vote to make itself primary.
  • #14: However, the actual behavior is that the secondary remains a secondary, because it knows that there should be three nodes and thus one vote is not enough to become primary.
  • #15: To better understand why this is, let’s look at Buy N Large’s complete architecture.
  • #16: In this case, both application servers talk to the primary in DC1.
  • #17: Now let’s imagine that a network error prevents DC1 from seeing DC2. Each data center thinks the other is down, but in reality all of the servers are still operating normally.
  • #18: The application server in each data center can only see the MongoDB instance in its own data center. However, the load balancer still sees both data centers and still sends traffic to both application servers.
  • #19: If the secondary in DC2 were allowed to become primary because it can only see itself, then there would be an opportunity for the same data to be changed differently in each data center. When the network issue was resolved, the system would need to determine how to reconcile the conflicting updates.
  • #20: Because the voting algorithm that MongoDB uses does not allow the secondary in DC2 to become primary in this situation, this possible data corruption is prevented.
  • #21: To alleviate this problem, Buy N Large moved the arbiter to a small cloud instance where it could see both data centers.
  • #22: Later on, Buy N Large decided to perform routine maintenance on the primary node and shut it down.
  • #23: This time, the node in the second data center was successfully elected as primary thanks to the arbiter in the cloud. However, because write concern is set to “majority”, writes still fail to complete when one of the data bearing nodes is down.
  • #24: To prevent this from happening again in the future, Buy N Large switched out the arbiter for a data bearing node.
  • #25: However, after the system ran for a while under regular load, the secondaries – which were under-provisioned, as they were not considered to be as important as the primary – began to lag behind the primary.
  • #26: The next time the primary was rotated out for maintenance the secondary in DC3 became primary. Because it is an hour behind, that hour of data is no longer available to the application.
  • #27: When the node in DC1 comes back online as a secondary, it sees that it is an hour ahead of the primary and initiates a rollback. That hour of data is written out to disk so that it is not lost, but the documents in the database are replaced with those that are on the primary. Recovering the data requires a manual process.
  • #28: Buy N Large has re-provisioned their hardware to be the same in all data centers, but because any secondary might become primary, they don’t want to use one of the secondary nodes for reporting.
  • #29: The final replica set deployment includes a hidden secondary node for reporting purposes.
  • #36: Secondary reads also affect the ability of a replica set to continue functioning properly when the primary becomes unavailable. If each of the secondaries is already handling some percentage of the reads for the replica set, then the loss of the primary means that the reads it was serving will be spread across the remaining secondaries.
  • #38: Then they noticed that queries were taking a long time.
  • #40: Then they started seeing jumbo chunks that couldn't be migrated.
  • #43: Because the mongos only knows about the data it is writing, the 1/5th number is used as a heuristic to make an educated guess about when the chunk should be split. The heuristic assumes five mongos instances that are receiving equal traffic. If your system has a significantly larger number of mongos instances and they are all being used equally often, it is possible for this heuristic to break down and cause problems.
  • #44: BI reporting tools begin to see duplicate documents.
  • #46: The chunk move process has been improved in 3.4 with the addition of parallel chunk migrations, which make it more efficient to add more than one shard at a time when increasing capacity. 3.4 also adds support for intra-cluster compression, although it is turned off by default.
  • #47: There is currently a plan to make secondaries aware of chunk metadata. The difficulty with this is that the view that secondaries have of the chunk metadata must be accurate as of their most recent oplog entry. If a secondary is lagging behind a primary, it is important that the secondary’s chunk metadata matches this lag.
  • #48: In a system where documents are deleted in such a way that chunks become empty, a cluster may become unbalanced from a data size perspective even though it appears balanced from a chunk count perspective. This can happen, for instance, when the shard key contains a timestamp element and old documents are deleted after a certain number of days. If a shard contains empty chunks, the mergeChunks command can be used to merge those empty chunks with chunks that contain data.
  • #49: Here is the final configuration for Buy N Large’s sharded cluster. In this configuration, their BI tools must be taken into account when sizing the cluster because they use the same servers as their production systems. Another option might be to create a second BI cluster and copy data from production to that cluster.