MongoDB at Scale
Achieving Scale with MongoDB
Thomas Boyd
Manager, Solutions Architecture, MongoDB
MongoDB Doesn’t Scale!!
[Chart: MongoDB Cluster Throughput. Y-axis: Operations/Second, 0 to 30,000. X-axis: Number of Nodes, 1 to 8.]
Agenda
• Optimization Tips
– Schema Design
– Indexes
– Monitoring
• Vertical Scaling
• Horizontal Scaling
• Scaling your Operations Team
• Customers @ Scale on MongoDB
Optimization Tips: Schema Design
Document Model
• Matches Application Objects
• Flexible
• High performance
{ "customer_id" : 123,
  "first_name" : "John",
  "last_name" : "Smith",
  "address" : {
    "street": "123 Main Street",
    "city": "Houston",
    "state": "TX",
    "zip_code": "77027"
  },
  policies: [ {
      policy_number : 13,
      description: "short term",
      deductible: 500
    },
    { policy_number : 14,
      description: "dental",
      visits: […]
    } ]
}
The Importance of Schema Design
• Very different from RDBMS schema design
• MongoDB Schema:
– denormalize the data
– create a (potentially complex) schema with
prior knowledge of your actual (not just
predicted) query patterns
– write simple queries
Real World Example
Product catalog for retailer selling in 20 countries
{
_id: 375,
en_US: { name: …, description: …, <etc…> },
en_GB: { name: …, description: …, <etc…> },
fr_FR: { name: …, description: …, <etc…> },
fr_CA: { name: …, description: …, <etc…> },
de_DE: …,
<… and so on for other locales …>
}
Not a Good Match for Access Pattern
Actual application queries:
db.catalog.find( { _id: 375 }, { en_US: true } );
db.catalog.find( { _id: 375 }, { fr_FR: true } );
db.catalog.find( { _id: 375 }, { de_DE: true } );
… and so forth for other locales
Inefficient use of resources
On the original slide, data in red are being used; data in blue take up memory but are not in demand.
{
_id: 375,
en_US: { name: …, description: …, <etc…> },
en_GB: { name: …, description: …, <etc…> },
fr_FR: { name: …, description: …, <etc…> },
fr_CA: { name: …, description: …, <etc…> },
de_DE: …,
de_CH: …,
<… and so on for other locales …>
}
{
_id: 42,
en_US: { name: …, description: …, <etc…> },
en_GB: { name: …, description: …, <etc…> },
fr_FR: { name: …, description: …, <etc…> },
fr_CA: { name: …, description: …, <etc…> },
de_DE: …,
de_CH: …,
<… and so on for other locales …>
}
Consequences of Schema Redesign
• Queries induced minimal memory overhead
• 20x as many products fit in RAM at once
• Disk IO utilization reduced
• Application latency reduced
{
_id: "375-en_GB",
name: …,
description: …,
<… the rest of the document …>
}
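A minimal sketch of the redesign, assuming the original documents have the shape shown on the earlier catalog slide. The helper name splitByLocale and the sample names and descriptions are invented for illustration; only the "375-en_GB" style of _id comes from the slide.

```javascript
// Hypothetical helper: split one product document that embeds every locale
// into one small document per (product, locale) pair.
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`, // e.g. "375-en_GB", as on the slide
    ...fields,
  }));
}

// Invented sample data in the original, locale-embedding shape.
const original = {
  _id: 375,
  en_US: { name: "Widget", description: "A widget" },
  en_GB: { name: "Widget", description: "A widget, British spelling" },
  fr_FR: { name: "Bidule", description: "Un bidule" },
};

const perLocale = splitByLocale(original);
console.log(perLocale.map((d) => d._id));
```

With one document per product and locale, the application would look up a single small document by _id (for example { _id: "375-en_GB" }) rather than projecting one field out of a large one, so only the locales actually in demand need to stay in memory.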
Schema Design Patterns
• Pattern: pre-computing interesting
quantities, ideally with each write operation
• Pattern: putting unrelated items in different
collections to take advantage of indexing
• Anti-pattern: appending to arrays ad
infinitum
• Anti-pattern: importing relational schemas
directly into MongoDB
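The pre-computing pattern can be sketched in plain JavaScript as follows. This is an illustration only: the document shape, the field names review_count and rating_sum, and the helper names are assumptions, not part of the talk.

```javascript
// Hypothetical product document that carries pre-computed review totals.
const product = { _id: 375, review_count: 0, rating_sum: 0 };

// Update the totals as part of each write. In MongoDB this would typically be
// a single update using the $inc operator, for example:
//   db.products.updateOne({ _id: 375 }, { $inc: { review_count: 1, rating_sum: 4 } })
// (the collection and field names are invented for this sketch).
function addReview(doc, rating) {
  doc.review_count += 1;
  doc.rating_sum += rating;
  return doc;
}

// The read path needs no aggregation over individual reviews.
function averageRating(doc) {
  return doc.review_count === 0 ? null : doc.rating_sum / doc.review_count;
}

[4, 5, 3].forEach((rating) => addReview(product, rating));
console.log(averageRating(product));
```

The trade-off is a slightly heavier write in exchange for a read that touches one document and does constant work.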
TAKE THE TIME TO
UNDERSTAND YOUR
APPLICATION
Schema Design Resources
• Blog series, "6 rules of thumb"
– Part 1: http://goo.gl/TFJ3dr
– Part 2: http://goo.gl/qTdGhP
– Part 3: http://goo.gl/JFO1pI
• Webinars, training, consulting,
etc…
Optimization Tips: Indexing
B-Tree Indexes
• Tree-structured references to your documents
• Single biggest tunable performance factor
• Indexing and schema design go hand in hand
Indexing Mistakes and Their Fixes
• Failing to build necessary indexes
– Run .explain(), examine slow query log, mtools,
system.profile collection
• Building unnecessary indexes
– Talk to your application developers about usage
• Running ad-hoc queries in production
– Use a staging environment, use secondaries
mongod log files
Sun Jun 29 06:35:37.646 [conn2]
query test.docs query: {
parent.company: "22794",
parent.employeeId: "83881" }
ntoreturn:1 ntoskip:0
nscanned:806381 keyUpdates:0
numYields: 5 locks(micros)
r:2145254 nreturned:0 reslen:20
1156ms
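As a rough illustration of how to read an entry like this one, the sketch below extracts the counters that matter for index triage. The helper name summarizeSlowQuery and the 1,000-to-1 threshold are assumptions made for this example; the log text is the slide's entry joined onto one line.

```javascript
// The slide's log entry, joined onto a single line.
const logLine =
  'Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { ' +
  'parent.company: "22794", parent.employeeId: "83881" } ' +
  'ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 ' +
  'locks(micros) r:2145254 nreturned:0 reslen:20 1156ms';

// Hypothetical helper: pull out documents scanned, documents returned,
// and elapsed time, then flag entries that scan far more than they return.
function summarizeSlowQuery(line) {
  const num = (re) => {
    const m = line.match(re);
    return m ? Number(m[1]) : null;
  };
  const nscanned = num(/nscanned:\s*(\d+)/);
  const nreturned = num(/nreturned:\s*(\d+)/);
  const millis = num(/(\d+)ms\s*$/);
  // Assumed rule of thumb for this sketch, not an official threshold.
  const likelyMissingIndex =
    nscanned !== null && nreturned !== null &&
    nscanned > 1000 * Math.max(nreturned, 1);
  return { nscanned, nreturned, millis, likelyMissingIndex };
}

const summary = summarizeSlowQuery(logLine);
console.log(summary);
```

Here the query scanned 806,381 entries to return nothing and took 1,156 ms, which is the usual signature of a query with no supporting index on the fields it filters by.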
mtools
• http://github.com/rueckstiess/mtools
• log file analysis for poorly performing queries
– Show me queries that took more than 1000 ms from 6 am to 6 pm:
– mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log
Indexing Strategies
• Create indexes that support your queries!
• Create highly selective indexes
• Eliminate duplicate indexes with compound indexes
– db.collection.createIndex({A:1, B:1, C:1}) (named ensureIndex in shells before MongoDB 3.0)
– allows queries using leftmost prefix
• Order index fields to support scans & sorts
• Create indexes that support covered queries
• Prevent collection scans in pre-production
environments
db.getSiblingDB("admin").runCommand( {
setParameter: 1, notablescan: 1 } )
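The leftmost-prefix rule for the compound index {A:1, B:1, C:1} can be sketched as below. Both helper names are hypothetical, and the model is deliberately simplified: it looks only at which fields a query constrains and ignores sort order, key direction, and range predicates.

```javascript
// How many leading keys of the index does the query constrain?
// 0 means the query does not start from the leftmost key.
function prefixLength(indexKeys, queryFields) {
  const queried = new Set(queryFields);
  let n = 0;
  while (n < indexKeys.length && queried.has(indexKeys[n])) n += 1;
  return n;
}

// An index whose keys are a leading prefix of another index's keys
// (same order) is a candidate for removal as a duplicate.
function isRedundantIndex(candidate, existing) {
  return (
    candidate.length <= existing.length &&
    candidate.every((key, i) => key === existing[i])
  );
}

const compound = ["A", "B", "C"];
console.log(prefixLength(compound, ["A"]));      // query on A
console.log(prefixLength(compound, ["B", "A"])); // query on A and B
console.log(prefixLength(compound, ["B", "C"])); // no leftmost key
console.log(isRedundantIndex(["A", "B"], compound));
```

Under these assumptions, separate indexes on {A:1} and {A:1, B:1} add write cost without helping any query that {A:1, B:1, C:1} does not already serve.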
Optimization Tips: Monitoring
JUST DO IT
NOW
IN PRE-PROD
MongoDB Management Service (MMS)
Backup
Monitoring
Automation
MMS: Database and Hardware Metrics
MMS Monitoring Setup
Cloud Version of MMS
1. Go to http://mms.mongodb.com
2. Create an account
3. Install one agent in your datacenter
4. Add hosts from the web interface
5. Enjoy!
Vertical Scaling
Factors:
– RAM
– Disk
– CPU
– Network
We are Here to Pump you Up
[Diagram: two replica sets, each with one primary and two secondaries]
Working Set Exceeds Physical Memory
Real World Example
• Status changes for entities in the business
• State changes happen in batches and are
fully random
– sometimes 10% of entities get updated
– sometimes 100% get updated
Initial Architecture
Sharded Cluster, 4 shards backed by spinning disk
Application / mongos
mongod
Horizontal Scaling
Rapidly growing business means more
shards
Application / mongos
…16 more shards…
mongod
Vertical Scaling
Scaling random IOPS with SSDs
Application / mongos
mongod SSD
Before you add hardware...
• Make sure you are solving the right scaling problem
• Remedy schema and index problems first
– schema and index problems can look like hardware
problems
• Tune the Operating System
– ulimits, swap, NUMA, NOOP scheduler with hypervisors
• Tune the IO subsystem
– ext4 or XFS vs SAN, RAID10, readahead, noatime
• See MongoDB "production notes" page
• Heed logfile startup warnings
Horizontal Scaling
Why Shard?
• Space
• Throughput
• Latency
• (Manageability of individual nodes)
– restore, compaction, indexing, etc…
Sharding Overview
[Diagram: Application and Driver connect to several Query Routers, which route to Shard 1, Shard 2, Shard 3, … Shard N; each shard is a replica set with one primary and two secondaries]
Range Sharding: Read/Write Scalability
• 1 mongod: key range 0..100
• 2 mongods: key ranges 0..50, 51..100
• 4 mongods: key ranges 0..25, 26..50, 51..75, 76..100
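Routing by key range, as in the four-mongod layout, can be sketched like this. The shard names and helper functions are hypothetical, and the bounds are treated as inclusive only to match the slide's labels.

```javascript
// Key ranges from the four-shard slide; shard names are invented.
const chunks = [
  { shard: "shard1", min: 0, max: 25 },
  { shard: "shard2", min: 26, max: 50 },
  { shard: "shard3", min: 51, max: 75 },
  { shard: "shard4", min: 76, max: 100 },
];

// Which shard owns a single key?
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) throw new RangeError(`no chunk owns key ${key}`);
  return chunk.shard;
}

// Which shards does a range query over [lo, hi] have to visit?
function shardsForRange(lo, hi) {
  return chunks.filter((c) => c.max >= lo && c.min <= hi).map((c) => c.shard);
}

console.log(shardFor(26));
console.log(shardsForRange(20, 30));
```

A point lookup touches one shard and a narrow range query touches only the shards whose ranges overlap it, which is why splitting the key space spreads both reads and writes.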
Shard Key Characteristics
• A good shard key has:
– sufficient cardinality
– distributed writes
– targeted reads ("query isolation")
• Shard key should be in every query if possible
– scatter gather otherwise
• Choosing a good shard key is important!
– affects performance and scalability
– changing it later is expensive
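The difference between a targeted read and a scatter-gather read can be sketched as follows. Everything here is an assumption made for illustration: the shard key field user_id, the shard names, and the modulo placement rule, which stands in for the chunk metadata a real cluster would consult.

```javascript
const SHARD_KEY = "user_id"; // hypothetical shard key field
const ALL_SHARDS = ["shard1", "shard2", "shard3", "shard4"]; // hypothetical names

// Stand-in placement rule for this sketch only.
function shardForKey(value) {
  return ALL_SHARDS[value % ALL_SHARDS.length];
}

// A query that pins the shard key can be sent to one shard;
// any other query has to be sent to every shard.
function planQuery(query) {
  const value = query[SHARD_KEY];
  if (Number.isInteger(value) && value >= 0) {
    return { kind: "targeted", shards: [shardForKey(value)] };
  }
  return { kind: "scatter-gather", shards: ALL_SHARDS.slice() };
}

console.log(planQuery({ user_id: 42, venue: "cafe" }));
console.log(planQuery({ venue: "cafe" }));
```

The second query does the same logical work on four shards instead of one, and that cost grows with every shard added.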
Beware of Ascending Shard Keys
• Monotonically increasing shard key values cause
"hot spots" on inserts
• Examples: timestamps, _id
[Diagram: mongos in front of Shard 1, Shard 2, Shard 3, … Shard N, with the chunk [ ISODate(…), $maxKey ) highlighted]
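A small simulation of the hot spot, under stated assumptions: four shards, chunk boundaries chosen from older data, and 1,000 new inserts with increasing keys. The FNV-1a function is a stand-in chosen so the example is self-contained; it is not the hash MongoDB uses, and taking the hash modulo the shard count simplifies how hashed sharding actually assigns chunks.

```javascript
const SHARDS = 4;

// Range sharding on an ascending key: every new, larger key falls into the
// last chunk, the one that extends up to the maximum key.
const splitPoints = [1000, 2000, 3000]; // hypothetical chunk boundaries
function rangeShard(key) {
  const i = splitPoints.findIndex((p) => key < p);
  return i === -1 ? SHARDS - 1 : i;
}

// Stand-in 32-bit FNV-1a hash; NOT MongoDB's hash function.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}
function hashedShard(key) {
  return fnv1a(String(key)) % SHARDS;
}

// Count how many inserts each shard receives under a routing function.
function distribute(route, keys) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[route(k)] += 1;
  return counts;
}

// 1,000 inserts with monotonically increasing keys (think timestamps).
const newKeys = Array.from({ length: 1000 }, (_, i) => 5000 + i);
const byRange = distribute(rangeShard, newKeys);
const byHash = distribute(hashedShard, newKeys);
console.log("range :", byRange);
console.log("hashed:", byHash);
```

With range placement all 1,000 inserts land on the last shard, while hashing the key spreads them across every shard, at the price of turning range queries on that key into scatter-gather reads.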
Beware of Scatter-Gather Queries
• Extra network traffic
• Extra work on each node
• Sorts in mongos
[Diagram: Application and Driver connect to several Query Routers, which send the query to every shard; each shard is a replica set with one primary and two secondaries]
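One way to picture the "Sorts in mongos" cost is the merge step sketched below, assuming each shard returns its own results already sorted and the router combines them. The function name mergeSorted and the sample documents are invented for this example.

```javascript
// Merge per-shard result lists that are each already sorted ascending by `field`.
function mergeSorted(shardResults, field) {
  const cursors = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let s = 0; s < shardResults.length; s++) {
      const doc = shardResults[s][cursors[s]];
      if (doc === undefined) continue; // this shard is exhausted
      if (best === -1 || doc[field] < shardResults[best][cursors[best]][field]) {
        best = s;
      }
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(shardResults[best][cursors[best]]);
    cursors[best] += 1;
  }
}

// Invented results from three shards, each sorted by `ts`.
const fromShards = [
  [{ ts: 1 }, { ts: 7 }],
  [{ ts: 3 }],
  [{ ts: 2 }, { ts: 9 }],
];
const mergedTs = mergeSorted(fromShards, "ts").map((d) => d.ts);
console.log(mergedTs);
```

Every shard has to answer before the first merged document can be returned, so the slowest shard sets the latency of the whole query.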
Advanced Sharding Options
• Hash-based Sharding
• Tag-Aware Sharding
Shard Tag   Start    End
Winter      23 Dec   21 Mar
Spring      22 Mar   21 Jun
Summer      21 Jun   23 Sep
Fall        24 Sep   22 Dec
[Diagram: four mongod shards tagged Spring, Summer, Fall, and Winter]
I FORGOT THE SECTION
ON REPLICA SETS !?
Scaling your Operations Team
How MMS helps you
• Scale Easily
• Meet SLAs
• Best Practices, Automated
• Cut Management Overhead
Without MMS
Example Deployment – 12 Servers
• Install, Configure: 150+ steps
• Scale out, move servers, resize oplog, etc.: 10-180+ steps
• Upgrades, downgrades: 100+ steps
• …Error handling, throttling, alerts
With MMS
Common Tasks, Performed in Minutes
• Deploy – any size, most topologies
• Upgrade/Downgrade – with no downtime
• Scale – add/remove shards or replicas, with no
downtime
• Resize Oplog – with no downtime
• Specify users, roles, custom roles
• Provision AWS instances and optimize for
MongoDB
Customers @ Scale on MongoDB
MongoDB at Scale
• Performance: 250M Ticks/Sec; 300K+ Ops/Sec; 500K+ Ops/Sec
• Cluster: 1,400 Servers; 1,000+ Servers; 250+ Servers
• Data: Petabytes; 10s of billions of objects; 13B documents
• Customers shown: Fed Agency; Entertainment Co.; Asian Internet Co.
Foursquare Stats
• 50M users.
• 1.7M merchants using the platform for marketing
• Operations Per Second: 300,000
• Documents: 5.5B
• 11 MongoDB clusters
– 8 are sharded
• Largest cluster has 15 shards (check ins)
– Sharded on user id
MongoDB DOES Scale!!
[Chart: MongoDB Cluster Throughput. Y-axis: Operations/Second, 0 to 30,000. X-axis: Number of Nodes, 1 to 8.]

Editor's Notes

  • #48: MMS can do a lot for [ops teams]. Best Practices, Automated. MMS takes best practices for running MongoDB and automates them. So you run ops the way MongoDB engineers would do it. This not only makes it more fool-proof, but it also helps you… Cut Management Overhead. No custom scripting or special setup needed. You can spend less time running and managing manual tasks because MMS takes care of a lot of the work for you, letting you focus on other tasks. Meet SLAs. Automating critical management tasks makes it easier to meet uptime SLAs. This includes managing failover as well as doing rolling upgrades with no downtime. Scale Easily. Provision new nodes and systems with a single click.
  • #49: It is, of course, possible to do these things without MMS. But it takes work. Typically manual work, or custom scripting. In either case, these things take time, require you to check for mistakes and are more prone to having things go wrong.
  • #53: More info: http://www.mongodb.com/mongodb-scale