Scaling with MongoDB
       Eliot Horowitz
       @eliothorowitz
        MongoAustin
      February 15, 2011
Scaling

• Storage needs only go up
• Operations/sec only go up
• Complexity only goes up
Horizontal Scaling

• Vertical scaling is limited
• Hard to scale vertically in the cloud
• Can scale wider than higher
Read Scaling

• One master at any time
• Programmer determines if read hits master
  or a slave
• Pro: easy to set up, can scale reads very well
• Con: reads are inconsistent on a slave
• Writes don’t scale
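A minimal sketch of the trade-off above, using a toy in-memory cluster rather than a real MongoDB driver. The class names, the `slave_ok` flag, and the explicit `replicate()` step are all assumptions made for illustration; a real deployment replicates asynchronously in the background.

```python
import random


class Node:
    """A toy in-memory server holding key/value data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    """Hypothetical model: one master takes writes, slaves lag behind."""

    def __init__(self, slave_count):
        self.master = Node("master")
        self.slaves = [Node(f"slave{i}") for i in range(slave_count)]
        self.pending = []  # writes not yet applied to the slaves

    def write(self, key, value):
        # Every write goes to the single master, so writes do not scale.
        self.master.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued writes to every slave.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()

    def read(self, key, slave_ok=False):
        # The programmer decides whether a read may hit a slave.
        node = random.choice(self.slaves) if slave_ok else self.master
        return node.data.get(key)


cluster = MasterSlaveCluster(slave_count=3)
cluster.write("views", 1)
fresh = cluster.read("views")                    # master: always current
stale = cluster.read("views", slave_ok=True)     # slave: not replicated yet
cluster.replicate()
caught_up = cluster.read("views", slave_ok=True)
print(fresh, stale, caught_up)
```

Adding slaves adds read capacity, but a slave read issued before replication catches up returns old data.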
One Master, Many Slaves


• Custom Master/Slave setup
• Have as many slaves as you want
• Can put them local to application servers
• Good for 90+% read heavy applications
  (Wikipedia)
Replica Sets
• High Availability Cluster
• One master at any time, up to 6 slaves
• A slave is automatically promoted to master if
  the master fails
• Drivers support auto routing of reads to
  slaves if programmer allows
• Good for applications that need high write
  availability but mostly reads (Commenting
  System)
Sharding

• Many masters, even more slaves
• Can scale in two dimensions
• Add Shards for write and data size scaling
• Add slaves for inconsistent read scaling and
  redundancy
Sharding Basics
• Data is split up into chunks
• Shard: Replica sets that hold a portion of
  the data
• Config Servers: Store metadata about the
  system
• Mongos: Routers, direct and merge
  requests
Architecture

                         Shards

              mongod     mongod     mongod
              mongod     mongod     mongod     ...
              mongod     mongod     mongod

 Config
 Servers
 mongod
 mongod             mongos     mongos     ...
 mongod

              client   client   client   client
Common Setup

• A common setup is 3 shards with 3 servers
  per shard: 3 masters, 6 slaves
• Can add sharding later to an existing replica
  set with no downtime
• Can have sharded and non-sharded
  collections
Range Based
       MIN         MAX        LOCATION
        A           F          shard1
        F           M          shard1
        M           R          shard2
        R           Z          shard3

• collection is broken into chunks by range
• chunks default to 64 MB or 100,000 objects
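One way to picture range-based lookup is a binary search over the chunk table. The sketch below copies the MIN/MAX/LOCATION rows from this slide; treating MIN as inclusive and MAX as exclusive, and the name `shard_for`, are assumptions of the example.

```python
from bisect import bisect_right

# (min, max, shard) rows from the table above.
# Assumption: min is inclusive, max is exclusive.
CHUNKS = [
    ("A", "F", "shard1"),
    ("F", "M", "shard1"),
    ("M", "R", "shard2"),
    ("R", "Z", "shard3"),
]
CHUNK_MINS = [lo for lo, _, _ in CHUNKS]


def shard_for(key):
    """Return the shard whose chunk range contains key."""
    i = bisect_right(CHUNK_MINS, key) - 1
    if i < 0:
        raise KeyError(key)
    lo, hi, shard = CHUNKS[i]
    if not (lo <= key < hi):
        raise KeyError(key)
    return shard


print(shard_for("Eliot"), shard_for("Mongo"), shard_for("Scale"))
```

Because the table is sorted by MIN, a lookup costs one binary search no matter how many chunks exist.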
Config Servers

• 3 of them
• changes are made with 2 phase commit
• if any are down, metadata goes read only
• system is online as long as 1 of the 3 is up
mongos

• Sharding Router
• Acts just like a mongod to clients
• Can have 1 or as many as you want
• Can run on appserver so no extra network
  traffic
• Caches metadata from config servers
Writes

• Inserts: require shard key, routed
• Removes: routed and/or scattered
• Updates: routed or scattered
Queries

• By shard key: routed
• sorted by shard key: routed in order
• by non shard key: scatter gather
• sorted by non shard key: distributed merge
  sort
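The three query paths can be sketched with toy in-memory shards. Everything here (the `SHARDS` and `RANGES` data, the function names, the `score` field) is hypothetical and only illustrates the routing idea, not how mongos is implemented.

```python
import heapq

# Toy shards: documents are already partitioned by user_id range.
SHARDS = {
    "shard1": [{"user_id": "alice", "score": 7}, {"user_id": "bob", "score": 3}],
    "shard2": [{"user_id": "nina", "score": 9}, {"user_id": "omar", "score": 1}],
    "shard3": [{"user_id": "tara", "score": 5}],
}
# (min, max, shard); "{" is the ASCII character right after "z".
RANGES = [("a", "m", "shard1"), ("m", "s", "shard2"), ("s", "{", "shard3")]


def find_by_shard_key(user_id):
    """Routed: only the shard owning the key range is asked."""
    for lo, hi, shard in RANGES:
        if lo <= user_id < hi:
            docs = [d for d in SHARDS[shard] if d["user_id"] == user_id]
            return [shard], docs
    return [], []


def find_by_score(min_score):
    """Scatter gather: every shard is asked and the results are combined."""
    hits = []
    for docs in SHARDS.values():
        hits.extend(d for d in docs if d["score"] >= min_score)
    return list(SHARDS), hits


def find_sorted_by_score():
    """Distributed merge sort: shards sort locally, the router merges."""
    per_shard = [sorted(docs, key=lambda d: d["score"]) for docs in SHARDS.values()]
    return list(heapq.merge(*per_shard, key=lambda d: d["score"]))


routed_shards, routed_docs = find_by_shard_key("nina")
scatter_shards, scatter_docs = find_by_score(5)
merged = [d["score"] for d in find_sorted_by_score()]
print(routed_shards, scatter_shards, merged)
```

The routed query touches one shard; the other two touch every shard, which is why a query pattern that includes the shard key matters.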
Splitting

• Take a chunk and split it in 2
• Splits on the median value
• Splits only change metadata; no data
  moves
Splitting
T1
     MIN      MAX   LOCATION
      A        Z      shard1


T2
     MIN      MAX   LOCATION
      A        G       shard1
      G        Z       shard1


T3
     MIN      MAX   LOCATION
      A        D      shard1
      D        G      shard1
      G        S      shard1
      S        Z      shard1
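The T1 to T3 progression can be reproduced with a small sketch that splits a chunk at the median of the keys it holds. The sample keys and the `split_chunk` helper are hypothetical; only the resulting ranges come from the slide.

```python
def split_chunk(chunks, index, keys_in_chunk):
    """Split chunks[index] at the median key; only the metadata table changes."""
    lo, hi, shard = chunks[index]
    ordered = sorted(keys_in_chunk)
    median = ordered[len(ordered) // 2]
    if not (lo < median < hi):
        raise ValueError("chunk cannot be split at its median")
    # Both halves stay on the same shard, so no documents are copied.
    halves = [(lo, median, shard), (median, hi, shard)]
    return chunks[:index] + halves + chunks[index + 1:]


t1 = [("A", "Z", "shard1")]
t2 = split_chunk(t1, 0, ["B", "D", "G", "K", "S"])        # median is "G"
t3 = split_chunk(t2, 0, ["B", "C", "D", "E", "F"])        # median is "D"
t3 = split_chunk(t3, 2, ["H", "K", "S", "T", "W"])        # median is "S"
for row in t3:
    print(row)
```

Every chunk after a split still points at shard1, matching the LOCATION column in T2 and T3.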
Balancing

• Moves chunks from one shard to another
• Done online while system is running
• Balancing runs in the background
Migrating
T3
     MIN      MAX   LOCATION
      A        D      shard1
      D        G      shard1
      G        S      shard1
      S        Z      shard1

T4
     MIN      MAX   LOCATION
      A        D      shard1
      D        G      shard1
      G        S      shard1
      S        Z     shard2
T5
     MIN      MAX   LOCATION
      A        D      shard1
      D        G      shard1
      G        S     shard2
      S        Z      shard2
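A rough sketch of a balancing loop that reproduces T3 through T5: move one chunk at a time from the shard with the most chunks to the shard with the fewest. The `threshold` parameter and the rule for picking which chunk to move are assumptions of this example, not a description of the actual balancer policy.

```python
def balance(chunks, shards, threshold=2):
    """Move one chunk at a time from the fullest shard to the emptiest."""
    chunks = list(chunks)
    moves = []
    while True:
        counts = {s: 0 for s in shards}
        for _, _, shard in chunks:
            counts[shard] += 1
        fullest = max(shards, key=lambda s: counts[s])
        emptiest = min(shards, key=lambda s: counts[s])
        if counts[fullest] - counts[emptiest] < threshold:
            return chunks, moves
        # Assumption: move the donor's last chunk in table order.
        i = max(j for j, c in enumerate(chunks) if c[2] == fullest)
        lo, hi, _ = chunks[i]
        chunks[i] = (lo, hi, emptiest)
        moves.append((lo, hi, fullest, emptiest))


t3 = [
    ("A", "D", "shard1"),
    ("D", "G", "shard1"),
    ("G", "S", "shard1"),
    ("S", "Z", "shard1"),
]
t5, moves = balance(t3, ["shard1", "shard2"])
for row in t5:
    print(row)
```

The first recorded move corresponds to T4 and the second to T5; the loop stops once the shards hold two chunks each.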
Choosing a Shard Key

• Shard key determines how data is
  partitioned
• Hard to change
• Most important performance decision
Use Case: User Profiles
  { email : "eliot@10gen.com" ,
      addresses : [ { state : "NY" } ]
  }
• Shard by email
• Lookup by email hits 1 node
• Index on { "addresses.state" : 1 }
Use Case: Activity Stream
  { user_id : XXX, event_id : YYY , data : ZZZ }
• Shard by user_id
• Looking up an activity stream hits 1 node
• Writing events is distributed
• Index on { "event_id" : 1 } for deletes
Use Case: Photos
  { photo_id : ???? , data : <binary> }
  What’s the right key?
• auto increment
• MD5( data )
• now() + MD5(data)
• month() + MD5(data)
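To compare the first two candidates, the sketch below counts which range chunk each new insert would land in. The photo payloads, the chunk boundaries, and the helper names are made up for illustration; the two time-prefixed options are not modeled.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the photo bytes."""
    return hashlib.md5(data).hexdigest()


def chunk_hits(keys, boundaries):
    """Count how many keys fall into each range chunk split at boundaries."""
    hits = Counter(bisect_right(boundaries, k) for k in keys)
    return [hits[i] for i in range(len(boundaries) + 1)]


photos = [f"photo-bytes-{i}".encode() for i in range(1000)]

# Auto increment: new ids 1000..1999 arrive after the chunks were split
# on older, smaller ids, so every insert sorts past the last boundary.
auto_keys = [f"{1000 + i:08d}" for i in range(len(photos))]
auto_hits = chunk_hits(auto_keys, ["00000250", "00000500", "00000750"])

# MD5(data): digests spread over the whole hex key space.
md5_keys = [md5_hex(p) for p in photos]
md5_hits = chunk_hits(md5_keys, ["4", "8", "c"])

print("auto increment:", auto_hits)
print("MD5(data):     ", md5_hits)
```

In this toy setup the auto-increment inserts all hit the last chunk, while the MD5 keys spread across all four. The flip side is that MD5 keys give no locality for recently uploaded photos, which is what the time-prefixed variants try to recover.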
Use Case: Logging
    { machine : "app.foo.com" , app : "apache" ,
     when : "2010-12-02:11:33:14" , data : XXX }
    Possible Shard keys
•   { machine : 1 }
•   { when : 1 }
•   { machine : 1 , app : 1 }
•   { app : 1 }
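One property worth checking for each candidate is how many distinct key values it has, since documents that share a shard key value cannot be separated into different chunks. The sample log data and the `distinct_values` helper below are hypothetical.

```python
# Hypothetical sample: 4 machines x 2 apps x 10 timestamps = 80 log lines.
logs = [
    {"machine": f"app{m}.foo.com", "app": app, "when": f"2010-12-02:11:33:{s:02d}"}
    for m in range(4)
    for app in ("apache", "mysql")
    for s in range(10)
]


def distinct_values(docs, fields):
    """Number of distinct shard key values for the given field tuple."""
    return len({tuple(d[f] for f in fields) for d in docs})


candidates = [("machine",), ("when",), ("machine", "app"), ("app",)]
cardinality = {fields: distinct_values(logs, fields) for fields in candidates}

for fields, count in cardinality.items():
    print(fields, count)
```

Here { app : 1 } has only two values, so at most two chunks could ever exist for it. { when : 1 } has many values but they only grow, so new inserts would likely cluster at the top of the range, much like an auto-increment key.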
Roadmap
Past Releases
• First release - February 2009
• v1.0 - August 2009
• v1.2 - December 2009 - Map/Reduce, lots
  of small things
• v1.4 - March 2010 - Concurrency/Geo
• v1.6 - August 2010 - Sharding/Replica Sets
1.8

• Single Server Durability
• Covered Indexes
• Enhancements to Sharding/Replica Sets
Short List

• Better Aggregation
• Full Text Search
• TTL timeout collections
• Concurrency
• Compaction
Download MongoDB
      http://www.mongodb.org

   and let us know what you think
    @eliothorowitz
    @mongodb

       10gen is hiring!
http://www.10gen.com/jobs
