MongoDB Sharding

How queries work in sharded environments
  
One small server. We want more capacity. What to do?
  
Traditionally, we would scale vertically with a bigger box.
  
With sharding we instead scale horizontally to achieve the same computational/storage/memory footprint from smaller servers.

m=10
  
We will show the vertically scaled db and the horizontally scaled db side-by-side for comparison.
  
A sharded MongoDB collection has a shard key. The collection is partitioned in an order-preserving manner using this key. In this example a is our shard key:

{ a : …, b : …, c : … }    a is declared shard key for the collection
  
{ a : …, b : …, c : … }    a is shard key

Metadata is maintained on chunks, which are represented by shard key ranges. Each chunk is assigned to a particular shard.

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

When a chunk becomes too large, MongoDB automatically splits it, and the balancer will later migrate chunks as necessary.
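The chunk metadata above can be pictured as a small lookup table. The following is a minimal sketch in plain JavaScript, not MongoDB's actual implementation: the names `chunks`, `findChunk` and `splitChunk` are hypothetical, the rows hidden behind "…" are left out rather than guessed, and the split point is simply passed in by the caller.

```javascript
// Chunk metadata copied from the table above. The rows elided by "…" are
// omitted, so keys in [5500, 88700) have no chunk in this sketch.
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Return the chunk whose half-open range [min, max) contains the key,
// or null when the key falls in the part of the table we do not know.
function findChunk(chunkTable, key) {
  return chunkTable.find((c) => key >= c.min && key < c.max) ?? null;
}

// Split one chunk into two at splitKey (hypothetical helper). Both halves
// stay on the same shard; moving one of them later is the balancer's job
// and is not modelled here.
function splitChunk(chunkTable, chunk, splitKey) {
  if (!(splitKey > chunk.min && splitKey < chunk.max)) {
    throw new RangeError("splitKey must fall strictly inside the chunk");
  }
  const left = { min: chunk.min, max: splitKey, shard: chunk.shard };
  const right = { min: splitKey, max: chunk.max, shard: chunk.shard };
  return chunkTable.flatMap((c) => (c === chunk ? [left, right] : [c]));
}

console.log(findChunk(chunks, 2050).shard); // 8
const afterSplit = splitChunk(chunks, findChunk(chunks, 3000), 4000);
console.log(afterSplit.length); // 5
```

Because the ranges are ordered and do not overlap, a key belongs to at most one chunk, which is what lets a router pick a shard from the key alone.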
  
{ a : …, b : …, c : … }    a is shard key

find( { a : { $gt : 333, $lt : 400 } } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

The mongos process routes a query to the correct shard(s). For the query above, all possibly relevant data is on shard 2, so the query is sent to that node only and processed there.
  
{ a : …, b : …, c : … }    a is declared shard key

find( { a : { $gt : 333, $lt : 2012 } } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

Sometimes a query range might span more than one shard, but very few in total; here the range touches only shards 2 and 8. This is reasonably efficient.
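The routing decision for range queries on the shard key can be sketched as an interval-overlap test against the chunk table. This is an illustration only, assuming the query has the form { $gt, $lt } with $gt below $lt; `shardsForRange` is a hypothetical name, and the rows elided by "…" in the table are omitted rather than guessed.

```javascript
// Chunk metadata from the table above (elided "…" rows omitted).
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Shards whose chunks overlap the open interval (gt, lt),
// i.e. the shard key predicate { $gt: gt, $lt: lt }.
function shardsForRange(chunkTable, gt, lt) {
  const shards = new Set();
  for (const c of chunkTable) {
    // Chunk [min, max) overlaps (gt, lt) when it ends after gt
    // and starts before lt.
    if (c.max > gt && c.min < lt) shards.add(c.shard);
  }
  return [...shards];
}

console.log(shardsForRange(chunks, 333, 400));  // [ 2 ]
console.log(shardsForRange(chunks, 333, 2012)); // [ 2, 8 ]
```

The first call corresponds to the single-shard query and the second to the query that spans two shards; in both cases the remaining shards are never contacted.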
  
{ a : …, b : …, c : … }    non-shard key query, no index

find( { b : 99 } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

Queries not involving the shard key will be sent to all shards as a "scatter/gather" operation. This is sometimes ok. Here, on both our traditional machine and the shards, we do a table scan, which is (roughly) equally expensive on both.
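A rough way to see why this query goes everywhere: without the shard key, nothing in the chunk table can rule a shard out. The sketch below simulates that in plain JavaScript. The documents in `shardData` are made up for illustration, `targetShards` and `find` are hypothetical helpers that only understand equality predicates, and the elided "…" rows of the table are omitted.

```javascript
// Chunk metadata from the table above (elided "…" rows omitted).
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Hypothetical documents held by each shard (invented for this example).
const shardData = new Map([
  [2, [{ a: 100, b: 99 }, { a: 350, b: 7 }]],
  [8, [{ a: 2050, b: 99 }]],
  [3, [{ a: 3000, b: 1 }]],
  [0, [{ a: 90000, b: 99 }]],
]);

// Pick the shards to contact. If the query does not mention the shard key,
// every shard has to be asked (scatter/gather).
function targetShards(chunkTable, query, shardKey = "a") {
  if (!(shardKey in query)) {
    return [...new Set(chunkTable.map((c) => c.shard))];
  }
  const key = query[shardKey];
  return chunkTable
    .filter((c) => key >= c.min && key < c.max)
    .map((c) => c.shard);
}

// With no index on b, each contacted shard scans all of its documents.
function find(query) {
  const results = [];
  let scanned = 0;
  for (const shard of targetShards(chunks, query)) {
    for (const doc of shardData.get(shard)) {
      scanned += 1;
      const matches = Object.entries(query).every(([k, v]) => doc[k] === v);
      if (matches) results.push(doc);
    }
  }
  return { results, scanned };
}

const out = find({ b: 99 });
console.log(out.results.length, out.scanned); // 3 5
```

Every document on every shard is examined, which is the same total work a single large server would do for an unindexed scan.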
  
{ a : …, b : …, c : … }    Scatter / gather with secondary index

ensureIndex({b:1})
find( { b : 99 } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

Once again, a query without the shard key results in a scatter/gather operation. However, at each shard we can use the {b:1} index to make the operation efficient for that shard. We have a little extra overhead over the vertical configuration for the communication from mongos to each shard: not too bad if the number of shards is small (10), but quite substantial for, say, a 1000-shard system.
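The trade-off described here can be sketched with a toy model: each shard answers from its own index on b, but the router still sends one request per shard. Everything below is an assumption made for illustration. The documents are invented, a JavaScript `Map` stands in for the real {b:1} index, and `buildIndex` and `findByB` are hypothetical names.

```javascript
// Hypothetical documents held by each shard (invented for this example).
const shardData = new Map([
  [2, [{ a: 100, b: 99 }, { a: 350, b: 7 }]],
  [8, [{ a: 2050, b: 99 }]],
  [3, [{ a: 3000, b: 1 }]],
  [0, [{ a: 90000, b: 99 }]],
]);

// Stand-in for a secondary index: field value -> documents with that value.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

// One index on b per shard, built independently from that shard's data.
const bIndexes = new Map(
  [...shardData].map(([shard, docs]) => [shard, buildIndex(docs, "b")])
);

// Scatter/gather: one request per shard, each answered by an index lookup
// instead of a scan of that shard's documents.
function findByB(value) {
  let requests = 0;
  const results = [];
  for (const [, index] of bIndexes) {
    requests += 1;
    results.push(...(index.get(value) ?? []));
  }
  return { results, requests };
}

const out = findByB(99);
console.log(out.results.length, out.requests); // 3 4
```

The per-shard work is now small, but `requests` still equals the number of shards, which is the communication overhead that grows from tolerable at 10 shards to substantial at 1000.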
  
{ a : …, b : …, c : … }    Shard key query, secondary index

find( { b : 99, a : 100 } )

Range                Shard
a in [-∞, 2000)      2
a in [2000, 2100)    8
a in [2100, 5500)    3
…                    …
a in [88700, ∞)      0

The a term involves the shard key and allows mongos to intelligently route the query to shard 2. Once the query reaches shard 2, the { b : 1 } index can be used to efficiently process the query.
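The two steps, routing on a and then using the index on b, can be sketched as follows. This is a simplified model under stated assumptions: the documents are invented, a `Map` stands in for the { b : 1 } index, the helper names `shardForKey`, `buildIndex` and `findByAB` are hypothetical, and the elided "…" rows of the chunk table are omitted.

```javascript
// Chunk metadata from the table above (elided "…" rows omitted).
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Hypothetical documents held by each shard (invented for this example).
const shardData = new Map([
  [2, [{ a: 100, b: 99 }, { a: 100, b: 5 }, { a: 350, b: 99 }]],
  [8, [{ a: 2050, b: 99 }]],
  [3, [{ a: 3000, b: 99 }]],
  [0, [{ a: 90000, b: 99 }]],
]);

// Stand-in for the { b : 1 } index on one shard.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

const bIndexes = new Map(
  [...shardData].map(([shard, docs]) => [shard, buildIndex(docs, "b")])
);

// Step 1 (router): the shard key value identifies a single shard.
function shardForKey(chunkTable, key) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

// Step 2 (shard): look up b in the local index, then check a.
function findByAB(a, b) {
  const shard = shardForKey(chunks, a);
  const index = bIndexes.get(shard);
  if (!index) return { shard, results: [] };
  const candidates = index.get(b) ?? [];
  return { shard, results: candidates.filter((doc) => doc.a === a) };
}

const out = findByAB(100, 99);
console.log(out.shard, out.results.length); // 2 1
```

Only shard 2 is contacted, even though other shards also hold documents with b equal to 99.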
  
When sorting is specified, the relevant shards sort locally, and then mongos merges the results. Thus the mongos resource usage is not terribly high.

client:  Adam, Bob, David, Julie, Sue, Tim, Zack

mongos

shards:  [Bob, David, Julie]   [Sue, Tim]   [Adam, Zack]
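The merge step can be sketched as a k-way merge over lists that are already sorted. This is an illustration of the idea rather than the mongos implementation; `mergeSorted` is a hypothetical helper, and the three lists are the per-shard results from the diagram.

```javascript
// Each shard returns its own results already sorted (names from the diagram).
const shardResults = [
  ["Bob", "David", "Julie"],
  ["Sue", "Tim"],
  ["Adam", "Zack"],
];

// Merge already-sorted lists by repeatedly taking the smallest head element.
// The router never re-sorts; it only compares one item per shard at a time.
function mergeSorted(lists) {
  const positions = lists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // this list is exhausted
      if (best === -1 || lists[i][positions[i]] < lists[best][positions[best]]) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][positions[best]]);
    positions[best] += 1;
  }
}

console.log(mergeSorted(shardResults).join(", "));
// Adam, Bob, David, Julie, Sue, Tim, Zack
```

Each output item costs one comparison per shard, so the router's work stays small compared with the sorting done on the shards.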
  
When using replication (typically a replica set), we simply have more than one node per shard.

(Figure: arrows indicate replication, traditional vs. sharded environments.)
  

