How queries work with sharding

The document discusses how queries work in sharded MongoDB environments. It explains that MongoDB collections are partitioned into chunks based on a shard key, and each chunk is assigned to a particular shard. When a query is executed, the mongos process routes it to the correct shard(s) based on the shard key range in the query. Queries involving only the shard key are efficient, targeting specific shards. Queries on non-shard keys require scattering and gathering across all shards, but secondary indexes can help efficiency on each shard.

How queries work in sharded environments

One small server. We want more capacity. What to do?

Traditionally, we would scale vertically with a bigger box.

With sharding we instead scale horizontally to achieve the same computational/storage/memory footprint from smaller servers.

m=10

We will show the vertically scaled db and the horizontally scaled db side-by-side for comparison.

A sharded MongoDB collection has a shard key. The collection is partitioned in an order-preserving manner using this key. In this example a is our shard key:

{ a : …, b : …, c : … }

Metadata is maintained on chunks, which are represented by shard key ranges. Each chunk is assigned to a particular shard.

Range                 Shard
a in [-∞, 2000)       2
a in [2000, 2100)     8
a in [2100, 5500)     3
…                     …
a in [88700, ∞)       0
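As a rough illustration of how this metadata can be used, the sketch below looks up the shard that owns a single shard-key value. The `chunks` array copies only the rows shown in the table (the elided rows stay elided), and `shardForKey` is a hypothetical helper name, not a MongoDB API.

```javascript
// Chunk metadata from the table above: each chunk covers [min, max)
// and is assigned to one shard. The elided "…" rows are left out.
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Hypothetical helper: return the shard owning shard-key value `a`,
// or null when `a` falls in a range this sketch does not list.
function shardForKey(a) {
  const chunk = chunks.find((c) => a >= c.min && a < c.max);
  return chunk ? chunk.shard : null;
}

console.log(shardForKey(100));   // 2
console.log(shardForKey(2000));  // 8 (lower bounds are inclusive)
console.log(shardForKey(90000)); // 0
```

Because the ranges are ordered and non-overlapping, a real router could use a binary search instead of the linear scan used here.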

When a chunk becomes too large, MongoDB automatically splits it, and the balancer will later migrate chunks as necessary.
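The splitting step can be pictured with a toy model. This is only a sketch under stated assumptions: the limit `MAX_DOCS_PER_CHUNK`, the `keys` field, and the median split point are all invented for illustration, and MongoDB's actual split logic works on chunk size rather than a document count.

```javascript
// Hypothetical limit for this sketch only.
const MAX_DOCS_PER_CHUNK = 4;

// Split a chunk { min, max, shard, keys } at its median key when it holds
// too many documents. `keys` is the sorted list of distinct shard-key
// values stored in the chunk. Both halves stay on the same shard; moving
// them elsewhere would be the balancer's job.
function splitIfTooLarge(chunk) {
  if (chunk.keys.length <= MAX_DOCS_PER_CHUNK) return [chunk];
  const mid = Math.floor(chunk.keys.length / 2);
  const splitPoint = chunk.keys[mid];
  return [
    { min: chunk.min, max: splitPoint, shard: chunk.shard, keys: chunk.keys.slice(0, mid) },
    { min: splitPoint, max: chunk.max, shard: chunk.shard, keys: chunk.keys.slice(mid) },
  ];
}

const big = { min: 2100, max: 5500, shard: 3, keys: [2100, 2500, 3000, 3600, 4200, 5000] };
const parts = splitIfTooLarge(big);
for (const c of parts) {
  console.log(`[${c.min}, ${c.max}) on shard ${c.shard}: ${c.keys.length} docs`);
}
```

The two resulting ranges still cover the original range exactly, so the order-preserving partitioning is kept.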

find( { a : { $gt : 333, $lt : 400 } } )

The mongos process routes a query to the correct shard(s). For the query above, all data possibly relevant is on shard 2, so the query is sent to that node only, and processed there.

find( { a : { $gt : 333, $lt : 2012 } } )

Sometimes a query range might span more than one shard, but very few in total. Here the range overlaps the chunks on shards 2 and 8, so only those two shards are queried. This is reasonably efficient.
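The targeting of the two range queries above can be sketched as an overlap test between the query interval and each chunk range. The `chunks` array repeats only the rows visible in the range table, and `shardsForRange` is a hypothetical name used for illustration.

```javascript
// Chunk ranges [min, max) from the range table; elided rows are omitted.
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Hypothetical helper: shards whose chunks can hold a value v
// with gt < v < lt (the { $gt, $lt } form used in the queries above).
function shardsForRange(gt, lt) {
  const shards = new Set();
  for (const c of chunks) {
    // [min, max) intersects (gt, lt) exactly when min < lt and max > gt.
    if (c.min < lt && c.max > gt) shards.add(c.shard);
  }
  return [...shards];
}

console.log(shardsForRange(333, 400));  // [ 2 ]
console.log(shardsForRange(333, 2012)); // [ 2, 8 ]
```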

Non-shard key query, no index

find( { b : 99 } )

Queries not involving the shard key will be sent to all shards as a “scatter/gather” operation. This is sometimes ok. Here, on both our traditional machine and the shards, we do a table scan, which is (roughly) equally expensive on both.
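A small simulation may make the cost visible. The shard contents below are invented sample data, and `scatterGather` is a hypothetical stand-in for what mongos and the shards do together: every shard is asked, and without an index every document on every shard is examined.

```javascript
// Invented sample data: documents partitioned by `a` across three shards.
// `b` is not the shard key and has no index.
const shards = {
  2: [{ a: 10, b: 99 }, { a: 500, b: 7 }],
  8: [{ a: 2050, b: 99 }],
  3: [{ a: 3000, b: 5 }, { a: 4000, b: 99 }],
};

// Send the predicate to every shard (scatter), scan each shard's
// documents in full, and collect the matches (gather).
function scatterGather(predicate) {
  let examined = 0;
  const results = [];
  for (const docs of Object.values(shards)) {
    for (const doc of docs) {
      examined++;
      if (predicate(doc)) results.push(doc);
    }
  }
  return { results, examined };
}

const { results, examined } = scatterGather((doc) => doc.b === 99);
console.log(`${results.length} matches, ${examined} documents examined`);
```

The total number of documents examined equals the size of the whole collection, which is the same work a single unsharded server would do for an unindexed scan.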

Scatter / gather with secondary index

ensureIndex({b:1})

find( { b : 99 } )

Once again, a query that does not involve the shard key results in a scatter/gather operation. However, at each shard we can use the {b:1} index to make the operation efficient for that shard. We have a little extra overhead over the vertical configuration for the communications effort from mongos to each shard: not too bad if the number of shards is small (10), but quite substantial for, say, a 1000-shard system.
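To see where the overhead comes from, the sketch below gives each shard its own lookup structure on b (a `Map`, standing in for the {b:1} index) and counts one request per shard. The cluster layout, the data, and the names `makeCluster` and `findByB` are assumptions made for illustration.

```javascript
// Build a { b value -> documents } map for one shard, standing in
// for that shard's local {b:1} index.
function buildIndexOnB(docs) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc.b)) index.set(doc.b, []);
    index.get(doc.b).push(doc);
  }
  return index;
}

// Hypothetical cluster: each shard holds one document with b = 99
// and one document with some other b value.
function makeCluster(shardCount) {
  const cluster = [];
  for (let s = 0; s < shardCount; s++) {
    const docs = [{ a: s * 100, b: 99 }, { a: s * 100 + 1, b: 1000 + s }];
    cluster.push({ docs, indexOnB: buildIndexOnB(docs) });
  }
  return cluster;
}

// Scatter/gather: one request per shard, each answered from the local index.
function findByB(cluster, value) {
  let requests = 0;
  const results = [];
  for (const shard of cluster) {
    requests++; // one round trip from mongos to this shard
    results.push(...(shard.indexOnB.get(value) || []));
  }
  return { results, requests };
}

for (const n of [10, 1000]) {
  const { results, requests } = findByB(makeCluster(n), 99);
  console.log(`${n} shards: ${requests} requests, ${results.length} matches`);
}
```

Each shard answers cheaply from its index, but the number of requests still grows linearly with the number of shards.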

Shard key query, secondary index

find( { b : 99, a : 100 } )

The a term involves the shard key and allows mongos to intelligently route the query to shard 2. Once the query reaches shard 2, the { b : 1 } index can be used to efficiently process the query.
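The routing decision itself can be sketched as a check for the shard key in the query document. This is a simplified model that handles only equality on the shard key; `SHARD_KEY`, `targetShards`, and the `mode` labels are hypothetical names, and `chunks` repeats only the rows visible in the range table.

```javascript
const SHARD_KEY = "a";

// Chunk ranges [min, max) from the range table; elided rows are omitted.
const chunks = [
  { min: -Infinity, max: 2000, shard: 2 },
  { min: 2000, max: 2100, shard: 8 },
  { min: 2100, max: 5500, shard: 3 },
  { min: 88700, max: Infinity, shard: 0 },
];

// Decide where a query goes: one shard when it pins the shard key
// to a value, every shard otherwise.
function targetShards(query) {
  const allShards = [...new Set(chunks.map((c) => c.shard))];
  if (!(SHARD_KEY in query)) {
    return { mode: "scatter/gather", shards: allShards };
  }
  const value = query[SHARD_KEY];
  const owner = chunks.find((c) => value >= c.min && value < c.max);
  // A value inside an elided range has no known owner in this sketch,
  // so fall back to asking every shard.
  return owner
    ? { mode: "targeted", shards: [owner.shard] }
    : { mode: "scatter/gather", shards: allShards };
}

console.log(targetShards({ b: 99 }));         // scatter/gather to all listed shards
console.log(targetShards({ b: 99, a: 100 })); // targeted to shard 2
```

Only after this routing step does the chosen shard decide how to use its local { b : 1 } index.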

When sorting is specified, the relevant shards sort locally, and then mongos merges the results. Thus the mongos resource usage is not terribly high.

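[Diagram: the client receives the merged, sorted result from mongos (Adam, Bob, David, Julie, Sue, Tim, Zack); below mongos, the shards return their locally sorted results (Bob, David, Sue, Adam, Julie, Tim, Zack).]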
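The merge step can be sketched as a k-way merge of lists that are already sorted. The split of the diagram's names into two shards is an assumption, and `mergeSorted` is a hypothetical helper, not the mongos implementation.

```javascript
// Merge already-sorted arrays into one sorted array by repeatedly
// taking the smallest head element among the inputs.
function mergeSorted(lists) {
  const positions = lists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // list i is exhausted
      if (best === -1 || lists[i][positions[i]] < lists[best][positions[best]]) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][positions[best]++]);
  }
}

// Assumed split of the names across two shards; each list is sorted locally.
const shardA = ["Bob", "David", "Sue"];
const shardB = ["Adam", "Julie", "Tim", "Zack"];

console.log(mergeSorted([shardA, shardB]));
// [ 'Adam', 'Bob', 'David', 'Julie', 'Sue', 'Tim', 'Zack' ]
```

The merger only ever compares the current head of each list, so it never has to sort the combined result itself, which is why the work left to mongos stays small.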

When using replication (typically a replica set), we simply have more than one node per shard.

(Diagram: arrows indicate replication, traditional vs. sharded environments.)
