Back to Basics 2017: Webinar 4
Introduction to Sharding
Joe Drumgoole
Director of Developer Advocacy, EMEA
MongoDB
@jdrumgoole
V1.1
3
Summary of Parts 1 to 3
• Introduction to NoSQL
• Your First MongoDB Application
• Introduction to Replica Sets
• MongoDB Compass, MongoDB Atlas
4
Agenda
• Sharding – What is it? Why do we need it?
• The architecture of a sharded cluster
• Sharded cluster constraints
• How a sharded cluster works in practice
Developing with MongoDB
[Diagram: Application → Driver → mongod → /data]
Replica Set with MongoDB
[Diagram: Application → Driver → Primary (/data), with two Secondaries (each with its own /data)]
Replica Set Bottlenecks
[Diagram: Application → Driver → Primary (/data), with two Secondaries (each with its own /data)]
• RAM limits on a single server
• CPU limits on a single server
• Network bandwidth
• Disk I/O
What is Sharding?
[Diagram: Application → Driver → mongos, mongos, mongos]
But There is More
[Diagram: Application → Driver → mongos, mongos, mongos, plus Config Server]
10
Construction
• Build Cluster
• Identify shard key
• Sharding happens on individual collections
• To shard a collection:
sh.shardCollection( "MUGS.members", { "members.member_id" : 1 } )
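The construction steps above can be sketched as a small helper for the mongo shell. This is only an illustration under assumptions: `shardMembersCollection` is a hypothetical wrapper name, while `sh.enableSharding`, `db.getSiblingDB`, `createIndex` and `sh.shardCollection` are the standard shell helpers. The namespace and key path are taken from the slide as-is.

```javascript
// Hypothetical helper: pass in the shell's `sh` and `db` objects.
function shardMembersCollection(sh, db) {
  // Sharding must be enabled on the database before any of its
  // collections can be sharded.
  sh.enableSharding("MUGS");

  // A non-empty collection needs an index that starts with the shard key;
  // creating it up front is harmless if the collection is empty.
  db.getSiblingDB("MUGS").members.createIndex({ "members.member_id": 1 });

  // Shard the individual collection on the chosen key.
  return sh.shardCollection("MUGS.members", { "members.member_id": 1 });
}
```

In a mongo shell connected to a mongos, this would be called as `shardMembersCollection(sh, db)`.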
11
Shard Keys
• User defines shard key
• Shard key defines range of data
• Key space is like points on a line
• Range is a segment of that line
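A minimal sketch of the "points on a line" picture, assuming numeric shard key values. The chunk boundaries and shard names are made up for illustration, and `-Infinity` / `Infinity` stand in for MongoDB's MinKey and MaxKey.

```javascript
// Each range is a segment [min, max) of the key space:
// the lower bound is inclusive, the upper bound exclusive.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// Return the range that contains the given shard key value.
function findChunk(chunkList, key) {
  return chunkList.find((c) => key >= c.min && key < c.max);
}

console.log(findChunk(chunks, 42).shard);   // shard0
console.log(findChunk(chunks, 1000).shard); // shard1
```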
12
Shard Key Constraints
• Shard keys are immutable
• Shard keys should have high cardinality
• Shard key values need not be unique, but a unique index must have the shard key as its prefix
• Shard key must exist in every document
• Limited to 512 bytes in size
• Cannot be a multi-key (array)
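Some of these constraints can be checked on a document before it is inserted. The sketch below runs under Node.js and is an illustration only: `getPath` and `checkShardKey` are hypothetical helpers, and the byte count uses the JSON length as a rough stand-in for the real BSON size.

```javascript
// Look up a dotted field path such as "a.b" in a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, part) => (value == null ? undefined : value[part]), doc);
}

// Return a list of constraint violations for one document (empty if none).
function checkShardKey(doc, keyField, maxBytes = 512) {
  const value = getPath(doc, keyField);
  const problems = [];
  if (value === undefined) {
    problems.push("shard key field is missing");
    return problems;
  }
  if (Array.isArray(value)) {
    problems.push("shard key is an array");
  }
  // Assumption: JSON length approximates the size of the key value.
  const size = Buffer.byteLength(JSON.stringify(value), "utf8");
  if (size > maxBytes) {
    problems.push("shard key exceeds " + maxBytes + " bytes");
  }
  return problems;
}
```

For example, `checkShardKey({ member_id: 7 }, "member_id")` returns an empty list, while a document without the field reports it as missing.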
Distributing Data
14
Chunk is a Section on the Range
15
Chunk Splitting
16
How Data is Distributed
• Initially 1 chunk
• Default max chunk size: 64 MB
• MongoDB automatically splits and migrates chunks when the max size is reached
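The splitting rule can be sketched as a single pass over the chunk list. This is a simplified model, not the server's algorithm: the `{ min, max, docs }` chunk shape and the median split point are assumptions made for the example. Note that the split only rewrites range boundaries; no document changes shard.

```javascript
// Split one chunk into two at the median shard key value of its documents.
// `chunk` is { min, max, docs } and each doc is { key, bytes } (hypothetical shape).
function splitChunk(chunk) {
  const sorted = [...chunk.docs].sort((a, b) => a.key - b.key);
  const splitKey = sorted[Math.floor(sorted.length / 2)].key;
  return [
    { min: chunk.min, max: splitKey, docs: sorted.filter((d) => d.key < splitKey) },
    { min: splitKey, max: chunk.max, docs: sorted.filter((d) => d.key >= splitKey) },
  ];
}

// One pass: split every chunk whose documents add up to more than maxBytes.
function splitOversized(chunkList, maxBytes) {
  return chunkList.flatMap((chunk) => {
    const total = chunk.docs.reduce((sum, d) => sum + d.bytes, 0);
    return total > maxBytes && chunk.docs.length > 1 ? splitChunk(chunk) : [chunk];
  });
}
```

Starting from the single initial chunk, repeated passes as data grows produce the list of ranges that the balancer later moves around.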
Balancing the Cluster
18
Acquiring the Balancer Lock
19
Moving the Chunk
20
Committing the Migration
21
Clean Up
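The balancing steps above can be sketched as one round over an in-memory cluster. This is a toy model under assumptions: `balanceOnce` is a hypothetical name, the threshold is passed in by the caller, the balancer lock is left out, and a round starts when the chunk-count difference is above the threshold, as in the editor's note for the balancing slide.

```javascript
// Run one balancing round over a map of shard name -> array of chunk ids.
// Returns a description of the migration, or null if nothing needs to move.
function balanceOnce(shards, threshold) {
  const names = Object.keys(shards);
  let most = names[0];
  let least = names[0];
  for (const name of names) {
    if (shards[name].length > shards[most].length) most = name;
    if (shards[name].length < shards[least].length) least = name;
  }
  // Only start a round when the imbalance is above the threshold.
  if (shards[most].length - shards[least].length <= threshold) return null;

  const chunk = shards[most][0];          // pick a chunk on the fullest shard
  shards[least].push(chunk);              // move: copy it to the emptiest shard
  shards[most] = shards[most].slice(1);   // commit and clean up: drop the source copy
  return { chunk, from: most, to: least };
}
```

Calling `balanceOnce` repeatedly until it returns `null` mimics successive balancing rounds.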
Routing Requests
23
Routing Requests - Targeted
24
Routing Requests – Non-Targeted
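The difference between targeted and non-targeted routing can be sketched as follows. This is a simplified model, assuming equality matches on a numeric shard key and a chunk list that covers the whole key space; `routeQuery` and the chunk shape are hypothetical.

```javascript
// Decide which shards must receive a query.
// `chunkList` entries are { min, max, shard }; `shardKey` is the key field name.
function routeQuery(query, shardKey, chunkList) {
  const allShards = [...new Set(chunkList.map((c) => c.shard))];
  if (!(shardKey in query)) {
    // No shard key in the query: it has to go to every shard.
    return { targeted: false, shards: allShards };
  }
  // Shard key present: only the shard owning that range is contacted.
  const key = query[shardKey];
  const owner = chunkList.find((c) => key >= c.min && key < c.max);
  return { targeted: true, shards: [owner.shard] };
}
```

A query such as `{ member_id: 42 }` goes to one shard, while `{ name: "Joe" }` is sent to all of them.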
25
Routing with Sort
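For a sorted query, each shard sorts its own results and the router merges the sorted streams. A minimal sketch of that merge step, assuming ascending order on a single field; `mergeSorted` is a hypothetical name.

```javascript
// Merge per-shard result lists, each already sorted ascending by `field`.
function mergeSorted(shardResults, field) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  for (;;) {
    // Find the shard whose next unread document has the smallest value.
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue;
      if (
        best === -1 ||
        shardResults[i][positions[i]][field] < shardResults[best][positions[best]][field]
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best] += 1;
  }
}
```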
26
Picking a Shard Key
• Cardinality
• Write Distribution
• Query Isolation
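Write distribution can be illustrated by counting where inserts land for two kinds of key. The boundaries and key values below are invented for the example, and `insertsPerChunk` is a hypothetical helper.

```javascript
// Count how many inserts land in each range chunk.
// boundaries [b0, b1, ...] define the chunks (-inf, b0), [b0, b1), ..., [bn, +inf).
function insertsPerChunk(keys, boundaries) {
  const counts = new Array(boundaries.length + 1).fill(0);
  for (const key of keys) {
    let index = boundaries.findIndex((b) => key < b);
    if (index === -1) index = boundaries.length;
    counts[index] += 1;
  }
  return counts;
}

const boundaries = [100, 200, 300];
// Ever-increasing keys (a counter or timestamp): every new write hits the last chunk.
const increasing = Array.from({ length: 8 }, (_, i) => 300 + i);
// Keys spread over the key space: writes are shared between chunks.
const spread = [12, 150, 260, 340, 55, 180, 299, 400];

console.log(insertsPerChunk(increasing, boundaries)); // [ 0, 0, 0, 8 ]
console.log(insertsPerChunk(spread, boundaries));     // [ 2, 2, 2, 2 ]
```

The first pattern concentrates all writes on one shard, which is why write distribution sits next to cardinality in the list above.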
Q&A


Editor's Notes

  • #3 Who I am, how long I have been at MongoDB.
  • #9 Data is partitioned. The mongos processes distribute the data across the shards.
  • #10 Data is partitioned. The mongos processes distribute the data across the shards.
  • #16 Once the max chunk size is reached, mongos asks mongod to split the chunk. An internal function called splitVector() runs on mongod, which counts the number of documents on each side of the split based on the average document size (see `db.stats()`). A chunk split is a **logical** operation (no data is moved). Max on first chunk should be 14.
  • #18 The balancer runs on mongos (in MongoDB 3.4 and later, on the primary of the config server replica set). Once the difference in chunk counts between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts.