Data Modelling for MongoDB
Norberto Leite
MongoDB
May 14th, 2019
Tel Aviv, Israel
Norberto Leite
Lead Engineer - Curriculum
norberto@mongodb.com
New York
@nleite
https://university.mongodb.com
Goals of the Presentation
•  Recognize the differences when modelling for a Document Database versus a Relational Database
•  Summarize the steps of a methodology when modelling for MongoDB
•  Recognize the need for Schema Design Patterns and when to apply them
Differences when Modelling for a Document Database versus a Relational Database
Data Modelling for MongoDB - MongoDB.local Tel Aviv
Thinking in Documents
1.  Polymorphism
•  different documents may contain
different fields
2.  Array
•  represent a "one-to-many" relation
•  an index on an array field covers every entry of the array
3.  Sub Document
•  grouping some fields together
4.  JSON/BSON
•  documents are often shown as JSON
•  BSON is the physical format
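The four ideas above can be sketched with plain Python dictionaries standing in for documents. This is only an illustration: the blog-post field names (`tags`, `author`, `video_url`, and so on) are hypothetical and do not come from the slides.

```python
import json

# Hypothetical blog-style documents; every field name is invented for illustration.
text_post = {
    "_id": 1,
    "type": "text",
    "title": "Hello",
    "tags": ["mongodb", "modelling"],                # array: a one-to-many relation
    "author": {"name": "Ada", "city": "Tel Aviv"},   # sub-document: grouped fields
    "body": "First post",
}
video_post = {
    "_id": 2,
    "type": "video",
    "title": "Demo",
    "tags": ["demo"],
    "author": {"name": "Ada", "city": "Tel Aviv"},
    "video_url": "https://example.com/demo.mp4",     # polymorphism: absent from text_post
    "duration_s": 95,
}


def fields_only_in(doc, other):
    """Return the top-level fields present in doc but missing from other."""
    return sorted(set(doc) - set(other))


only_video = fields_only_in(video_post, text_post)
only_text = fields_only_in(text_post, video_post)

# Documents are commonly displayed as JSON; BSON is the binary storage format.
as_json = json.dumps(text_post, sort_keys=True)
```

Both documents could live in the same collection even though their field sets differ, which is what polymorphism means here.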
Example: modelling a blog
… 5 tables become 1 or 2 collections
Example: Modelling a Social Network
Differences: Relational/Tabular vs Document
Aspect | Tabular | MongoDB
Steps to create the model | 1 – define schema; 2 – develop app and queries | 1 – identify the queries; 2 – define schema
Initial schema | 3rd normal form; one solution | many solutions possible
Final schema | likely denormalized | few changes
Schema evolution | difficult and not optimal; likely downtime | easy and no downtime
Performance | mediocre | optimized
Other Considerations for the Model
1.  One-to-many relationships where "many" is a humongous number
2.  Embed or Reference
•  Joins via $lookup
•  Transactions for multi document writes
3.  Transactions available for replica sets, and soon for sharded clusters
4.  Sharding Key
5.  Indexes
6.  Simple queries, or more complex ones with the Aggregation Framework
Flexible Modelling Methodology for MongoDB
Methodology
1.  Describe the Workload
2.  Identify and Model the Relationships
3.  Apply Patterns
Flexible Methodology
Case Study: אספרסו ארומטי (Aromatic Espresso)
A.  Business: coffee shop franchises
B.  Name: אספרסו ארומטי
also considered: Coffee Sababa, Hummus Coffee
C.  Objective:
•  10 000 stores in Israel, Kazakhstan, Romania, Ukraine ...
•  … then we invade America
D.  Keys to success:
•  Best coffee in the world
•  Technology
You have been warned
Make the Best Coffee in the World
23g of ground coffee in, 20g of extracted
coffee out, in approximately 20 seconds
1.  Fill a small or regular cup with 80% hot
water (not boiling but pretty hot). Your
cup should be 150ml to 200ml in total
volume, 80% of which will be hot water.
2.  Grind 23g of coffee into your portafilter
using the double basket. We use a scale
that you can get here.
3.  Draw 20g of coffee over the hot water by
placing your cup on a scale, press tare
and extract your shot.
Technology
1.  Measure inventory in real time
•  Shelves with scales
2.  Big Data collection on cups of coffee
•  weighings, temperature, time to produce, …
3.  Data Analysis
•  Coffee perfection
•  Rush hours -> staffing needs
4.  MongoDB
1 – Workload: List Queries
Query | Operation | Description
1. Coffee weight on the shelves | write | A shelf sends information when coffee bags are added or removed
2. Coffee to deliver to stores | read | How much coffee do we have to ship to the store in the next few days
3. Anomalies in the inventory | read | Analytics
4. Making a cup of coffee | write | A coffee machine reporting on the production of a cup of coffee
5. Analysis of cups of coffee | read | Analytics
6. Technical Support | read | Helping our franchisees
1 – Workload: quantify/qualify
Query | Quantification | Qualification
1. Coffee weight on the shelves | 10/day*shelf*store => 1/sec | <1s; critical write
2. Coffee to deliver to stores | 1/day*store => 0.1/sec | <60s
3. Anomalies in the inventory | 24 reads/day | <5mins; "collection scan"
4. Making a cup of coffee | 10 000 000 writes/day => 115 writes/sec | <100ms; non-critical write
… cups of coffee at rush hour | 3 000 000 writes/hr => 833 writes/sec | <100ms; non-critical write
5. Analysis of cups of coffee | 24 reads/day | stale data is fine; "collection scan"
6. Technical Support | 1000 reads/day | <1s
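The per-second rates in the quantification column are straightforward unit conversions. A minimal sketch of the arithmetic follows, assuming the 10 000 stores from the case study and one shelf per store (the shelf count is an assumption made here so that the result matches the "=> 1/sec" on the slide).

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60

STORES = 10_000          # from the case study objective
SHELVES_PER_STORE = 1    # assumption: not stated on the slide


def per_second(events, period_seconds):
    """Convert an event count over a period into an average rate per second."""
    return events / period_seconds


shelf_writes = per_second(10 * SHELVES_PER_STORE * STORES, SECONDS_PER_DAY)
delivery_reads = per_second(1 * STORES, SECONDS_PER_DAY)
cup_writes = per_second(10_000_000, SECONDS_PER_DAY)
rush_writes = per_second(3_000_000, SECONDS_PER_HOUR)

for name, rate in [
    ("shelf weight writes", shelf_writes),
    ("delivery reads", delivery_reads),
    ("cup writes (daily average)", cup_writes),
    ("cup writes (rush hour)", rush_writes),
]:
    print(f"{name}: {rate:.2f}/sec")
```

The rush-hour figure is roughly seven times the daily average, which is why the peak, not the average, should drive sizing for the cup-of-coffee writes.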
1 – Workload: quantify/qualify
Disk Space
Cups of coffee (one year of data)
•  10000 x 1000/day x 365
•  3.7 billion/year
•  370 GB (100 bytes/cup of coffee)
Weighings
•  10000 x 10/day x 365
•  36.5 million/year
•  3.7 GB (100 bytes/weighing)
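The sizing can be rechecked with a few lines of arithmetic. This sketch uses only the numbers on the slide (10 000 stores, 1000 cups and 10 weighings per store per day, 100 bytes per document); the products come to about 3.65 billion cups and 36.5 million weighings per year, which round to the 370 GB and 3.7 GB quoted.

```python
STORES = 10_000
DAYS_PER_YEAR = 365
BYTES_PER_DOC = 100   # document size assumed on the slide


def yearly_count(stores, per_store_per_day, days=DAYS_PER_YEAR):
    """Number of documents produced in one year across all stores."""
    return stores * per_store_per_day * days


cups = yearly_count(STORES, 1000)
weighings = yearly_count(STORES, 10)

cups_gb = cups * BYTES_PER_DOC / 1e9
weighings_gb = weighings * BYTES_PER_DOC / 1e9

print(f"cups: {cups:,} docs, {cups_gb:.1f} GB")
print(f"weighings: {weighings:,} docs, {weighings_gb:.2f} GB")
```

These are raw data sizes only; indexes and storage-engine compression are not accounted for.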
2 - Relations are still important
Type of Relation | one-to-one (1-1) | one-to-many (1-N) | many-to-many (N-N)
Document embedded in the parent document | one read; no joins | one read; no joins | one read; no joins; duplication of information
Document referenced in the parent document | smaller reads; many reads | smaller reads; many reads | smaller reads; many reads
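The trade-off in the table can be illustrated with in-memory Python structures standing in for collections. This is a sketch under stated assumptions: the store and shelf field names (`shelves`, `store_id`, `weight_g`) are hypothetical, and the list comprehension plays the role that a second query or a `$lookup` stage would play in MongoDB.

```python
# Embedded: shelves live inside the store document (one read, no join).
store_embedded = {
    "_id": "store-1",
    "city": "Tel Aviv",
    "shelves": [
        {"shelf_id": "s1", "weight_g": 5000},
        {"shelf_id": "s2", "weight_g": 7500},
    ],
}

# Referenced: shelves are separate documents that point back to their store.
stores = {"store-1": {"_id": "store-1", "city": "Tel Aviv"}}
shelves = [
    {"_id": "s1", "store_id": "store-1", "weight_g": 5000},
    {"_id": "s2", "store_id": "store-1", "weight_g": 7500},
    {"_id": "s3", "store_id": "store-2", "weight_g": 1000},
]


def total_weight_embedded(store):
    """One document read gives everything needed."""
    return sum(shelf["weight_g"] for shelf in store["shelves"])


def total_weight_referenced(store_id, stores, shelves):
    """Two reads: the (smaller) store document, then its shelves."""
    store = stores[store_id]                                       # read 1
    own = [s for s in shelves if s["store_id"] == store["_id"]]    # read 2
    return sum(s["weight_g"] for s in own)
```

Both layouts return the same answer; they differ in how many reads are needed and how large each document is.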
2 - Entities for אספרסו ארומטי
-  Coffee cups
-  Stores
-  Coffee machines
-  Shelves
-  Weighings
-  Coffee bags
Schema Design Patterns
Resources
A.  Advanced Schema Design Patterns
•  MongoDB World 2017
•  Webinar
B.  MongoDB University
•  university.mongodb.com
•  M320 – Data Modeling (2019)
C.  Blogs on Schema Design Patterns
•  https://www.mongodb.com/blog/post/building-with-patterns-a-summary
Data Modeling Patterns: Use Cases
Schema Versioning
Computed Pattern
Subset Pattern
Bucket Pattern
Bucket per Day
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02"),
  "temp": [ [ 20.0, 20.1, 20.2, ... ],
            [ 22.1, 22.1, 22.0, ... ],
            ...
          ]
}
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-03"),
  "temp": [ [ 20.1, 20.2, 20.3, ... ],
            [ 22.4, 22.4, 22.3, ... ],
            ...
          ]
}

Bucket per Hour
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02T13"),
  "temp": { "1": 20.0, "2": 20.1, "3": 20.2, ... }
}
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02T14"),
  "temp": { "1": 22.1, "2": 22.1, "3": 22.0, ... }
}
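One way to build per-hour buckets like those shown above is to group raw readings by device and by the hour they fall in. The sketch below is illustrative only: the `bucket_per_hour` helper is hypothetical, and it assumes the keys inside `temp` are the minute of the hour, which the slide does not state.

```python
from collections import defaultdict
from datetime import datetime


def bucket_per_hour(readings):
    """Group (device_id, timestamp, temp) readings into one document per device per hour."""
    buckets = defaultdict(dict)
    for device_id, ts, temp in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        # Assumption: the key inside "temp" is the minute within the hour.
        buckets[(device_id, hour)][str(ts.minute)] = temp
    return [
        {"device_id": device_id, "date": hour, "temp": temps}
        for (device_id, hour), temps in sorted(buckets.items())
    ]


readings = [
    ("000123456", datetime(2018, 3, 2, 13, 1), 20.0),
    ("000123456", datetime(2018, 3, 2, 13, 2), 20.1),
    ("000123456", datetime(2018, 3, 2, 14, 1), 22.1),
]
docs = bucket_per_hour(readings)
```

Three readings collapse into two documents here; with one reading per minute, sixty readings would collapse into one document, so the collection and its indexes hold far fewer entries.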
External Reference Pattern
Solution with Patterns - אספרסו ארומטי
•  Schema Versioning
•  Subset
•  Computed
•  Bucket
•  External Reference
Conclusion
Takeaways from the Presentation
•  Recognize the differences when modelling for a Document Database versus a Relational Database
•  Summarize the steps of a methodology when modelling for MongoDB: Workload, Relationships, Patterns
•  Recognize the need for Schema Design Patterns and when to apply them
Coming Soon …
•  "Data Modelling" course at university.mongodb.com
Norberto Leite
Lead Engineer
norberto@mongodb.com