MongoDB.local Sydney 2019: Data Modeling for MongoDB

Data Modelling for MongoDB
Daniel Coupal, Curriculum Team, MongoDB
March 14th, 2019
Sydney, Australia

Daniel Coupal
Curriculum Team, MongoDB
daniel.coupal@mongodb.com
Palo Alto, CA, USA
github.com/dcoupal

Goals of the Presentation
1. Recognize the differences when modelling for a Document Database versus a Relational Database
2. Summarize the steps of a methodology when modelling for MongoDB
3. Recognize the need for Schema Design Patterns, and when to apply them

Differences when Modelling for a Document Database versus a Relational Database

Thinking in Documents
1. Polymorphism
• different documents may contain different fields
2. Array
• represents a "one-to-many" relation
• an index on an array field indexes every entry
3. Subdocument
• groups some fields together
4. JSON/BSON
• documents are often shown as JSON
• BSON is the binary format in which they are stored
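A minimal sketch in plain JavaScript (Node, no driver or server assumed) of the first three points. The users collection and its fields are hypothetical, and matchesEquality is an illustrative helper, not a MongoDB API: it mimics how a simple equality filter such as { friends: 3 } matches a document when any entry of the array equals the value, which is the behaviour an index on the array field supports by indexing every entry.

```javascript
// Two documents from a hypothetical "users" collection of a social network.
// Polymorphism: they do not share exactly the same fields.
const users = [
  {
    _id: 1,
    name: "Ana",
    address: { city: "Sydney", country: "AU" }, // subdocument grouping related fields
    friends: [2, 3], // array representing a one-to-many relation
  },
  {
    _id: 2,
    name: "Ben",
    company: "Cuppa Coffee", // field the first document does not have
    friends: [1],
  },
];

// Illustrative helper (not a MongoDB API) mimicking a simple equality filter.
// Dotted paths such as "address.city" reach into subdocuments, and a scalar
// value matches an array field when ANY entry equals it.
function matchesEquality(doc, path, value) {
  const field = path
    .split(".")
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
  return Array.isArray(field) ? field.includes(value) : field === value;
}

const withFriend3 = users.filter((u) => matchesEquality(u, "friends", 3)).map((u) => u._id);
const inSydney = users.filter((u) => matchesEquality(u, "address.city", "Sydney")).map((u) => u._id);
console.log(withFriend3, inSydney);
```

Real MongoDB matching covers more cases (for example arrays of subdocuments and operators such as $elemMatch); the helper only illustrates the simple equality case.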

… 5 tables become 1 or 2 collections

Example: Modelling a Social Network

Differences: Relational/Tabular vs Document
Aspect | Relational | MongoDB
Steps to create the model | 1 – define schema; 2 – develop app and queries | 1 – identify the queries; 2 – define schema
Initial schema | 3rd normal form; one solution | many solutions possible
Final schema | likely denormalized | few changes
Schema evolution | difficult and not optimal; likely downtime | easy and no downtime
Performance | mediocre | optimized

Other Considerations for the Model
1. One-to-many relationships where "many" is a humongous number (a document is limited to 16 MB)
2. Embed or reference
• Joins via $lookup
• Transactions for multi-document writes
3. Transactions are available for replica sets (MongoDB 4.0), and soon for sharded clusters
4. Shard key
5. Indexes
6. Simple queries, or more complex ones with the Aggregation Framework
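The "Joins via $lookup" point can be illustrated without a server. This is plain JavaScript, not the driver: lookup is a hypothetical helper that reproduces the equality form of the $lookup stage (from, localField, foreignField, as) for scalar values. Each input document gets an array of every matching foreign document, and an empty array when nothing matches, like a left outer join. The store and machine data are made up for illustration.

```javascript
// Hypothetical helper reproducing the equality-match form of the $lookup stage:
// { $lookup: { from, localField, foreignField, as } } (scalar values only).
function lookup(input, from, localField, foreignField, as) {
  return input.map((doc) => ({
    ...doc,
    // Every foreign document whose foreignField equals the local value;
    // an empty array, not a dropped document, when nothing matches.
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

// Illustrative data: coffee machines reference their store by store_id.
const stores = [
  { _id: "S1", city: "Sydney" },
  { _id: "S2", city: "Auckland" },
];
const machines = [
  { _id: "M1", store_id: "S1" },
  { _id: "M2", store_id: "S1" },
];

// On a server, the equivalent stage would be:
// db.stores.aggregate([{ $lookup: { from: "machines", localField: "_id",
//                                   foreignField: "store_id", as: "machines" } }])
const joined = lookup(stores, machines, "_id", "store_id", "machines");
console.log(JSON.stringify(joined, null, 2));
```

Referencing, as here, keeps each store document small at the cost of a join at read time; embedding the machines would avoid the join but grow the store document.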

Flexible Modelling Methodology for MongoDB

Methodology
1. Describe the Workload
2. Identify and Model the Relationships
3. Apply Patterns

Case Study: Cuppa Coffee
A. Business: coffee shop franchises
B. Name: Cuppa Coffee
also considered: Coffee Mate, Crocodile Coffee
C. Objective:
• 10 000 stores in Australia, New Zealand and South Asia
• … then we invade America
D. Keys to success:
• Best coffee in the world
• Technology

Make the Best Coffee in the World
23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds
1. Fill a small or regular cup (150ml to 200ml in total volume) to 80% with hot water (not boiling but pretty hot).
2. Grind 23g of coffee into your portafilter using the double basket. We weigh it with a scale.
3. Draw 20g of coffee over the hot water by placing your cup on a scale, pressing tare and extracting your shot.

Technology
1. Measure inventory in real time
• Shelves with scales
2. Big Data collection on cups of coffee
• weighings, temperature, time to produce, …
3. Data Analysis
• Coffee perfection
• Rush hours -> staffing needs
4. MongoDB

1 – Workload: List Queries
Query | Operation | Description
1. Coffee weight on the shelves | write | A shelf sends information when coffee bags are added or removed
2. Coffee to deliver to stores | read | How much coffee do we have to ship to the store in the coming days
3. Anomalies in the inventory | read | Analytics
4. Making a cup of coffee | write | A coffee machine reports on the production of a coffee cup
5. Analysis of cups of coffee | read | Analytics
6. Technical Support | read | Helping our franchisees


1 – Workload: quantify/qualify the queries
Query | Quantification | Qualification
1. Coffee weight on the shelves | 10/day*shelf*store => 1/sec | <1s; critical write
2. Coffee to deliver to stores | 1/day*store => 0.1/sec | <60s
3. Anomalies in the inventory | 24 reads/day | <5mins; "collection scan"
4. Making a cup of coffee | 10 000 000 writes/day => 115 writes/sec | <100ms; non-critical write
… cups of coffee at rush hour | 3 000 000 writes/hr => 833 writes/sec | <100ms; non-critical write
5. Analysis of cups of coffee | 24 reads/day | stale data is fine; "collection scan"
6. Technical Support | 1000 reads/day | <1s
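The per-second figures come from dividing a daily or hourly volume by the seconds in a day (86 400) or an hour (3 600). A small sketch reproducing them; the 10 000 stores come from the case-study objective, while one shelf per store is an assumption made here so that the shelf figure lands near the 1/sec on the slide.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60; // 86 400
const SECONDS_PER_HOUR = 60 * 60; // 3 600

const STORES = 10000; // case-study objective
const SHELVES_PER_STORE = 1; // assumption, chosen to match the "=> 1/sec" figure

// Average operations per second for a given volume over a period.
const perSecond = (count, periodSeconds) => count / periodSeconds;

const rates = {
  shelfWeighings: perSecond(10 * SHELVES_PER_STORE * STORES, SECONDS_PER_DAY),
  deliveries: perSecond(1 * STORES, SECONDS_PER_DAY),
  cupsPerDay: perSecond(10000000, SECONDS_PER_DAY),
  cupsRushHour: perSecond(3000000, SECONDS_PER_HOUR),
};

for (const [query, rate] of Object.entries(rates)) {
  console.log(`${query}: ${rate.toFixed(2)} ops/sec`);
}
```

The rush-hour rate (about 833 writes/sec) is roughly seven times the daily average (about 116 writes/sec), which is why the table lists it as a separate line.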

Disk Space
Cups of coffee (one year of data)
• 10000 x 1000/day x 365
• 3.7 billion/year
• 370 GB (100 bytes/cup of coffee)
Weighings (one year of data)
• 10000 x 10/day x 365
• 36.5 million/year
• 3.7 GB (100 bytes/weighing)
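Writing the estimate as code makes the units explicit. This sketch uses only the figures on the slide (10 000 stores, 1000 cups and 10 weighings per store per day, 100 bytes per document) and computes raw data size only; index and storage overhead are not modelled.

```javascript
const STORES = 10000;
const DAYS_PER_YEAR = 365;
const BYTES_PER_DOC = 100; // per-document estimate used on the slide

// Documents per year and raw data size in GB (10^9 bytes).
function yearlyEstimate(perStorePerDay) {
  const docs = STORES * perStorePerDay * DAYS_PER_YEAR;
  return { docs, gigabytes: (docs * BYTES_PER_DOC) / 1e9 };
}

const cups = yearlyEstimate(1000); // cups of coffee
const weighings = yearlyEstimate(10); // shelf weighings
console.log(cups, weighings);
```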

2 - Relations are still important
Type of relation | one-to-one (1-1) | one-to-many (1-N) | many-to-many (N-N)
Document embedded in the parent document | one read; no joins | one read; no joins | one read; no joins; duplication of information
Document referenced in the parent document | smaller reads; many reads | smaller reads; many reads | smaller reads; many reads
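To make "one read" versus "smaller reads, many reads" concrete, here is a sketch for a one-to-many relation between a store and its coffee machines. In-memory Maps stand in for collections and every get() counts as one read; the collection helper, field names and data are hypothetical.

```javascript
// In-memory stand-in for a collection; each get() counts as one read.
let reads = 0;
const collection = (entries) => {
  const map = new Map(entries);
  return { get: (id) => { reads += 1; return map.get(id); } };
};

// Embedded: the machines live inside the store document.
const storesEmbedded = collection([
  ["S1", { _id: "S1", city: "Sydney", machines: [{ serial: "M1" }, { serial: "M2" }] }],
]);

// Referenced: the store keeps only machine ids; machines are separate documents.
const storesReferenced = collection([
  ["S1", { _id: "S1", city: "Sydney", machine_ids: ["M1", "M2"] }],
]);
const machines = collection([
  ["M1", { _id: "M1", serial: "M1" }],
  ["M2", { _id: "M2", serial: "M2" }],
]);

reads = 0;
const embedded = storesEmbedded.get("S1");
const embeddedReads = reads; // one read, no join

reads = 0;
const store = storesReferenced.get("S1");
const referencedMachines = store.machine_ids.map((id) => machines.get(id));
const referencedReads = reads; // 1 + number of machines: smaller documents, more reads

console.log({ embeddedReads, referencedReads, embeddedMachines: embedded.machines.length });
```

In practice the referenced machines could be fetched with a single $in query, making it two reads instead of 1 + N; the trade-off is still one larger read versus several smaller ones.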

2 - Entities for Cuppa Coffee
- Coffee cups
- Stores
- Coffee machines
- Shelves
- Weighings
- Coffee bags

Schema Design Patterns
Resources
A. Advanced Schema Design Patterns
• MongoDB World 2017
• Webinar
B. MongoDB University
• university.mongodb.com
• M320 – Data Modeling (2019)
C. Blogs on Schema Design Patterns
D. Appendix to this presentation
• Schema Versioning Pattern
• Computed Pattern

Bucket Pattern
Bucket per Day
// one document per device per day; one inner array of readings per hour
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02"),
  "temp": [ [ 20.0, 20.1, 20.2, ... ],
            [ 22.1, 22.1, 22.0, ... ],
            ...
          ]
}
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-03"),
  "temp": [ [ 20.1, 20.2, 20.3, ... ],
            [ 22.4, 22.4, 22.3, ... ],
            ...
          ]
}
Bucket per Hour
// one document per device per hour; readings keyed by position within the hour
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02T13:00:00Z"),
  "temp": { "1": 20.0, "2": 20.1, "3": 20.2, ... }
}
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02T14:00:00Z"),
  "temp": { "1": 22.1, "2": 22.1, "3": 22.0, ... }
}
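The "Bucket per Hour" shape could be produced on the write path by grouping incoming readings. A minimal sketch in plain JavaScript, assuming readings arrive in time order as { device_id, ts, temp } objects (a hypothetical input format); a real deployment would more likely maintain each bucket with an upsert than build it in memory.

```javascript
// Hypothetical sensor readings, in time order.
const readings = [
  { device_id: "000123456", ts: new Date("2018-03-02T13:00:00Z"), temp: 20.0 },
  { device_id: "000123456", ts: new Date("2018-03-02T13:01:00Z"), temp: 20.1 },
  { device_id: "000123456", ts: new Date("2018-03-02T14:00:00Z"), temp: 22.1 },
];

// Truncate a timestamp to the start of its UTC hour.
function hourStart(date) {
  const d = new Date(date.getTime());
  d.setUTCMinutes(0, 0, 0);
  return d;
}

// Group readings into one bucket document per device per hour.
// Inside a bucket, values are keyed 1, 2, 3, ... in arrival order,
// like the "Bucket per Hour" documents above.
function bucketPerHour(input, type) {
  const byKey = new Map();
  for (const r of input) {
    const date = hourStart(r.ts);
    const key = `${r.device_id}|${date.toISOString()}`;
    if (!byKey.has(key)) {
      byKey.set(key, { device_id: r.device_id, type, date, temp: {} });
    }
    const bucket = byKey.get(key);
    const position = Object.keys(bucket.temp).length + 1;
    bucket.temp[position] = r.temp;
  }
  return [...byKey.values()];
}

const buckets = bucketPerHour(readings, "2A");
console.log(buckets.length); // 2
```

If a device reported once per minute, each hourly bucket would replace 60 single-reading documents, which is the point of the pattern: fewer, larger documents and fewer index entries.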

Cuppa Coffee - Solution with Patterns
• Schema Versioning
• Subset
• Computed
• Bucket
• External Reference

Takeaways from the Presentation
1. Recognize the differences when modelling for a Document Database versus a Relational Database
2. Summarize the steps of a methodology when modelling for MongoDB
• Workload
• Relationships
• Patterns
3. Recognize the need for Schema Design Patterns, and when to apply them

Coming Soon …
• "Data Modelling" course at:
university.mongodb.com

Appendix A
Schema Versioning Pattern

This is what your dreams should be when thinking about a schema upgrade!

Schema Revision | Relational | MongoDB
Versioned unit | Schema | Document
Migration procedure | Difficult | Easy
Service uptime | Interrupted | No interruption
Rollback | Difficult to nightmare-ish | Easy

Application Lifecycle
1. Modify the application
• Can read/process all versions of documents
• Has a different handler per version
• Reshapes the document before processing it
2. Update all application servers
• Install the updated application
• Remove old processes
3. Once the migration is completed
• Remove the code that processes old versions
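A sketch of "a different handler per version" that reshapes a document before it is processed. Each hypothetical upgrade function moves a document from version N to N + 1, and toLatest chains them. The field changes are illustrative only, and treating a document without schema_version as version 1 is an assumption.

```javascript
// Hypothetical upgrade steps: upgrades[N] turns a version-N document into version N + 1.
const upgrades = {
  // v1 stored a single "phone" string; v2 stores a list of contacts.
  1: ({ phone, ...rest }) => ({
    ...rest,
    schema_version: 2,
    contacts: phone ? [{ type: "phone", value: phone }] : [],
  }),
  // v3 renames "name" to "store_name".
  2: ({ name, ...rest }) => ({ ...rest, schema_version: 3, store_name: name }),
};
const LATEST_VERSION = 3;

// Reshape any supported version into the latest one before processing.
// Assumption: documents written before versioning have no schema_version (treated as v1).
function toLatest(doc) {
  let current = { ...doc, schema_version: doc.schema_version ?? 1 };
  while (current.schema_version < LATEST_VERSION) {
    const upgrade = upgrades[current.schema_version];
    if (!upgrade) throw new Error(`No handler for schema_version ${current.schema_version}`);
    current = upgrade(current);
  }
  return current;
}

const legacy = { _id: 7, name: "Cuppa Sydney", phone: "+61 2 0000 0000" };
console.log(toLatest(legacy));
```

Once all documents are at the latest version, the upgrade functions can be deleted, which is the last step of the lifecycle above.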

Document Lifecycle
New documents:
• The application writes them in the latest version
Existing documents:
A) Use updates to documents
• to transform them to the latest version
• documents that never need an update can stay in their old version forever
B) or transform all documents in batch
• no worry even if the process takes days
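Option B can run as a resumable job: process documents in small batches ordered by _id and remember the last _id handled, so the job can stop and restart over several days while the application keeps reading both versions. The sketch uses an in-memory array sorted by _id; on a server, each batch would roughly correspond to find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(n). The data and the upgrade function are hypothetical.

```javascript
// In-memory stand-in for a collection sorted by _id (hypothetical data):
// odd _id values are still at schema_version 1.
const docs = Array.from({ length: 10 }, (_, i) => ({
  _id: i + 1,
  schema_version: i % 2 === 0 ? 1 : 2,
}));

// Hypothetical per-document upgrade; a real one would also reshape fields.
const upgrade = (doc) => ({ ...doc, schema_version: 2 });

// Migrate one batch after lastId; return the checkpoint, or null when done.
function migrateBatch(collection, lastId, batchSize) {
  const batch = collection.filter((d) => d._id > lastId).slice(0, batchSize);
  for (const doc of batch) {
    if (doc.schema_version < 2) {
      collection[collection.indexOf(doc)] = upgrade(doc);
    }
  }
  return batch.length > 0 ? batch[batch.length - 1]._id : null;
}

// The checkpoint could be persisted between runs so the job can resume.
let checkpoint = 0;
let batches = 0;
while (checkpoint !== null) {
  checkpoint = migrateBatch(docs, checkpoint, 3);
  if (checkpoint !== null) batches += 1;
}
console.log(batches, docs.every((d) => d.schema_version === 2));
```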

Schema Versioning Pattern
Problem
● Avoid downtime while doing schema upgrades
● Upgrading all documents can take hours, days or even weeks when dealing with big data
● Don't want to update all documents
Solution
● Each document gets a "schema_version" field
● Application can handle all versions
● Choose your strategy to migrate the documents
Use Case Examples
● Every application that uses a database, is deployed in production and is heavily used
● Systems with a lot of legacy data
Benefits and Trade-Offs
✅ No downtime needed
✅ Feel in control of the migration
✅ Less future technical debt
✗ May need 2 indexes for the same field during the migration period
Computed Pattern
Problem
● Costly computation or manipulation of data
● Executed frequently on the same data, producing the same result
Solution
● Perform the operation and store the result in the appropriate document and collection
● If you may need to redo the operations, keep their source data
Use Case Examples
● Internet of Things (IoT)
● Event Sourcing
● Time Series Data
● Frequent Aggregation Framework queries
Benefits and Trade-Offs
✅ Read queries are faster
✅ Saves resources like CPU and disk
✗ May be difficult to identify the need
✗ Avoid applying or overusing it unless needed
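A sketch of the Computed Pattern applied to cups of coffee: instead of aggregating every cup whenever a report is read, a per-store daily summary is updated on each write. The Map stands in for a summary collection, and the store_id, ts and grams fields are hypothetical; on a server the same idea would typically be an update with $inc and upsert.

```javascript
// Hypothetical per-store daily summaries, keyed "storeId|YYYY-MM-DD".
const dailySummaries = new Map();

// Write path: update the stored result as each cup is recorded.
// The cup documents themselves would also be kept, so summaries can be recomputed.
function recordCup(cup) {
  const day = cup.ts.toISOString().slice(0, 10);
  const key = `${cup.store_id}|${day}`;
  const summary = dailySummaries.get(key) ?? { store_id: cup.store_id, day, cups: 0, totalGrams: 0 };
  summary.cups += 1;
  summary.totalGrams += cup.grams;
  dailySummaries.set(key, summary);
}

// Read path: one lookup and one division instead of scanning every cup.
function averageGrams(storeId, day) {
  const s = dailySummaries.get(`${storeId}|${day}`);
  return s ? s.totalGrams / s.cups : null;
}

recordCup({ store_id: "S1", ts: new Date("2019-03-14T08:00:00Z"), grams: 20 });
recordCup({ store_id: "S1", ts: new Date("2019-03-14T08:05:00Z"), grams: 22 });
console.log(averageGrams("S1", "2019-03-14")); // 21
```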
