Open Source North - MongoDB Advanced Schema Design Patterns

JUNE 14, 2018 | TWIN CITIES
#OSN2018
Advanced Schema
Design Patterns

{
  "name": "Matt Kalan",
  "titles": ["Master Solution Architect", "Enterprise Architect"],
  "location": "Minneapolis, MN",
  "yearsAtMDB": 5.5,
  "contactInfo": {
    "email": "matt.kalan@mongodb.com",
    "twitter": ["@MatthewKalan", "@MongoDB"],
    "linkedIn": ["mkalan", "MongoDB"]
  }
}
Who Am I?

• Quick MongoDB overview
• Review of each Schema Design Pattern
• Patterns we couldn’t get to
• Q&A (and throughout)
Agenda

Quick MongoDB Overview

Why MongoDB?
Best way to
work with data
Intelligently put data
where you need it
Freedom
to run anywhere
Intelligent Operational Data Platform

Best way to work with data
Easy: Work with data in a natural,
intuitive way
Flexible: Adapt and make
changes quickly
Fast: Get great performance
with less code
Versatile: Supports a wide
variety of data models and
queries

Easy & Versatile - Rich Query Functionality
Expressive Queries
• Find anyone with phone # “1-212…”
• Check if the person with number “555…” is on the “do not call” list
Geospatial
• Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave
Text Search
• Find all tweets that mention the firm within the last 2 days
Aggregation
• Count and sort number of customers by city
Native Binary JSON support
• Add an additional phone number to Mark Smith’s record without rewriting the document
• Update just 2 phone numbers out of 10
• Sort on the modified date
Joins ($lookup)
• Query for all San Francisco residents, look up their transactions, and sum the amount by person
Graph queries ($graphLookup)
• Query for all people within 3 degrees of separation from Mark
MongoDB document:
{
  customer_id: 1,
  first_name: "Mark",
  last_name: "Smith",
  city: "San Francisco",
  phones: [
    {
      number: "1-212-777-1212",
      dnc: true,
      type: "home"
    },
    {
      number: "1-212-777-1213",
      type: "cell"
    }
  ]
}

Intelligently put data where you need it
Ability to run both operational &
analytics workloads on same cluster,
for timely insight and lower cost
Workload Isolation
Elastic horizontal scalability -
add/remove capacity dynamically
without downtime
Scalability
Declare data locality rules for
governance (e.g. data sovereignty), tiers of
service & local low latency access
Locality
Built-in multi-geography high
availability, replication & automated
failover
High Availability

Freedom to run anywhere
Local
On-premises
Server & Mainframe Private cloud
Fully managed cloud service
Hybrid cloud Public cloud
● Database that runs the same everywhere
● Leverage the benefits of a multi-cloud strategy
● Global coverage
● Avoid lock-in
Convenience: same codebase, same APIs, same tools, wherever you run

MongoDB Atlas: Database as a service
mongodb.com/atlas
Self-service and elastic
• Deploy in minutes
• Scale up/down without
downtime
• Automated upgrades
Global and highly available
• 52 Regions worldwide
• Replica sets optimized for
availability
• Cross-region replication
Secure by default
• Network isolation and Peering
• Encryption in flight and at rest
• Role-based access control
• SOC 2 Type 1 / Privacy Shield
Comprehensive Monitoring
• Performance Advisor
• Dashboards w/ 100+ metrics
• Real Time Performance
• Customizable alerting
Managed Backup
• Point in Time Restore
• Queryable backups
• Consistent snapshots
Cloud Agnostic
• AWS, Azure, and GCP
• Easy migrations
• Consistent experience

MongoDB Compass
MongoDB Connector for BI
MongoDB Enterprise Server
Enterprise Advanced for Self-Managed
Commercial License (No AGPL Copyleft Restrictions)
Platform Certifications
MongoDB Ops Manager
Monitoring & Alerting
Query Optimization
Backup & Recovery
Automation & Configuration
Schema Visualization
Data Exploration
Ad-Hoc Queries
Visualization
Analysis
Reporting
LDAP & Kerberos
Auditing
In-Memory Storage Engine
Encryption at Rest
REST API
Emergency Patches
Customer Success Program
On-Demand Online Training
Warranty
Limitation of Liability
Indemnification
24x7 Support (1 hour SLA)

Schema Design Patterns

• 10 years with the document
model
• Use of a common
methodology and
vocabulary when designing
schemas for MongoDB
• Ability to model schemas
using building blocks
• Less art and more
methodology
Why this Talk?

Ensure:
• Good performance & scalability
• Fast development
despite constraints:
• Hardware
  • RAM faster than disk
  • Disk cheaper than RAM
  • Network latency
  • Reduce costs $$$
• Database server
  • Maximum size for a document
  • Atomicity of a write (ACID GA soon)
• Data set
  • Size of data
Why do we Create Models?

However, Don't Over Design!

World Movie Database (WMDB)
- Logical Data Model
Any events, characters and
entities depicted in this
presentation are fictional.
Any resemblance or similarity to
reality is entirely coincidental

• Frequency of Access
  • Subset ✔️
  • Approximation
  • Extended Reference
• Grouping
  • Computed ✔️
  • Bucket ✔️
  • Outlier
• Representation
  • Entity ✔️
  • Document Versioning ✔️
  • Schema Versioning ✔️
  • Mixed Attributes
  • Tree
  • Polymorphism
Patterns by Category

Problem:
• How to get started modeling data in MongoDB rather than as a relational model
• In a relational database, the logical model is spread across tables
• Today’s languages use OOP and JSON
• Spreading data across tables is harder to use and performs worse
Use cases:
• Almost every operational application written in a modern language
• Also applicable to analytics environments
Issue #1 – How to Model Data in Documents

Solution:
• Simply store data in the objects or JSON used in the
application/service
Benefits:
• Faster development
• Faster performance
• Easier to partition and scale
Pattern #1 - Entity

Logical Model to Documents
Typically map to objects & JSON
3 collections:
A. movies
B. moviegoers
C. screenings

Moviegoer
{
  _id: 1,
  ...
  viewings: [
    {m: 100, d: Date("2016-05-24")},
    {m: 200, d: Date("2017-03-18")}
  ],
  ratings: [
    {m: 100, v: 3, c: "great"}
  ]
}
3 Main Entities
Movie
{
  _id: 100,
  name: "Best Movie Ever",
  castAndCrew: [
    {fn: "Joe", ln: "Smith", …},
    …
  ],
  reviews: [
    {d: Date("2018-05-25"), r: "awful", …},
    …
  ],
  quotes: […]
}
Screening
{
  _id: 200,
  movieId: 100,
  location: "NYC",
  numViewers: 500,
  revenues: 100000
}
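A minimal sketch of the Entity pattern in plain Python, assuming an in-memory dict stands in for the movies collection. The Movie and CastMember classes are hypothetical and only illustrate how an application object can be stored as one nested document rather than as rows spread across tables.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class CastMember:
    fn: str  # first name
    ln: str  # last name


@dataclass
class Movie:
    _id: int
    name: str
    castAndCrew: list = field(default_factory=list)
    quotes: list = field(default_factory=list)


# Hypothetical stand-in for the "movies" collection, keyed by _id.
movies = {}

movie = Movie(_id=100, name="Best Movie Ever",
              castAndCrew=[CastMember(fn="Joe", ln="Smith")])

# asdict() converts nested dataclasses recursively, so the whole entity
# becomes a single document with its cast embedded.
doc = asdict(movie)
movies[doc["_id"]] = doc
```

With a real driver, the resulting dict is the kind of value that would be handed to an insert call.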

Possible solutions:
A. Reduce the size of your working set (no extra cost!)
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set Doesn’t Fit in RAM

In this example, we can:
• Limit the list of actors and
crew to 20
• Limit the embedded reviews
to the top 20
• …
Pattern #2: Subset

Problem:
• There are 1-N or N-N relationships, and only a few fields or
documents that always need to be shown
• Only infrequently do you need to pull all of the related data
Use cases:
• Main actors of a movie
• List of reviews or comments
Generalizing the Subset Pattern

Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
Subset Pattern - Solution
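A rough sketch of the Subset pattern, assuming plain Python containers stand in for the movies and reviews collections. The add_review helper is hypothetical, and "top 20" is interpreted here as the 20 most recent reviews, which is only one possible reading.

```python
TOP_N = 20  # embedded-review limit used in the talk's example


def add_review(movies, reviews, movie_id, review):
    """Store the full review separately and keep only the newest TOP_N embedded."""
    # The complete history goes to the separate "reviews" collection.
    reviews.append({"movieId": movie_id, **review})
    # The movie document keeps a small, duplicated subset for the main page.
    embedded = movies[movie_id].setdefault("reviews", [])
    embedded.insert(0, review)
    del embedded[TOP_N:]  # trim anything beyond the newest TOP_N


movies = {100: {"_id": 100, "name": "Best Movie Ever"}}
reviews = []
for i in range(25):
    add_review(movies, reviews, 100,
               {"d": f"2018-05-{i + 1:02d}", "r": f"review {i}"})
```

Reading the movie document alone is then enough for the main page, while the full list is fetched only on drill-down.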

• How duplication is handled
A. Update both source and target in real time from the application (optionally in a transaction)
B. Use Change Streams to subscribe to changes and asynchronously update the target
C. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?
Implementation Reality of Patterns:
Consistency

Change Streams For Sync and Real-Time Apps
[Diagram labels: Change Streams API; Business Apps, User Data, Sensors, Clickstream; Real-Time Event Notifications, Message Queue, Syncing with other collections/microservices]

• CPU is on fire!
Issue #3: High CPU Usage

{
  title: "The Shape of Water",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}
Issue #3: ...caused by repeated calculations

For example:
• Apply a sum, count, ...
• Roll up data by minute, hour, day
• As long as you don’t mess with your source, you can recreate the rollups
Pattern #3: Computed

Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
• Example: 1K writes per hour vs. 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing
Computed Pattern

Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
Computed Pattern - Solution
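One way the Computed pattern might look, sketched with in-memory Python structures in place of the movies and screenings collections. The record_screening and recompute helpers are hypothetical names. Totals are updated once per write so that reads never have to re-aggregate, and they can be rebuilt from the untouched source data at any time.

```python
def record_screening(movies, screenings, screening):
    """Store the source record and update the precomputed totals on the movie."""
    screenings.append(screening)
    totals = movies[screening["movieId"]]
    totals["viewings"] = totals.get("viewings", 0) + 1
    totals["viewers"] = totals.get("viewers", 0) + screening["numViewers"]
    totals["revenues"] = totals.get("revenues", 0) + screening["revenues"]


def recompute(screenings, movie_id):
    """Rebuild the rollup from the source records, e.g. after a bug fix."""
    mine = [s for s in screenings if s["movieId"] == movie_id]
    return {
        "viewings": len(mine),
        "viewers": sum(s["numViewers"] for s in mine),
        "revenues": sum(s["revenues"] for s in mine),
    }


movies = {100: {"_id": 100, "title": "The Shape of Water"}}
screenings = []
record_screening(movies, screenings,
                 {"movieId": 100, "numViewers": 500, "revenues": 5000})
record_screening(movies, screenings,
                 {"movieId": 100, "numViewers": 1500, "revenues": 15000})
```

The extra work happens on the rarer write path, which fits the read-heavy ratio described above.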

• How to quickly change schemas over time with new
requirements?
• How to know what fields are in the results?
Issue #4: Need to change the fields in the
documents

Problem:
• Updating the schema of a collection or database is:
  • Not atomic
  • A long operation
  • Not necessary, as there is no single schema as in RDBMSs
• You may not want to update all documents, only apply the change going forward
Use cases:
• Practically any database that will go to production
Schema Versioning Pattern

Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
Schema Versioning Pattern – Solution

Add a field to track the
schema version number, per
document
Does not have to exist for
version 1
Always have the option to
loop through and update all
docs but not forced to
Pattern #4: Schema Versioning
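A small sketch of how an application might handle the schema version field when reading documents. The field name schema_version and the v1-to-v2 change (a single email string becoming a contacts list) are assumptions made up for illustration; the talk does not prescribe either.

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Return a copy of doc in the current schema; a missing version means v1."""
    doc = dict(doc)  # leave the caller's document untouched
    if doc.get("schema_version", 1) < 2:
        # Hypothetical v1 -> v2 change: "email" string becomes a "contacts" list.
        email = doc.pop("email", None)
        doc["contacts"] = [{"type": "email", "value": email}] if email else []
        doc["schema_version"] = CURRENT_VERSION
    return doc


old_doc = {"_id": 1, "email": "a@example.com"}
new_doc = upgrade(old_doc)
```

The upgraded document can be written back on its next modification, or a background loop can apply the same function to every document when convenient.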

• Updating data in place can be seen as deleting previous version
• Regulated industries often require an audit trail for X years
• Insight can be gleaned from measuring changing data (e.g. claims
processing, code check-ins, etc.)
• Many possible approaches here
Issue #5: Need to track and query current
and previous versions of documents

Problem:
• Should we track field-level changes or entire documents?
• Consider how to handle consistency requirements during changes
Use cases:
• Most apps storing business transactions
• Any data useful to see over time
Pattern #5: Document Versioning

Solution:
• Ultimately dependent on the situation
• But 2 main approaches are most common
• Tracking a few updates in one document
• Separate collections for latest and for historical changes
Benefits:
• First option saves on disk space
• Second option gives good performance no matter how many
changes
Document Versioning Pattern – Solution

Have an array of previous values that were changed
Compare-and-swap (on version) for thread-safe update to the document
If Few Changes
Movie
{
  _id: 100,
  current: {
    v: 3, name: "Best Movie Ever", budget: 450, actual: 450
  },
  prev: [
    {v: 1, name: "OK Movie", budget: 450},
    {v: 2, name: "Good Movie", actual: 400}
  ]
}
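The compare-and-swap idea can be sketched as follows, assuming an in-memory dict stands in for the movies collection and a hypothetical update_movie helper. In a real deployment the version check would be part of the update's filter so that the database applies it atomically; this single-threaded sketch only shows the logic.

```python
def update_movie(movies, movie_id, expected_version, changes):
    """Apply changes only if the document is still at expected_version."""
    doc = movies[movie_id]
    current = doc["current"]
    if current["v"] != expected_version:
        return False  # another writer got there first: re-read and retry
    # Remember the old values of just the fields that are about to change.
    previous = {"v": current["v"]}
    previous.update({k: current[k] for k in changes if k in current})
    doc["prev"].append(previous)
    current.update(changes)
    current["v"] = expected_version + 1
    return True


movies = {100: {"_id": 100,
                "current": {"v": 1, "name": "OK Movie", "budget": 450},
                "prev": []}}
first = update_movie(movies, 100, 1, {"name": "Good Movie"})
stale = update_movie(movies, 100, 1, {"name": "Stale Title"})  # rejected
```

Storing only the changed fields in prev is what keeps this variant compact when changes are few.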

Unbounded Numbers of Changes
Current Collection
{
  _id: 100,
  v: 3,
  budget: 450,
  actual: 450
}
History Collection
{
  movieId: 100,
  v: 1,
  name: "OK Movie",
  budget: 450,
  t: Date("2018-06-01…")
}
History Collection
{
  movieId: 100,
  v: 2,
  name: "Good Movie",
  budget: 450,
  actual: 400,
  t: Date("2018-06-01…")
}
History Collection
{
  movieId: 100,
  v: 3,
  budget: 450,
  actual: 450,
  t: Date("2018-06-01…")
}
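A sketch of the two-collection approach, assuming a dict for the current collection and a list for the history collection. The save_version helper is hypothetical. Each write appends a full snapshot to history and overwrites the current document, so reading the latest version stays cheap regardless of how many changes accumulate.

```python
from datetime import datetime, timezone


def save_version(current, history, movie_id, fields, now=None):
    """Append a full snapshot to history and replace the current document."""
    now = now or datetime.now(timezone.utc)
    prev = current.get(movie_id)
    version = prev["v"] + 1 if prev else 1
    # Carry forward unchanged fields, then apply the new values.
    base = {k: v for k, v in prev.items() if k not in ("_id", "v")} if prev else {}
    merged = {**base, **fields}
    history.append({"movieId": movie_id, "v": version, **merged, "t": now})
    current[movie_id] = {"_id": movie_id, "v": version, **merged}
    return version


current, history = {}, []
v1 = save_version(current, history, 100, {"name": "OK Movie", "budget": 450})
v2 = save_version(current, history, 100, {"name": "Good Movie", "actual": 400})
```

In a real system the two writes would need to be made consistent, for example with a transaction or by deriving one collection from the other asynchronously.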

• It is known that a series of items are often read/written together
• E.g. last month’s transactions, 100 device samples, prices for an
hour
• Often would store each item in a separate record in RDBMSs
• With arrays in documents, have the option of storing many items
together
Issue #6: Poor Performance
Reading/Writing a Series of Many Items

Problem:
• Do we know a series of items will be accessed together and not randomly?
• Should we store a document per item, like with RDBMSs?
• How to balance write vs. read performance?
Use cases:
• Transactions: orders, claims, payments, etc.
• Time series: IoT, market data, tweets, reviews, comments, etc.
• Often used for analytics and reporting
Pattern #6: Bucket Pattern

Solution:
• Store as an array of items in a document (a certain number or
time window)
• Often each item is written by itself, and then rolled into the bucket
asynchronously for high performance reading
• Retention period can be different for the items vs. the bucket
Benefits:
• Reads are many times faster (easily 10x or more)
• Also often saves disk space, as field names are stored fewer times
Bucket Pattern – Solution

• Likely need to write each item in case of app failure (short retention)
• Async write the buckets
• Might keep buckets longer than raw items
Storing Buckets and Optionally Each Item
Screening
{
  _id: 200,
  location: "135 W. 34th St., NYC",
  date: Date("2018-06-01 5:00PM"),
  numViewers: 500,
  revenues: 5000
}
ScreeningBucket
{
  _id: 2000,
  movieId: 100,
  metro: "New York",
  day: Date("2018-06-01"),
  numViewers: 50000,
  ...,
  screenings: [
    {id: 200, t: "5:00", v: 500},
    {id: 201, t: "7:30", v: 1500}
  ]
}
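The asynchronous roll-up step might look roughly like this. It assumes each raw screening also carries movieId, metro, day, and t fields (not all of which appear in the raw Screening document on the slide), and build_buckets is a hypothetical helper run as a periodic batch.

```python
def build_buckets(screenings):
    """Group raw screenings into one bucket per (movie, metro, day)."""
    buckets = {}
    for s in screenings:
        key = (s["movieId"], s["metro"], s["day"])
        bucket = buckets.setdefault(key, {
            "movieId": s["movieId"],
            "metro": s["metro"],
            "day": s["day"],
            "numViewers": 0,
            "screenings": [],
        })
        # Short field names keep each embedded item small.
        bucket["screenings"].append(
            {"id": s["_id"], "t": s["t"], "v": s["numViewers"]})
        bucket["numViewers"] += s["numViewers"]
    return list(buckets.values())


raw = [
    {"_id": 200, "movieId": 100, "metro": "New York", "day": "2018-06-01",
     "t": "5:00", "numViewers": 500},
    {"_id": 201, "movieId": 100, "metro": "New York", "day": "2018-06-01",
     "t": "7:30", "numViewers": 1500},
]
buckets = build_buckets(raw)
```

A report for one movie, metro, and day then reads a single bucket document instead of one document per screening.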

Lambda Architecture Helps Balance Reads/Writes
[Diagram labels: App Writes Data; Message Queue and/or Each Item (MongoDB); Async Processing (change stream or periodic batch); Buckets of Items in MongoDB; Queries]

Extremely Common with Time Series & IoT
SensorSample
{
  _id: 200,
  loc: {
    type: "Point",
    coordinates: [-93, 45]
  },
  date: Date("2018-06-01 5:00PM"),
  temp: 54
}
SampleBucket
{
  _id: 2000,
  loc: {
    type: "Point",
    coordinates: [-93, 45]
  },
  startTime: Date("2018-06-01 5:00PM"),
  endTime: Date("2018-06-01 6:00PM"),
  minTemp: 50, maxTemp: 60, ...,
  samples: [
    {t: Date("2018-06-01 5:00PM"), v: 51.5},
    {t: Date("2018-06-01 5:01PM"), v: 52},
    ...
  ]
}
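For time series, an alternative to batch roll-up is appending each sample straight into its hourly bucket. The sketch below assumes a dict keyed by (coordinates, hour) stands in for the bucket collection; add_sample is a hypothetical helper. It also maintains minTemp and maxTemp as it goes, combining the Bucket and Computed patterns.

```python
from datetime import datetime, timedelta


def add_sample(buckets, loc, t, value):
    """Append a sample to the hourly bucket for its location, creating it if needed."""
    start = t.replace(minute=0, second=0, microsecond=0)
    key = (tuple(loc["coordinates"]), start)
    bucket = buckets.get(key)
    if bucket is None:
        bucket = buckets[key] = {
            "loc": loc,
            "startTime": start,
            "endTime": start + timedelta(hours=1),
            "minTemp": value,
            "maxTemp": value,
            "samples": [],
        }
    bucket["samples"].append({"t": t, "v": value})
    # Keep the precomputed range current so reads need not scan the samples.
    bucket["minTemp"] = min(bucket["minTemp"], value)
    bucket["maxTemp"] = max(bucket["maxTemp"], value)


buckets = {}
loc = {"type": "Point", "coordinates": [-93, 45]}
add_sample(buckets, loc, datetime(2018, 6, 1, 17, 0), 51.5)
add_sample(buckets, loc, datetime(2018, 6, 1, 17, 1), 52)
add_sample(buckets, loc, datetime(2018, 6, 1, 18, 5), 49)
```

Against a real database the same effect is usually achieved with an upsert that pushes onto the samples array, which avoids the separate read shown here.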

What our Patterns did for us
Problem                                         Pattern
How to model data in documents                  Entity
Using too much RAM                              Subset
Using too much CPU                              Computed
No downtime to upgrade schema                   Schema Versioning
How to track previous versions                  Document Versioning
How to improve performance of series of data    Bucket

• Mixed Attributes* – using key/value pairs in arrays to allow searching on dozens of variable fields
• Approximation* – reducing the frequency of calculations by using approximate values
• Extended Reference – detailed data stored in a separate collection for lookup on drill-down
• Trees – store one or multiple levels as one document and/or use $graphLookup to traverse recursively
• Polymorphism – each document represents an item, but each item can have different fields (e.g. product catalog)
• Outlier* – avoid having a few documents drive the design and impact performance for all
* = covered in other presentations on mongodb.com
Other Patterns

A. Simple grouping from tables to collections is often not optimal
B. Learn a common vocabulary for designing schemas with MongoDB
C. Use patterns as "plug-and-play" to improve performance
Takeaways

• The earlier webinar that this talk extends covers 3 different patterns
https://www.mongodb.com/presentations/advanced-schema-design-patterns
• MongoDB in-person training courses on Schema Design
• MongoDB University
https://university.mongodb.com
• M001: MongoDB Basics
• (Upcoming) M220: Data Modeling
How Can I Learn More About Schema
Design?

For More Information About MongoDB
Resource                Location
Public Atlas DBaaS      mongodb.com/cloud/atlas
Case Studies            mongodb.com/customers
Presentations           mongodb.com/presentations
Free Online Training    university.mongodb.com
Webinars and Events     mongodb.com/events
Documentation           docs.mongodb.com
MongoDB Downloads       mongodb.com/download

#MDBlocal
Thank you for using MongoDB!
