Advanced Schema Design Patterns
June 14, 2018 | Twin Cities
#OSN2018

Who Am I?
{
  "name": "Matt Kalan",
  "titles": ["Master Solution Architect", "Enterprise Architect"],
  "location": "Minneapolis, MN",
  "yearsAtMDB": 5.5,
  "contactInfo": {
    "email": "matt.kalan@mongodb.com",
    "twitter": ["@MatthewKalan", "@MongoDB"],
    "linkedIn": ["mkalan", "MongoDB"]
  }
}

Agenda
• Quick MongoDB overview
• Review of each Schema Design Pattern
• Patterns we couldn’t get to
• Q&A (and throughout)

Quick MongoDB Overview

Why MongoDB?
Intelligent Operational Data Platform
• Best way to work with data
• Intelligently put data where you need it
• Freedom to run anywhere

Best way to work with data
• Easy: Work with data in a natural, intuitive way
• Flexible: Adapt and make changes quickly
• Fast: Get great performance with less code
• Versatile: Supports a wide variety of data models and queries

Easy & Versatile: Rich Query Functionality
Sample MongoDB document:
{
  customer_id: 1,
  first_name: "Mark",
  last_name: "Smith",
  city: "San Francisco",
  phones: [
    { number: "1-212-777-1212", dnc: true, type: "home" },
    { number: "1-212-777-1213", type: "cell" }
  ]
}
Expressive Queries
• Find anyone with phone # "1-212…"
• Check if the person with number "555…" is on the "do not call" list
Geospatial
• Find the best offer for the customer at the geo coordinates of 42nd St. and 6th Ave
Text Search
• Find all tweets that mention the firm within the last 2 days
Aggregation
• Count and sort the number of customers by city
Native Binary JSON support
• Add an additional phone number to Mark Smith’s record without rewriting the document
• Update just 2 phone numbers out of 10
• Sort on the modified date
Joins ($lookup)
• Query for all San Francisco residents, look up their transactions, and sum the amount by person
Graph queries ($graphLookup)
• Query for all people within 3 degrees of separation from Mark

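The first two "expressive query" examples can be approximated in plain Python over the sample customer document. This is only an in-memory sketch of the matching logic, not MongoDB query syntax; the helper names `find_by_phone_prefix` and `is_do_not_call` are hypothetical.

```python
# In-memory illustration only; helper names are hypothetical, not a MongoDB API.
customers = [
    {
        "customer_id": 1,
        "first_name": "Mark",
        "last_name": "Smith",
        "city": "San Francisco",
        "phones": [
            {"number": "1-212-777-1212", "dnc": True, "type": "home"},
            {"number": "1-212-777-1213", "type": "cell"},
        ],
    }
]


def find_by_phone_prefix(docs, prefix):
    """Return customers that have at least one phone number starting with prefix."""
    return [
        d for d in docs
        if any(p["number"].startswith(prefix) for p in d.get("phones", []))
    ]


def is_do_not_call(docs, number):
    """True if some customer lists exactly this number with dnc set to True."""
    return any(
        p["number"] == number and p.get("dnc", False)
        for d in docs
        for p in d.get("phones", [])
    )


matches = find_by_phone_prefix(customers, "1-212")
```

In MongoDB itself, the analogous filters would match on the `phones.number` field, with `$elemMatch` used when the number and the `dnc` flag must be on the same array element.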
Intelligently put data where you need it
• Workload Isolation: Ability to run both operational & analytics workloads on the same cluster, for timely insight and lower cost
• Scalability: Elastic horizontal scalability; add/remove capacity dynamically without downtime
• Locality: Declare data locality rules for governance (e.g. data sovereignty), tiers of service & local low-latency access
• High Availability: Built-in multi-geography high availability, replication & automated failover

Freedom to run anywhere
Local, On-premises, Server & Mainframe, Private cloud, Hybrid cloud, Public cloud, Fully managed cloud service
● Database that runs the same everywhere
● Leverage the benefits of a multi-cloud strategy
● Global coverage
● Avoid lock-in
Convenience: same codebase, same APIs, same tools, wherever you run

MongoDB Atlas: Database as a service
mongodb.com/atlas
Self-service and elastic
• Deploy in minutes
• Scale up/down without downtime
• Automated upgrades
Global and highly available
• 52 Regions worldwide
• Replica sets optimized for availability
• Cross-region replication
Secure by default
• Network isolation and Peering
• Encryption in flight and at rest
• Role-based access control
• SOC 2 Type 1 / Privacy Shield
Comprehensive Monitoring
• Performance Advisor
• Dashboards w/ 100+ metrics
• Real Time Performance
• Customizable alerting
Managed Backup
• Point in Time Restore
• Queryable backups
• Consistent snapshots
Cloud Agnostic
• AWS, Azure, and GCP
• Easy migrations
• Consistent experience

Enterprise Advanced for Self-Managed
MongoDB Ops Manager
• Monitoring & Alerting
• Query Optimization
• Backup & Recovery
• Automation & Configuration
MongoDB Compass
• Schema Visualization
• Data Exploration
• Ad-Hoc Queries
MongoDB Connector for BI
• Visualization
• Analysis
• Reporting
MongoDB Enterprise Server
• LDAP & Kerberos
• Auditing
• In-Memory Storage Engine
• Encryption at Rest
Also included
• Commercial License (No AGPL Copyleft Restrictions)
• Platform Certifications
• REST API
• Emergency Patches
• Customer Success Program
• On-Demand Online Training
• Warranty
• Limitation of Liability
• Indemnification
• 24x7 Support (1 hour SLA)

Schema Design Patterns

Why this Talk?
• 10 years with the document model
• Use of a common methodology and vocabulary when designing schemas for MongoDB
• Ability to model schemas using building blocks
• Less art and more methodology

Why do we Create Models?
Ensure:
• Good performance & scalability
• Fast development
despite constraints:
• Hardware
  • RAM faster than disk
  • Disk cheaper than RAM
  • Network latency
  • Reduce costs $$$
• Database Server
  • Maximum size for a document
  • Atomicity of a write (multi-document ACID transactions GA soon)
• Data set
  • Size of data

However, Don't Over Design!

World Movie Database (WMDB): Logical Data Model
Any events, characters and entities depicted in this presentation are fictional. Any resemblance or similarity to reality is entirely coincidental.

Patterns by Category
• Frequency of Access
  • Subset ✔️
  • Approximation
  • Extended Reference
• Grouping
  • Computed ✔️
  • Bucket ✔️
  • Outlier
• Representation
  • Entity ✔️
  • Document Versioning ✔️
  • Schema Versioning ✔️
  • Mixed Attributes
  • Tree
  • Polymorphism

Issue #1: How to Model Data in Documents
Problem:
• How to get started modeling data in MongoDB without falling back on a relational model
• In a relational model, the logical model is spread across tables
• Today’s languages use OOP and JSON
• Spreading data across tables is harder to use and performs worse
Use cases:
• Nearly every operational application written in a modern language
• Also applicable to analytics environments

Pattern #1: Entity
Solution:
• Simply store data in the objects or JSON used in the application/service
Benefits:
• Faster development
• Faster performance
• Easier to partition and scale

Logical Model to Documents
Typically map to objects & JSON
3 collections:
A. movies
B. moviegoers
C. screenings

3 Main Entities
Moviegoer
{
  _id: 1,
  ...
  viewings: [
    {m: 100, d: Date("2016-05-24")},
    {m: 200, d: Date("2017-03-18")}
  ],
  ratings: [
    {m: 100, v: 3, c: "great"}
  ]
}
Movie
{
  _id: 100,
  name: "Best Movie Ever",
  castAndCrew: [
    {fn: "Joe", ln: "Smith", …},
    …
  ],
  reviews: [
    {d: Date("2018-05-25"), r: "awful", …},
    …
  ],
  quotes: […]
}
Screening
{
  _id: 200,
  movieId: 100,
  location: "NYC",
  numViewers: 500,
  revenues: 100000
}

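As a rough illustration of the Entity pattern, the sketch below shows an application object turning directly into a document-shaped dictionary. The `Movie` and `CastMember` classes are hypothetical stand-ins for whatever objects the application already uses; only the field names come from the Movie entity above.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class CastMember:
    fn: str  # first name, as abbreviated on the slide
    ln: str  # last name


@dataclass
class Movie:
    _id: int
    name: str
    castAndCrew: List[CastMember] = field(default_factory=list)
    quotes: List[str] = field(default_factory=list)


movie = Movie(_id=100, name="Best Movie Ever",
              castAndCrew=[CastMember(fn="Joe", ln="Smith")])

# asdict() recurses into nested dataclasses, giving one self-contained document
# instead of rows spread over several tables.
movie_doc = asdict(movie)
```

The resulting dictionary has the same nesting as the object, which is the point of the pattern: no mapping layer is needed between the application model and what is stored.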
Issue #2: Working Set Doesn’t Fit in RAM
Possible solutions:
A. Reduce the size of your working set (no extra cost!)
B. Add more RAM per machine
C. Start sharding or add more shards

Pattern #2: Subset
In this example, we can:
• Limit the list of actors and crew to 20
• Limit the embedded reviews to the top 20
• …

Generalizing the Subset Pattern
Problem:
• There are 1-N or N-N relationships, but only a few fields or documents always need to be shown
• Only infrequently do you need to pull all of the related data
Use cases:
• Main actors of a movie
• List of reviews or comments

Subset Pattern: Solution
Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"

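A minimal sketch of the Subset pattern, assuming the cast and review lists are already ordered by importance: the main document keeps a copy of the first 20 entries, while a separate details document keeps the full lists. The function name `split_movie` and the `movieId` link field are assumptions made for this illustration.

```python
def split_movie(full_movie, limit=20):
    """Return (main_doc, details_doc) for one movie.

    main_doc duplicates only the first `limit` cast members and reviews;
    details_doc keeps every item for the rare "show all" request.
    """
    main_doc = dict(full_movie)
    details_doc = {"movieId": full_movie["_id"]}
    for key in ("castAndCrew", "reviews"):
        items = full_movie.get(key, [])
        main_doc[key] = items[:limit]   # small duplicated subset
        details_doc[key] = list(items)  # complete list, read infrequently
    return main_doc, details_doc


full = {
    "_id": 100,
    "name": "Best Movie Ever",
    "castAndCrew": [{"fn": f"Actor{i}"} for i in range(50)],
    "reviews": [{"r": f"review {i}"} for i in range(30)],
}
main, details = split_movie(full)
```

The main document stays small enough to keep in the working set, and the "main page" still needs only one read.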
Implementation Reality of Patterns: Consistency
How duplication is handled:
A. Update both source and target in real time from the application (optionally in a transaction)
B. Use Change Streams to subscribe to changes and update the target asynchronously
C. Update the target from the source at regular intervals. Examples:
  • Most popular items => update nightly
  • Revenues from a movie => update every hour
  • Last 10 reviews => update hourly? daily?

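Option C can be sketched as a batch job that rebuilds the duplicated data from the source of truth. The example below refreshes the "last 10 reviews" of a movie from an in-memory list standing in for a reviews collection; the function name, the `movieId` field on reviews, and the use of ISO date strings are assumptions for illustration.

```python
def refresh_recent_reviews(movie_doc, all_reviews, keep=10):
    """Rebuild the embedded review subset from the full review list."""
    mine = [r for r in all_reviews if r["movieId"] == movie_doc["_id"]]
    # ISO-formatted date strings sort chronologically as plain strings.
    mine.sort(key=lambda r: r["d"], reverse=True)
    movie_doc["reviews"] = [{"d": r["d"], "r": r["r"]} for r in mine[:keep]]
    return movie_doc


reviews_collection = [
    {"movieId": 100, "d": f"2018-05-{day:02d}", "r": f"review {day}"}
    for day in range(1, 13)
]
movie = {"_id": 100, "name": "Best Movie Ever", "reviews": []}
refresh_recent_reviews(movie, reviews_collection)
```

Because the subset is fully rebuilt each time, the job can run hourly or nightly and be rerun safely if a run fails; the trade-off is that the duplicate can be stale between runs.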
Change Streams For Sync and Real-Time Apps
[Diagram: data from business apps, user data, sensors, and clickstream passes through the Change Streams API to real-time event notifications, a message queue, and syncing with other collections/microservices]

Issue #3: High CPU Usage
• CPU is on fire!

Issue #3: ...caused by repeated calculations
{
  title: "The Shape of Water",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}

Pattern #3: Computed
For example:
• Apply a sum, count, ...
• Roll up data by minute, hour, day
• As long as you don’t modify your source data, you can recreate the rollups

Computed Pattern
Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
  • Example: 1K writes per hour vs. 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing

Computed Pattern: Solution
Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over

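A small sketch of the Computed pattern, under the assumption that totals are maintained at write time: each new screening increments the stored sums on the movie document, so reads never recalculate them. The function names are hypothetical; the field names follow the movie and screening documents shown earlier.

```python
def record_screening(movie_doc, screening):
    """Update the stored totals once per write instead of summing on every read."""
    movie_doc["viewings"] = movie_doc.get("viewings", 0) + 1
    movie_doc["viewers"] = movie_doc.get("viewers", 0) + screening["numViewers"]
    movie_doc["revenues"] = movie_doc.get("revenues", 0) + screening["revenues"]


def recompute(screenings):
    """Rebuild the rollup from the untouched source data."""
    return {
        "viewings": len(screenings),
        "viewers": sum(s["numViewers"] for s in screenings),
        "revenues": sum(s["revenues"] for s in screenings),
    }


movie = {"title": "The Shape of Water"}
screenings = [
    {"numViewers": 500, "revenues": 5000},
    {"numViewers": 1500, "revenues": 15000},
]
for s in screenings:
    record_screening(movie, s)
```

With roughly 1K writes against 1M reads per hour, the addition runs a thousand times instead of a million, and `recompute` shows why keeping the source data intact lets the rollup be recreated at any time.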
Issue #4: Need to change the fields in the documents
• How to quickly change schemas over time with new requirements?
• How to know what fields are in the results?

Schema Versioning Pattern
Problem:
• Updating the schema of a collection or database is:
  • Not atomic
  • A long operation
  • Not necessary, as there is no single schema as in RDBMSs
• May not want to update all documents, only do it going forward
Use cases:
• Practically any database that will go to production

Schema Versioning Pattern: Solution
Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification

Pattern #4: Schema Versioning
• Add a field to track the schema version number, per document
• Does not have to exist for version 1
• Always have the option to loop through and update all docs but not forced to

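One way to apply this is a lazy upgrade on read, sketched below. The field name `schema_version` and the example change (a single `director` string becoming a `directors` list in version 2) are invented for illustration; a missing version field is treated as version 1, as the slide suggests.

```python
CURRENT_SCHEMA_VERSION = 2


def upgrade(doc):
    """Return a copy of doc in the current schema, leaving the input untouched."""
    doc = dict(doc)
    if doc.get("schema_version", 1) == 1:
        # Hypothetical v1 -> v2 change: one director string becomes a list.
        doc["directors"] = [doc.pop("director")] if "director" in doc else []
        doc["schema_version"] = CURRENT_SCHEMA_VERSION
    return doc


old_doc = {"_id": 100, "name": "Best Movie Ever", "director": "Jane Doe"}
new_doc = upgrade(old_doc)
```

The application can write the upgraded document back on its next modification, so old and new versions coexist and no bulk migration is forced.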
Issue #5: Need to track and query current and previous versions of documents
• Updating data in place can be seen as deleting the previous version
• Regulated industries often require an audit trail for X years
• Insight can be gleaned from measuring changing data (e.g. claims processing, code check-ins, etc.)
• Many possible approaches here

Pattern #5: Document Versioning
Problem:
• Should we track field-level changes or entire documents?
• Consider how to handle consistency requirements during changes
Use cases:
• Most apps storing business transactions
• Any data useful to see over time

Document Versioning Pattern: Solution
Solution:
• Ultimately dependent on the situation
• But 2 main approaches are most common
  • Tracking a few updates in one document
  • Separate collections for latest and for historical changes
Benefits:
• First option saves on disk space
• Second option gives good performance no matter how many changes

If Few Changes
• Have an array of previous values that were changed
• Compare-and-swap (on version) for thread-safe update to the document
Movie
{
  _id: 100,
  current: {
    v: 3, name: "Best Movie Ever", budget: 450, actual: 450
  },
  prev: [
    {v: 1, name: "OK Movie", budget: 450},
    {v: 2, name: "Good Movie", actual: 400}
  ]
}

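The compare-and-swap idea can be sketched in memory as follows. The function name `update_movie` is hypothetical, and the sketch assumes callers never pass `v` inside `changes`. It reproduces the step from version 2 to version 3 of the Movie document above.

```python
def update_movie(doc, expected_v, changes):
    """Apply changes only if the document is still at expected_v.

    Returns True on success, False if another writer got there first.
    """
    current = doc["current"]
    if current["v"] != expected_v:
        return False  # stale read: caller should re-read and retry
    # Remember only the old values of the fields that are about to change.
    previous = {"v": current["v"]}
    for key in changes:
        if key in current:
            previous[key] = current[key]
    doc["prev"].append(previous)
    current.update(changes)
    current["v"] = expected_v + 1
    return True


doc = {
    "_id": 100,
    "current": {"v": 2, "name": "Good Movie", "budget": 450, "actual": 400},
    "prev": [{"v": 1, "name": "OK Movie", "budget": 450}],
}
ok = update_movie(doc, 2, {"name": "Best Movie Ever", "actual": 450})
stale = update_movie(doc, 2, {"name": "Too Late"})
```

Against a real database, the version check would normally go into the update's filter so that the check and the modification happen as one atomic single-document write.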
Unbounded Numbers of Changes
Current Collection
{
  _id: 100,
  v: 3,
  name: "Best Movie Ever",
  budget: 450,
  actual: 450
}
History Collection
{
  movieId: 100,
  v: 1,
  name: "OK Movie",
  budget: 450,
  t: Date("2018-06-01…")
}
{
  movieId: 100,
  v: 2,
  name: "Good Movie",
  budget: 450,
  actual: 400,
  t: Date("2018-06-01…")
}
{
  movieId: 100,
  v: 3,
  name: "Best Movie Ever",
  budget: 450,
  actual: 450,
  t: Date("2018-06-01…")
}

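A minimal in-memory sketch of the two-collection approach: every save overwrites the document in a "current" store and appends a full copy to a "history" list. The names `current`, `history`, and `save_version` are assumptions for this illustration, and the timestamps are made up.

```python
from datetime import datetime

current = {}   # stands in for the current collection, keyed by _id
history = []   # stands in for the history collection, one entry per version


def save_version(movie_id, fields, timestamp):
    """Write the latest state and append a full snapshot to the history."""
    prev = current.get(movie_id)
    version = prev["v"] + 1 if prev else 1
    doc = {**(prev or {}), **fields, "_id": movie_id, "v": version}
    current[movie_id] = doc
    snapshot = {k: val for k, val in doc.items() if k != "_id"}
    snapshot.update({"movieId": movie_id, "t": timestamp})
    history.append(snapshot)
    return doc


save_version(100, {"name": "OK Movie", "budget": 450}, datetime(2018, 6, 1, 9, 0))
save_version(100, {"name": "Good Movie", "actual": 400}, datetime(2018, 6, 1, 10, 0))
save_version(100, {"name": "Best Movie Ever", "actual": 450}, datetime(2018, 6, 1, 11, 0))
```

Reads of the latest state touch only the small current document, while the history grows without limit and can be queried by `movieId` and `v`.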
Issue #6: Poor Performance Reading/Writing a Series of Many Items
• It is known that a series of items are often read/written together
  • E.g. last month’s transactions, 100 device samples, prices for an hour
• Often would store each item in a separate record in RDBMSs
• With arrays in documents, have the option of storing many items together

Pattern #6: Bucket Pattern
Problem:
• Do we know a series of items will be accessed together and not randomly?
• Should we store a document per item, like with RDBMSs?
• How to balance write vs. read performance?
Use cases:
• Transactions: orders, claims, payments, etc.
• Time series: IoT, market data, tweets, reviews, comments, etc.
• Often used for analytics and reporting

Bucket Pattern: Solution
Solution:
• Store as an array of items in a document (a certain number or time window)
• Often each item is written by itself, and then rolled into the bucket asynchronously for high-performance reading
• Retention period can be different for the item vs. the bucket
Benefits:
• Reads are many times faster (easily 10x or more)
• Also often saves on disk space as field names are stored fewer times

Storing Buckets and Optionally Each Item
• Likely need to write each item in case of app failure (short retention)
• Async write the buckets
• Might keep buckets longer than raw items
Screening
{
  _id: 200,
  location: "135 W. 34th St., NYC",
  date: Date("2018-06-01 5:00PM"),
  numViewers: 500,
  revenues: 5000
}
ScreeningBucket
{
  _id: 2000,
  movieId: 100,
  metro: "New York",
  day: Date("2018-06-01"),
  numViewers: 50000,
  ...,
  screenings: [
    {id: 200, t: "5:00", v: 500},
    {id: 201, t: "7:30", v: 1500}
  ]
}

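The asynchronous roll-up into buckets might look like the sketch below. It assumes each raw screening already carries `movieId`, `metro`, `day`, and a display time `t`, which the raw Screening document above does not show, and the function name `bucket_screenings` is hypothetical.

```python
def bucket_screenings(screenings):
    """Group raw screenings into one bucket per (movie, metro, day)."""
    buckets = {}
    for s in screenings:
        key = (s["movieId"], s["metro"], s["day"])
        bucket = buckets.setdefault(key, {
            "movieId": s["movieId"],
            "metro": s["metro"],
            "day": s["day"],
            "numViewers": 0,      # computed total kept alongside the items
            "screenings": [],
        })
        bucket["numViewers"] += s["numViewers"]
        bucket["screenings"].append({"id": s["_id"], "t": s["t"], "v": s["numViewers"]})
    return list(buckets.values())


raw = [
    {"_id": 200, "movieId": 100, "metro": "New York", "day": "2018-06-01", "t": "5:00", "numViewers": 500},
    {"_id": 201, "movieId": 100, "metro": "New York", "day": "2018-06-01", "t": "7:30", "numViewers": 1500},
    {"_id": 202, "movieId": 100, "metro": "New York", "day": "2018-06-02", "t": "5:00", "numViewers": 700},
]
buckets = bucket_screenings(raw)
```

A report for one movie, metro, and day then reads a single bucket document rather than one document per screening.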
Lambda Architecture Helps Balance Reads/Writes
[Diagram: the app writes data as individual items to MongoDB and/or a message queue; async processing (change stream or periodic batch) rolls the items into buckets of items in MongoDB, which serve the queries]

Extremely Common with Time Series & IoT
SensorSample
{
  _id: 200,
  loc: {
    type: "Point",
    coordinates: [-93, 45]
  },
  date: Date("2018-06-01 5:00PM"),
  temp: 54
}
SampleBucket
{
  _id: 2000,
  loc: {
    type: "Point",
    coordinates: [-93, 45]
  },
  startTime: Date("2018-06-01 5:00PM"),
  endTime: Date("2018-06-01 6:00PM"),
  minTemp: 50, maxTemp: 60, ...,
  samples: [
    {t: Date("2018-06-01 5:00PM"), v: 51.5},
    {t: Date("2018-06-01 5:01PM"), v: 52},
    ...
  ]
}

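For time series, the same idea can be sketched as hourly buckets that also carry computed minimum and maximum values. The sketch assumes all samples come from a single sensor location (so `loc` is omitted) and that `date` is a Python `datetime`; the function name `bucket_by_hour` is hypothetical.

```python
from datetime import datetime, timedelta


def bucket_by_hour(samples):
    """Roll individual sensor samples into one bucket per clock hour."""
    buckets = {}
    for s in sorted(samples, key=lambda s: s["date"]):
        start = s["date"].replace(minute=0, second=0, microsecond=0)
        bucket = buckets.get(start)
        if bucket is None:
            bucket = buckets[start] = {
                "startTime": start,
                "endTime": start + timedelta(hours=1),
                "minTemp": s["temp"],
                "maxTemp": s["temp"],
                "samples": [],
            }
        bucket["minTemp"] = min(bucket["minTemp"], s["temp"])
        bucket["maxTemp"] = max(bucket["maxTemp"], s["temp"])
        bucket["samples"].append({"t": s["date"], "v": s["temp"]})
    return list(buckets.values())


samples = [
    {"date": datetime(2018, 6, 1, 17, 0), "temp": 51.5},
    {"date": datetime(2018, 6, 1, 17, 1), "temp": 52},
    {"date": datetime(2018, 6, 1, 18, 5), "temp": 60},
]
hourly = bucket_by_hour(samples)
```

Storing `minTemp` and `maxTemp` on the bucket combines the Bucket and Computed patterns: range queries can skip whole buckets without opening the `samples` array.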
What our Patterns did for us
Problem | Pattern
How to model data in documents | Entity
Using too much RAM | Subset
Using too much CPU | Computed
No downtime to upgrade schema | Schema Versioning
How to track previous versions | Document Versioning
How to improve performance of series of data | Bucket

Other Patterns
• Mixed Attributes*: using key/values in arrays to allow searching on dozens of variable fields
• Approximation*: reducing frequency of calculations with approximate values
• Extended Reference: detailed data stored in a separate collection for lookup on drill-down
• Trees: store 1 or multiple levels as one document and/or use $graphLookup to recursively traverse
• Polymorphism: each document represents an item, but each item can have different fields (e.g. product catalog)
• Outlier*: avoid having a few documents drive the design and impact performance for all
* = covered in other presentations on mongodb.com

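As a brief illustration of the Mixed Attributes idea, the sketch below stores variable fields as key/value pairs in one array so that a single search routine (or, in a database, a single index on the pairs) covers all of them. The field names `attrs`, `k`, and `v`, the sample attributes, and the helper `has_attr` are assumptions for this example.

```python
# Variable, sparse fields are stored as key/value pairs instead of top-level fields.
movie = {
    "_id": 100,
    "name": "Best Movie Ever",
    "attrs": [
        {"k": "release_US", "v": "2018-05-25"},
        {"k": "release_FR", "v": "2018-06-13"},
        {"k": "rating_US", "v": "PG-13"},
    ],
}


def has_attr(doc, key, value):
    """True if the document has the given key/value pair in its attrs array."""
    return any(a["k"] == key and a["v"] == value for a in doc.get("attrs", []))


found = has_attr(movie, "rating_US", "PG-13")
missing = has_attr(movie, "rating_FR", "PG-13")
```

Adding a new kind of attribute then needs no new field and no new index, only another pair in the array.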
Takeaways
A. Simple grouping from tables to collections is often not optimal
B. Learn a common vocabulary for designing schemas with MongoDB
C. Use patterns as "plug-and-play" to improve performance

How Can I Learn More About Schema Design?
• The previous webinar that this talk extends covers 3 different patterns
  https://www.mongodb.com/presentations/advanced-schema-design-patterns
• MongoDB in-person training courses on Schema Design
• MongoDB University
  https://university.mongodb.com
  • M001: MongoDB Basics
  • (Upcoming) M220: Data Modeling

For More Information About MongoDB
Resource | Location
Public Atlas DBaaS | mongodb.com/cloud/atlas
Case Studies | mongodb.com/customers
Presentations | mongodb.com/presentations
Free Online Training | university.mongodb.com
Webinars and Events | mongodb.com/events
Documentation | docs.mongodb.com
MongoDB Downloads | mongodb.com/download

#MDBlocal
Thank You for using MongoDB!

Open Source North - MongoDB Advanced Schema Design Patterns

  • 1. J U N E 1 4 , 2 0 1 8 | T W I N C I T I E S # O S N 2 0 1 8 Advanced Schema Design Patterns
  • 2. # O S N 2 0 1 8 { “name”: ”Matt Kalan", “titles”: [ “Master Solution Architect”, “Enterprise Architect”], “location” : "Minneapolis, MN", “yearsAtMDB” : 5.5, “contactInfo” : { “email”: : “[email protected]”, “twitter” : ["@MatthewKalan", "@MongoDB"], “linkedIn” : ["mkalan", "MongoDB"] } } Who Am I?
  • 3. # O S N 2 0 1 8 • Quick MongoDB overview • Review of each Schema Design Pattern • Patterns we couldn’t get to • Q&A (and throughout) Agenda
  • 4. # O S N 2 0 1 8 Quick MongoDB Overview
  • 5. # O S N 2 0 1 8 Why MongoDB? Best way to work with data Intelligently put data where you need it Freedom to run anywhere Intelligent Operational Data PlatformIntelligent Operational Data PlatformIntelligent Operational Data PlatformIntelligent Operational Data PlatformIntelligent Operational Data Platform
  • 6. # O S N 2 0 1 8 Best way to work with data Easy: Work with data in a natural, intuitive way Flexible: Adapt and make changes quickly Fast: Get great performance with less code Versatile: Supports a wide variety of data models and queries
  • 7. # O S N 2 0 1 8 Easy & Versatile - Rich Query Functionality MongoDB Expressive Queries • Find anyone with phone # “1-212…” • Check if the person with number “555…” is on the “do not call” list Geospatial • Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave Text Search • Find all tweets that mention the firm within the last 2 days Aggregation • Count and sort number of customers by city Native Binary JSON support • Add an additional phone number to Mark Smith’s without rewriting the document • Update just 2 phone numbers out of 10 • Sort on the modified date { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { number : “1-212-777-1212”, dnc : true, type : “home” }, { number : “1-212-777-1213”, type : “cell” }] } Joins ($lookup) • Query for all San Francisco residences, lookup their transactions, and sum the amount by person Graph queries ($graphLookup) • Query for all people within 3 degrees of separation from Mark
  • 8. # O S N 2 0 1 8 Intelligently put data where you need it Ability to run both operational & analytics workloads on same cluster, for timely insight and lower cost Workload Isolation Elastic horizontal scalability - add/remove capacity dynamically without downtime Scalability Declare data locality rules for governance (e.g. data sovereignty), tiers of service & local low latency access Locality Built-in multi-geography high availability, replication & automated failover Highly Availability
  • 9. # O S N 2 0 1 8 Freedom to run anywhere Local On-premises Server & Mainframe Private cloud Fully managed cloud service Hybrid cloud Public cloud ● Database that runs the same everywhere ● Leverage the benefits of a multi-cloud strategy ● Global coverage ● Avoid lock-in Convenience: same codebase, same APIs, same tools, wherever you run
  • 10. # O S N 2 0 1 8 MongoDB Atlas: Database as a service mongodb.com/atlas Self-service and elastic • Deploy in minutes • Scale up/down without downtime • Automated upgrades Global and highly available • 52 Regions worldwide • Replica sets optimized for availability • Cross-region replication Secure by default • Network isolation and Peering • Encryption in flight and at rest • Role-based access control • SOC 2 Type 1 / Privacy Shield Comprehensive Monitoring • Performance Advisor • Dashboards w/ 100+ metrics • Real Time Performance • Customizable alerting Managed Backup • Point in Time Restore • Queryable backups • Consistent snapshots Cloud Agnostic • AWS, Azure, and GCP • Easy migrations • Consistent experience
  • 11. # O S N 2 0 1 8 MongoDB Compass MongoDB Connector for BI MongoDB Enterprise Server Enterprise Advanced for Self-Managed CommercialLicense (NoAGPLCopyleftRestrictions) Platform Certifications MongoDB Ops Manager Monitoring & Alerting Query Optimization Backup & Recovery Automation & Configuration Schema Visualization Data Exploration Ad-Hoc Queries Visualization Analysis Reporting LDAP & Kerberos Auditing In-Memory Storage Engine Encryption at Rest REST APIEmergency Patches Customer Success Program On-Demand Online Training Warranty Limitation of Liability Indemnification 24x7Support (1hourSLA)
  • 12. # O S N 2 0 1 8 Schema Design Patterns
  • 13. # O S N 2 0 1 8 • 10 years with the document model • Use of a common methodology and vocabulary when designing schemas for MongoDB • Ability to model schemas using building blocks • Less art and more methodology Why this Talk?
  • 14. # O S N 2 0 1 8 Ensure: • Good performance & scalability • Fast development despite constraints • Hardware • RAM faster than Disk • Disk cheaper than RAM • Network latency • Reduce costs $$$ • Database Server • Maximum size for a document • Atomicity of a write (ACID GA soon) • Data set • Size of data Why do we Create Models?
  • 15. # O S N 2 0 1 8 However, Don't Over Design!
  • 16. # O S N 2 0 1 8 World Movie Database (WMDB) - Logical Data Model Any events, characters and entities depicted in this presentation are fictional. Any resemblance or similarity to reality is entirely coincidental
  • 17. # O S N 2 0 1 8 • Frequency of Access • Subset ✔️ • Approximation • Extended Reference Patterns by Category • Grouping • Computed ✔️ • Bucket ✔️ • Outlier • Representation • Entity ✔️ • Document Versioning ✔️ • Schema Versioning ✔️ • Mixed Attributes • Tree • Polymorphism
  • 18. # O S N 2 0 1 8 Problem: • How to get started modeling data in MongoDB, not as a relational model • Logical model is spread across tables • Today’s languages used OOP and JSON • Hard to use and worse performance spreading across tables Use cases: • Most every operational application with modern languages • Also applicable to analytics environments Issue #1 – How to Model Data in Documents
  • 19. # O S N 2 0 1 8 Solution: • Simply store data in the objects or JSON used in the application/service Benefits: • Faster development • Faster performance • Easier to partition and scale Pattern #1 - Entity
  • 20. # O S N 2 0 1 8 Logical Model to Documents Typically map to objects & JSON 3 collections: A. movies B. moviegoers C. screenings
  • 21. # O S N 2 0 1 8 Moviegoer { _id: 1, ... viewings: [ {m: 100, d: 2016-05-24} {m: 200, d: 2017-03-18} ], ratings: [ {m: 100, v: 3, c: “great“} ] } 3 Main Entities Movie { _id: 100, name: “Best Movie Ever”, castAndCrew: [ {fn: “Joe”, ln: Smith, …} … ], reviews: [ {d: 2018-05-25, r: “awful”, …} … ], quotes: […] } Screening { _id: 200, movieId: 100 location: “NYC”, numViewers: 500, revenues: 100,000 }
  • 22. # O S N 2 0 1 8 Possible solutions: A. Reduce the size of your working set (no extra cost!) B. Add more RAM per machine C. Start sharding or add more shards Issue #2: Working Set Doesn’t Fit in RAM
  • 23. # O S N 2 0 1 8 In this example, we can: • Limit the list of actors and crew to 20 • Limit the embedded reviews to the top 20 • … Pattern #2: Subset
  • 24. # O S N 2 0 1 8 Problem: • There are 1-N or N-N relationships, and only a few fields or documents that always need to be shown • Only infrequently do you need to pull all of the related data Use cases: • Main actors of a movie • List of reviews or comments Generalizing the Subset Pattern
  • 25. # O S N 2 0 1 8 Solution: • Keep duplicates of a small subset of fields in the main collection Benefits: • Allows for fast data retrieval and a reduced working set size • One query brings all the information needed for the "main page" Subset Pattern - Solution
  • 26. # O S N 2 0 1 8 • How duplication is handled A. Update both source and target in real time from application (optional: Txn) B. Use Change Streams to subscribe to change and async update the target C. Update target from source at regular intervals. Examples: • Most popular items => update nightly • Revenues from a movie => update every hour • Last 10 reviews => update hourly? daily? Implementation Reality of Patterns: Consistency
  • 27. # O S N 2 0 1 8 Change Streams For Sync and Real-Time Apps [Diagram labels: Change Streams API; Business Apps; User Data; Sensors; Clickstream; Real-Time Event Notifications; Message Queue; Syncing with other collections/microservices]
  • 28. # O S N 2 0 1 8 • CPU is on fire! Issue #3: High CPU Usage
  • 29. # O S N 2 0 1 8 { title: "The Shape of Water", ... viewings: 5000, viewers: 385000, revenues: 5074800 } Issue #3: ...caused by repeated calculations
  • 30. # O S N 2 0 1 8 For example: • Apply a sum, count, ... • Roll up data by minute, hour, day • As long as you don’t modify your source data, you can recreate the rollups Pattern #3: Computed
  • 31. # O S N 2 0 1 8 Problem: • There is data that needs to be computed • The same calculations would happen over and over • Reads outnumber writes: • example: 1K writes per hour vs 1M reads per hour Use cases: • Have revenues per movie showing, want to display sums • Time series data, Event Sourcing Computed Pattern
  • 32. # O S N 2 0 1 8 Solution: • Apply a computation or operation on data and store the result Benefits: • Avoid re-computing the same thing over and over Computed Pattern - Solution
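As a rough sketch of the Computed pattern, the totals can be updated once per write so that the far more frequent reads do no arithmetic. The helper names and sample numbers below are assumptions for illustration; the field names follow the movie document shown earlier in the deck.

```python
def record_screening(movie, screenings, screening):
    """Store the source record and update the precomputed totals in one step."""
    screenings.append(screening)
    movie["viewings"] += 1
    movie["viewers"] += screening["numViewers"]
    movie["revenues"] += screening["revenues"]


def recompute_totals(screenings, movie_id):
    """Rebuild the rollup from the untouched source records."""
    own = [s for s in screenings if s["movieId"] == movie_id]
    return {
        "viewings": len(own),
        "viewers": sum(s["numViewers"] for s in own),
        "revenues": sum(s["revenues"] for s in own),
    }


movie = {"_id": 100, "title": "The Shape of Water",
         "viewings": 0, "viewers": 0, "revenues": 0}
screenings = []
record_screening(movie, screenings,
                 {"movieId": 100, "numViewers": 500, "revenues": 5000})
record_screening(movie, screenings,
                 {"movieId": 100, "numViewers": 1500, "revenues": 15000})
```

Because the source records are kept, `recompute_totals` can always recreate the rollup if the stored totals are ever in doubt.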
  • 33. # O S N 2 0 1 8 • How to quickly change schemas over time with new requirements? • How to know what fields are in the results? Issue #4: Need to change the fields in the documents
  • 34. # O S N 2 0 1 8 Problem: • Updating the schema of a collection or database is: • Not atomic • A long operation • Not necessary, as there is no single schema as in RDBMSs • May not want to update all documents, only do it going forward Use cases: • Practically any database that will go to production Schema Versioning Pattern
  • 35. # O S N 2 0 1 8 Solution: • Have a field keeping track of the schema version Benefits: • Don't need to update all the documents at once • May not have to update documents until their next modification Schema Versioning Pattern – Solution
  • 36. # O S N 2 0 1 8 • Add a field to track the schema version number, per document • Does not have to exist for version 1 • Always have the option to loop through and update all docs, but you are not forced to Pattern #4: Schema Versioning
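One way to apply Schema Versioning in application code is to upgrade documents lazily when they are read. The sketch below assumes a field named `schema_version` and an invented v1-to-v2 change (a single `actor` string becoming a `cast` list); neither comes from the slides.

```python
def schema_version(doc):
    """Documents written before versioning was introduced count as version 1."""
    return doc.get("schema_version", 1)


def upgrade_to_v2(doc):
    # Hypothetical change: v1 stored one "actor" string, v2 stores a "cast" list.
    upgraded = {k: v for k, v in doc.items() if k != "actor"}
    upgraded["cast"] = [doc["actor"]] if "actor" in doc else []
    upgraded["schema_version"] = 2
    return upgraded


# Maps a version number to the function that upgrades it by one step.
MIGRATIONS = {1: upgrade_to_v2}


def to_latest(doc):
    """Apply upgrade steps until no migration is registered for the version."""
    while schema_version(doc) in MIGRATIONS:
        doc = MIGRATIONS[schema_version(doc)](doc)
    return doc


old_doc = {"_id": 1, "actor": "Joe Smith"}
new_doc = {"_id": 2, "cast": ["Joe Smith"], "schema_version": 2}
upgraded = to_latest(old_doc)
```

The application can write the upgraded document back on its next modification, or a background job can loop over the remaining old documents later.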
  • 37. # O S N 2 0 1 8 • Updating data in place can be seen as deleting previous version • Regulated industries often require an audit trail for X years • Insight can be gleaned from measuring changing data (e.g. claims processing, code check-ins, etc.) • Many possible approaches here Issue #5: Need to track and query current and previous versions of documents
  • 38. # O S N 2 0 1 8 Problem: • Should we track field-level changes or entire documents? • Consider how to handle consistency requirements during changes Use cases: • Most apps storing business transactions • Any data useful to see over time Pattern #5: Document Versioning
  • 39. # O S N 2 0 1 8 Solution: • Ultimately dependent on the situation • But 2 main approaches are most common • Tracking a few updates in one document • Separate collections for latest and for historical changes Benefits: • First option saves on disk space • Second option gives good performance no matter how many changes Document Versioning Pattern – Solution
  • 40. # O S N 2 0 1 8 If Few Changes • Have an array of previous values that were changed • Compare-and-swap (on version) for thread-safe updates to the document Movie { _id: 100, current: { v: 3, name: "Best Movie Ever", budget: 450, actual: 450 }, prev: [ {v: 1, name: "OK Movie", budget: 450}, {v: 2, name: "Good Movie", actual: 400} ] }
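The compare-and-swap idea can be illustrated in memory as follows. This sketch assumes each `prev` entry stores the old values of the fields being overwritten, which is one reading of the slide; `update_with_history` is a hypothetical helper. Against a real database, the version check would belong in the update's filter so that the check and the write happen as one atomic operation.

```python
def update_with_history(store, doc_id, expected_version, changes):
    """Apply `changes` only if the document is still at `expected_version`."""
    doc = store[doc_id]
    current = doc["current"]
    if current["v"] != expected_version:
        # Someone else updated first; the caller should re-read and retry.
        return False
    # Remember the old values of just the fields that are about to change.
    previous = {"v": current["v"]}
    for field in changes:
        if field in current:
            previous[field] = current[field]
    doc["prev"].append(previous)
    current.update(changes)
    current["v"] = expected_version + 1
    return True


store = {
    100: {
        "_id": 100,
        "current": {"v": 2, "name": "Good Movie", "budget": 450, "actual": 400},
        "prev": [{"v": 1, "name": "OK Movie", "budget": 450}],
    }
}
ok = update_with_history(store, 100, 2,
                         {"name": "Best Movie Ever", "actual": 450})
# A second writer still holding version 2 is rejected.
stale = update_with_history(store, 100, 2, {"name": "Stale Title"})
```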
  • 41. # O S N 2 0 1 8 Unbounded Numbers of Changes Current Collection { _id: 100, v: 3, name: "Best Movie Ever", budget: 450, actual: 450 } History Collection { movieId: 100, v: 1, name: "OK Movie", budget: 450, t: Date("2018-06-01…") } History Collection { movieId: 100, v: 2, name: "Good Movie", budget: 450, actual: 400, t: Date("2018-06-01…") } History Collection { movieId: 100, v: 3, name: "Best Movie Ever", budget: 450, actual: 450, t: Date("2018-06-01…") }
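The two-collection approach can be sketched with a dict for the current collection and a list for the history collection. The helper names and the timestamps are made up for the example; the slide's own timestamps are elided and are not reproduced here.

```python
def save_version(current, history, movie_id, fields, timestamp):
    """Write the new state to `current` and append a full snapshot to `history`."""
    latest = current.get(movie_id, {"_id": movie_id, "v": 0})
    new_doc = {**latest, **fields, "v": latest["v"] + 1}
    current[movie_id] = new_doc
    snapshot = {k: val for k, val in new_doc.items() if k != "_id"}
    history.append({"movieId": movie_id, **snapshot, "t": timestamp})
    return new_doc


def version_at(history, movie_id, version):
    """Look up one historical snapshot, or None if it does not exist."""
    for entry in history:
        if entry["movieId"] == movie_id and entry["v"] == version:
            return entry
    return None


current, history = {}, []
# Illustrative timestamps only.
save_version(current, history, 100,
             {"name": "OK Movie", "budget": 450}, "2018-06-01T09:00")
save_version(current, history, 100,
             {"name": "Good Movie", "actual": 400}, "2018-06-01T10:00")
save_version(current, history, 100,
             {"name": "Best Movie Ever", "actual": 450}, "2018-06-01T11:00")
```

Reads of the latest state touch only the small current document, however long the history grows.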
  • 42. # O S N 2 0 1 8 • It is known that a series of items are often read/written together • E.g. last month’s transactions, 100 device samples, prices for an hour • Often would store each item in a separate record in RDBMSs • With arrays in documents, have the option of storing many items together Issue #6: Poor Performance Reading/Writing a Series of Many Items
  • 43. # O S N 2 0 1 8 Problem: • Do we know a series of items will be accessed together and not randomly? • Should we store a document per item, like with RDBMSs? • How to balance write vs. read performance? Use cases: • Transactions: orders, claims, payments, etc. • Time series: IoT, market data, tweets, reviews, comments, etc. • Often used for analytics and reporting Pattern #6: Bucket Pattern
  • 44. # O S N 2 0 1 8 Solution: • Store as an array of items in a document (a certain number or time window) • Often each item is written by itself, and then rolled into the bucket asynchronously for high performance reading • Retention period can be different for item vs. the bucket Benefits: • Reads are many times faster (easily 10x or more) • Also often saves on disk space as field names are stored fewer times Bucket Pattern – Solution
  • 45. # O S N 2 0 1 8 Storing Buckets and Optionally Each Item • Likely need to write each item in case of app failure (short retention) • Write the buckets asynchronously • Might keep buckets longer than raw items Screening { _id: 200, location: "135 W. 34th St., NYC", date: Date("2018-06-01 5:00PM"), numViewers: 500, revenues: 5000 } ScreeningBucket { _id: 2000, movieId: 100, metro: "New York", day: Date("2018-06-01"), numViewers: 50000, ..., screenings: [ {id: 200, t: "5:00", v: 500}, {id: 201, t: "7:30", v: 1500} ] }
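A batch roll-up of individual screenings into per-day buckets could look like the sketch below. It assumes each raw screening already carries `movieId`, `metro`, `day`, and `time` fields, which the slide's Screening document does not show; `build_buckets` is a hypothetical helper standing in for the asynchronous job.

```python
def build_buckets(screenings):
    """Group screenings into one bucket per (movie, metro, day)."""
    buckets = {}
    for s in screenings:
        key = (s["movieId"], s["metro"], s["day"])
        bucket = buckets.setdefault(key, {
            "movieId": s["movieId"],
            "metro": s["metro"],
            "day": s["day"],
            "numViewers": 0,
            "screenings": [],
        })
        # Keep a running total and a compact entry with short field names.
        bucket["numViewers"] += s["numViewers"]
        bucket["screenings"].append(
            {"id": s["_id"], "t": s["time"], "v": s["numViewers"]})
    return list(buckets.values())


raw = [
    {"_id": 200, "movieId": 100, "metro": "New York", "day": "2018-06-01",
     "time": "5:00", "numViewers": 500},
    {"_id": 201, "movieId": 100, "metro": "New York", "day": "2018-06-01",
     "time": "7:30", "numViewers": 1500},
    {"_id": 202, "movieId": 100, "metro": "New York", "day": "2018-06-02",
     "time": "5:00", "numViewers": 700},
]
buckets = build_buckets(raw)
```

A report for one movie, metro, and day then reads a single bucket document instead of one document per screening.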
  • 46. # O S N 2 0 1 8 Lambda Architecture Helps Balance Reads/Writes [Diagram labels: App Writes Data; Message Queue and/or Each Item (MongoDB); Async Processing (change stream or periodic batch); Buckets of Items in MongoDB; Queries]
  • 47. # O S N 2 0 1 8 Extremely Common with Time Series & IoT SensorSample { _id: 200, loc: { type: “Point”, coordinates: [-93, 45] }, date: Date(“2018-06-01 5:00PM”), temp: 54 } SampleBucket { _id: 2000, loc: { type: “Point”, coordinates: [-93, 45] }, startTime: Date(“2018-06-01 5:00PM”), endTime: Date(“2018-06-01 6:00PM”), minTemp: 50, maxTemp: 60, ..., samples: [ {t: Date(“2018-06-01 5:00PM”), v: 51.5}, {t: Date(“2018-06-01 5:01PM”), v: 52}, ... ] }
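For time series, samples can also be folded into an hourly bucket as they arrive, keeping the min/max summary current. This is a sketch under assumptions: the one-hour window, the `add_sample` helper, and the in-memory `buckets` dict keyed by location and start time are illustrative choices, and the sample temperatures are made up.

```python
from datetime import datetime, timedelta


def add_sample(buckets, sample):
    """Append a sensor sample to its hourly bucket and update min/max."""
    start = sample["date"].replace(minute=0, second=0, microsecond=0)
    key = (tuple(sample["loc"]["coordinates"]), start)
    bucket = buckets.get(key)
    if bucket is None:
        bucket = buckets[key] = {
            "loc": sample["loc"],
            "startTime": start,
            "endTime": start + timedelta(hours=1),
            "minTemp": sample["temp"],
            "maxTemp": sample["temp"],
            "samples": [],
        }
    bucket["minTemp"] = min(bucket["minTemp"], sample["temp"])
    bucket["maxTemp"] = max(bucket["maxTemp"], sample["temp"])
    bucket["samples"].append({"t": sample["date"], "v": sample["temp"]})
    return bucket


loc = {"type": "Point", "coordinates": [-93, 45]}
buckets = {}
add_sample(buckets, {"loc": loc, "date": datetime(2018, 6, 1, 17, 0), "temp": 51.5})
add_sample(buckets, {"loc": loc, "date": datetime(2018, 6, 1, 17, 1), "temp": 52})
add_sample(buckets, {"loc": loc, "date": datetime(2018, 6, 1, 18, 5), "temp": 60})
five_pm = buckets[((-93, 45), datetime(2018, 6, 1, 17, 0))]
```

Queries such as "hottest hour of the day" can then use `minTemp` and `maxTemp` without scanning the individual samples.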
  • 48. # O S N 2 0 1 8 What our Patterns did for us (Problem → Pattern): • How to model data in documents → Entity • Using too much RAM → Subset • Using too much CPU → Computed • No downtime to upgrade schema → Schema Versioning • How to track previous versions → Document Versioning • How to improve performance of series of data → Bucket
  • 49. # O S N 2 0 1 8 • Mixed Attributes* – using key/values in arrays to allow searching on dozens of variable fields • Approximation* – reducing frequency of calculations with approximate values • Extended Reference – detailed data stored in separate collection for lookup on drill down • Trees – store 1 or multiple levels as one document and/or use $graphLookup to recursively traverse • Polymorphism – each document represents an item, but each item can have different fields (e.g. product catalog) • Outlier* – avoid having a few documents drive the design and impact performance for all * = covered in other presentations on mongodb.com Other Patterns
  • 50. # O S N 2 0 1 8 A. Simple grouping from tables to collections is often not optimal B. Learn a common vocabulary for designing schemas with MongoDB C. Use patterns as "plug-and-play" to improve performance Takeaways
  • 51. # O S N 2 0 1 8 • The previous webinar that I extended covers 3 different patterns https://www.mongodb.com/presentations/advanced-schema-design-patterns • MongoDB in-person training courses on Schema Design • MongoDB University https://university.mongodb.com • M001: MongoDB Basics • (Upcoming) M220: Data Modeling How Can I Learn More About Schema Design?
  • 52. # O S N 2 0 1 8 For More Information About MongoDB (Resource: Location) • Public Atlas DBaaS: mongodb.com/cloud/atlas • Case Studies: mongodb.com/customers • Presentations: mongodb.com/presentations • Free Online Training: university.mongodb.com • Webinars and Events: mongodb.com/events • Documentation: docs.mongodb.com • MongoDB Downloads: mongodb.com/download
  • 53. # M D B l o c a l Thank You for using MongoDB!