Data Modelling for MongoDB
Daniel Coupal, Curriculum Team, MongoDB
March 14th, 2019
Sydney, Australia
Daniel Coupal
Curriculum Team, MongoDB
daniel.coupal@mongodb.com
Palo Alto, CA, USA
github.com/dcoupal
Goals of the Presentation
• Recognize the differences when modelling for a Document Database versus a Relational Database
• Summarize the steps of a methodology when modelling for MongoDB
• Recognize the need for Schema Design Patterns, and when to apply them
Differences when Modelling for a Document Database versus a Relational Database
MongoDB.local Sydney 2019: Data Modeling for MongoDB
Thinking in Documents
1. Polymorphism
• different documents may contain
different fields
2. Array
• represent a "one-to-many" relation
• an index on an array field indexes every entry
3. Sub Document
• grouping some fields together
4. JSON/BSON
• documents are often shown as
JSON
• BSON is the physical format
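To make "polymorphism" and "an index on an array indexes every entry" concrete, here is a minimal pure-Python sketch. The store documents and their fields (`name`, `address`, `tags`, `drive_through`) are hypothetical, and `build_multikey_index` is a toy that only mimics the idea behind MongoDB multikey indexes (one index entry per array element); it is not how the server implements them.

```python
from collections import defaultdict

# Two documents in the same (hypothetical) "stores" collection.
# Polymorphism: the second document has a field the first one lacks.
stores = [
    {
        "_id": 1,
        "name": "Cuppa Bondi",
        "address": {"city": "Sydney", "country": "AU"},  # sub-document
        "tags": ["espresso", "takeaway"],                 # array
    },
    {
        "_id": 2,
        "name": "Cuppa Ponsonby",
        "address": {"city": "Auckland", "country": "NZ"},
        "tags": ["espresso", "drive-through"],
        "drive_through": True,                            # only on this document
    },
]


def build_multikey_index(docs, field):
    """Toy index: map every element of an array field to the _ids holding it."""
    index = defaultdict(set)
    for doc in docs:
        value = doc.get(field)
        # An array contributes one entry per element; a scalar contributes one entry.
        values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].add(doc["_id"])
    return index


tag_index = build_multikey_index(stores, "tags")
print(sorted(tag_index["espresso"]))       # [1, 2]
print(sorted(tag_index["drive-through"]))  # [2]
```

A query on `tags: "espresso"` can then be answered from the index alone, because each tag, not the whole array, is a key.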
Example: modelling a blog
… 5 tables become 1 or 2 collections
Example: Modelling a Social
Network
Differences: Relational/Tabular vs Document
|                           | Relational                                       | MongoDB                                      |
|---------------------------|--------------------------------------------------|----------------------------------------------|
| Steps to create the model | 1 – define schema, 2 – develop app and queries   | 1 – identify the queries, 2 – define schema  |
| Initial schema            | 3rd normal form, one solution                    | many solutions possible                      |
| Final schema              | likely denormalized                              | few changes                                  |
| Schema evolution          | difficult and not optimal, likely downtime       | easy and no downtime                         |
| Performance               | mediocre                                         | optimized                                    |
Other Considerations for the Model
1. One-to-many relationships where "many" is a humongous number
2. Embed or reference
• Joins via $lookup
• Transactions for multi-document writes
3. Transactions available for replica sets, and soon for sharded clusters
4. Shard key
5. Indexes
6. Simple queries, or more complex ones with the Aggregation Framework
Flexible Modelling Methodology for
MongoDB
Methodology
1. Describe the Workload
2. Identify and Model
the Relationships
3. Apply Patterns
Flexible Methodology
Case Study: Cuppa Coffee
A. Business: coffee shop franchises
B. Name: Cuppa Coffee
also considered: Coffee Mate, Crocodile Coffee
C. Objective:
• 10 000 stores in Australia, New Zealand and South Asia
• … then we invade America
D. Keys to success:
• Best coffee in the world
• Technology
Make the Best Coffee in the World
23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds
1. Fill a small or regular cup with hot water (not boiling, but pretty hot). Your cup should be 150ml to 200ml in total volume, 80% of which will be hot water.
2. Grind 23g of coffee into your portafilter using the double basket, weighing it on a scale.
3. Place your cup on the scale, press tare, and extract your shot, drawing 20g of coffee over the hot water.
Technology
1. Measure inventory in real time
• Shelves with scales
2. Big Data collection on cups of coffee
• weighings, temperature, time to produce, …
3. Data Analysis
• Coffee perfection
• Rush hours -> staffing needs
4. MongoDB
Methodology
1. Describe the Workload
2. Identify and Model
the Relationships
3. Apply Patterns
1 – Workload: List Queries
| Query                           | Operation | Description                                                          |
|---------------------------------|-----------|----------------------------------------------------------------------|
| 1. Coffee weight on the shelves | write     | A shelf sends information when coffee bags are added or removed      |
| 2. Coffee to deliver to stores  | read      | How much coffee must be shipped to each store in the next few days   |
| 3. Anomalies in the inventory   | read      | Analytics                                                            |
| 4. Making a cup of coffee       | write     | A coffee machine reports on the production of a coffee cup           |
| 5. Analysis of cups of coffee   | read      | Analytics                                                            |
| 6. Technical Support            | read      | Helping our franchisees                                              |
1 – Workload: quantify/qualify the queries
| Query                           | Quantification                           | Qualification                            |
|---------------------------------|------------------------------------------|------------------------------------------|
| 1. Coffee weight on the shelves | 10/day*shelf*store => 1/sec              | <1s, critical write                      |
| 2. Coffee to deliver to stores  | 1/day*store => 0.1/sec                   | <60s                                     |
| 3. Anomalies in the inventory   | 24 reads/day                             | <5mins, "collection scan"                |
| 4. Making a cup of coffee       | 10 000 000 writes/day => 115 writes/sec  | <100ms, non-critical write               |
| … cups of coffee at rush hour   | 3 000 000 writes/hr => 833 writes/sec    | <100ms, non-critical write               |
| 5. Analysis of cups of coffee   | 24 reads/day                             | stale data is fine, "collection scan"    |
| 6. Technical Support            | 1000 reads/day                           | <1s                                      |
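The per-second figures in the workload table follow from dividing a daily (or hourly) count by the seconds in that period. The sketch below reproduces that arithmetic. `SHELVES_PER_STORE = 1` is an assumption (it is the value that makes the "=> 1/sec" line and the disk-space estimate agree; the rate scales linearly with more shelves), and the 1 000 cups per store per day comes from the disk-space estimate, giving the table's 10 000 000 writes/day across 10 000 stores.

```python
# Back-of-the-envelope rates behind the workload table.
SECONDS_PER_DAY = 24 * 60 * 60   # 86 400
SECONDS_PER_HOUR = 60 * 60       # 3 600

STORES = 10_000
SHELVES_PER_STORE = 1            # assumption, not stated on the slide
CUPS_PER_STORE_PER_DAY = 1_000   # from the disk-space estimate


def per_second(count, period_seconds):
    """Convert a count over a period into an average rate per second."""
    return count / period_seconds


workload = {
    "shelf weighings (write)": per_second(10 * SHELVES_PER_STORE * STORES, SECONDS_PER_DAY),
    "deliveries (read)": per_second(1 * STORES, SECONDS_PER_DAY),
    "cups of coffee (write)": per_second(CUPS_PER_STORE_PER_DAY * STORES, SECONDS_PER_DAY),
    "cups at rush hour (write)": per_second(3_000_000, SECONDS_PER_HOUR),
}

for name, rate in workload.items():
    print(f"{name:<26} {rate:8.2f} ops/sec")
```

The slide truncates these averages (about 1.16, 0.12, 115.7 and 833.3 per second). Note that the rush-hour rate is roughly 7 times the daily average, which is why the latency target has to hold at the peak, not just on average.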
Disk Space
Cups of coffee (one year of data)
• 10 000 x 1 000/day x 365
• 3.7 billion/year
• 370 GB (100 bytes/cup of coffee)
Weighings (one year of data)
• 10 000 x 10/day x 365
• 36.5 million/year
• 3.7 GB (100 bytes/weighing)
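The disk-space estimate is a product of stores, events per day, days and bytes per document. This short sketch redoes it; the 100 bytes per document is the slide's own assumption, and the result counts raw document bytes only (no indexes, no compression).

```python
STORES = 10_000
DAYS = 365
BYTES_PER_DOC = 100  # per-document size assumed on the slide

cups_per_year = STORES * 1_000 * DAYS     # 1 000 cups per store per day
weighings_per_year = STORES * 10 * DAYS   # 10 weighings per store per day


def gigabytes(n_docs, bytes_per_doc=BYTES_PER_DOC):
    """Raw size in decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_docs * bytes_per_doc / 1e9


print(f"cups:      {cups_per_year:,} docs/year, {gigabytes(cups_per_year):.0f} GB")
print(f"weighings: {weighings_per_year:,} docs/year, {gigabytes(weighings_per_year):.2f} GB")
```

The two collections differ by a factor of 100 in volume, so the cups of coffee are the collection whose shape (and retention) matters most for storage.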
Methodology
1. Describe the Workload
2. Identify and Model
the Relationships
3. Apply Patterns
2 - Relations are still important
| Type of relation                           | one-to-one (1-1)          | one-to-many (1-N)         | many-to-many (N-N)                              |
|--------------------------------------------|---------------------------|---------------------------|-------------------------------------------------|
| Document embedded in the parent document   | one read, no joins        | one read, no joins        | one read, no joins, duplication of information  |
| Document referenced in the parent document | smaller reads, many reads | smaller reads, many reads | smaller reads, many reads                       |
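The embed-versus-reference trade-off in the table can be shown with a one-to-many relation between a store and its coffee machines. This is a plain-Python sketch with hypothetical field names (`machines`, `store_id`, `serial`, `model`); dictionaries and lists stand in for collections, so "a read" here just means one lookup.

```python
# Embedded: the machines live inside the store document -> one read, no join.
store_embedded = {
    "_id": "store-42",
    "city": "Sydney",
    "machines": [
        {"serial": "M-1", "model": "2A"},
        {"serial": "M-2", "model": "2A"},
    ],
}

# Referenced: machines are separate documents pointing back to their store.
stores = {"store-42": {"_id": "store-42", "city": "Sydney"}}
machines = [
    {"_id": "M-1", "store_id": "store-42", "model": "2A"},
    {"_id": "M-2", "store_id": "store-42", "model": "2A"},
    {"_id": "M-9", "store_id": "store-7", "model": "3B"},
]


def read_embedded(store):
    """One fetch returns the store together with its machines."""
    return store["city"], [m["serial"] for m in store["machines"]]


def read_referenced(store_id):
    """Two fetches: the store, then the machines that reference it.

    This second step is the work a second query (or a $lookup stage) does.
    """
    store = stores[store_id]                                            # read 1
    owned = [m["_id"] for m in machines if m["store_id"] == store_id]   # read 2
    return store["city"], owned


print(read_embedded(store_embedded))
print(read_referenced("store-42"))
```

Both shapes return the same answer; the referenced one keeps each document small and stops the parent from growing without bound, which matters when "many" is a humongous number.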
2 - Entities for Cuppa Coffee
- Coffee cups
- Stores
- Coffee machines
- Shelves
- Weighings
- Coffee bags
Methodology
1. Describe the Workload
2. Identify and Model
the Relationships
3. Apply Patterns
Schema Design Patterns
Schema Design Patterns: Resources
A. Advanced Schema Design Patterns
• MongoDB World 2017
• Webinar
B. MongoDB University
• university.mongodb.com
• M320 – Data Modeling (2019)
C. Blogs on Schema Design Patterns
D. Appendix to this presentation
• Schema Versioning Pattern
• Computed Pattern
Schema Versioning
Computed Pattern
Subset Pattern
Bucket Pattern
Bucket per Day
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02"),
  "temp": [ [ 20.0, 20.1, 20.2, ... ],
            [ 22.1, 22.1, 22.0, ... ],
            ...
          ]
}
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-03"),
  "temp": [ [ 20.1, 20.2, 20.3, ... ],
            [ 22.4, 22.4, 22.3, ... ],
            ...
          ]
}
Bucket per Hour
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02T13:00:00Z"),
  "temp": { "1": 20.0, "2": 20.1, "3": 20.2, ... }
}
{
  "device_id": "000123456",
  "type": "2A",
  "date": ISODate("2018-03-02T14:00:00Z"),
  "temp": { "1": 22.1, "2": 22.1, "3": 22.0, ... }
}
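The Bucket Pattern groups many small measurements into one document per time window. The sketch below builds "bucket per hour" documents from individual readings in plain Python; it is an illustration, not a MongoDB API. Assumptions: readings arrive as `(device_id, timestamp, temp)` tuples, device ids are strings so leading zeros survive, minute keys run from "0" to "59", and the `type` field is omitted for brevity.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_per_hour(readings):
    """Group (device_id, timestamp, temp) tuples into one document per device per hour.

    Each bucket keeps its temperatures in a map keyed by the minute of the hour
    (as a string), similar to the "temp" map in the hourly documents.
    """
    buckets = defaultdict(dict)
    for device_id, ts, temp in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(device_id, hour)][str(ts.minute)] = temp
    return [
        {"device_id": device_id, "date": hour, "temp": temps}
        for (device_id, hour), temps in sorted(buckets.items())
    ]


readings = [
    ("000123456", datetime(2018, 3, 2, 13, 1, tzinfo=timezone.utc), 20.0),
    ("000123456", datetime(2018, 3, 2, 13, 2, tzinfo=timezone.utc), 20.1),
    ("000123456", datetime(2018, 3, 2, 14, 1, tzinfo=timezone.utc), 22.1),
]
docs = bucket_per_hour(readings)
print(len(docs))  # 2 buckets: 13:00 and 14:00
```

Three readings become two documents; at one reading per minute, a day of data shrinks from 1 440 documents to 24 (per hour) or 1 (per day), with fewer index entries to match.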
External Reference Pattern
Cuppa Coffee - Solution with Patterns
• Schema Versioning
• Subset
• Computed
• Bucket
• External Reference
Conclusion
Takeaways from the Presentation
• Recognize the differences when modelling for a Document Database versus a Relational Database
• Summarize the steps of a methodology when modelling for MongoDB
  • Workload
  • Relationships
  • Patterns
• Recognize the need for Schema Design Patterns, and when to apply them
Coming Soon …
• "Data Modelling" course at:
university.mongodb.com
Cheers!
Appendix A
Schema Versioning Pattern
Nightmare: Alter Table
This is what your dreams should be when thinking about a schema upgrade!
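Schema Revision
|                     | Relational                 | MongoDB         |
|---------------------|----------------------------|-----------------|
| Versioned unit      | Schema                     | Document        |
| Migration procedure | Difficult                  | Easy            |
| Service uptime      | Interrupted                | No interruption |
| Rollback            | Difficult to nightmare-ish | Easy            |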
Application Lifecycle
Modify the application
• Can read/process all versions of documents
• Has a different handler per version
• Reshapes the document before processing it
Update all application servers
• Install the updated application
• Remove old processes
Once the migration is completed
• Remove the code that processes old versions
Document Lifecycle
New documents
• The application writes them in the latest version
Existing documents
A) Use updates to documents
• to transform them to the latest version
• documents that never need an update can stay in their old version forever
B) or transform all documents in batch
• no need to worry, even if the process takes days
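Both lifecycles above can be sketched together in plain Python: a chain of per-version handlers that reshapes any document into the latest version on read (option A's "upgrade when touched"), and a resumable batch job (option B). The `schema_version` field name comes from the pattern summary; everything else is hypothetical, including the v1 to v2 change (a single `phone` string becoming a `contacts` array) and the rule that documents without `schema_version` count as version 1. In a real deployment each batch would typically become one bulk write (for example PyMongo's `bulk_write`).

```python
LATEST_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical change: v1 stored one "phone" string, v2 a "contacts" array."""
    new = {k: v for k, v in doc.items() if k != "phone"}
    new["contacts"] = [{"type": "phone", "value": doc["phone"]}] if "phone" in doc else []
    new["schema_version"] = 2
    return new


# One handler per version: each one reshapes a document into the next version.
UPGRADERS = {1: upgrade_v1_to_v2}


def to_latest(doc):
    """Read path: reshape a document of any known version before processing it."""
    # Assumption: documents written before versioning existed count as version 1.
    version = doc.get("schema_version", 1)
    while version < LATEST_VERSION:
        doc = UPGRADERS[version](doc)
        version = doc["schema_version"]
    return doc


def migrate_in_batches(collection, batch_size=2):
    """Option B: rewrite stale documents one batch at a time.

    Documents already at LATEST_VERSION are skipped, so the job can be
    stopped and re-run safely even if it takes days.
    """
    stale = [i for i, d in enumerate(collection)
             if d.get("schema_version", 1) < LATEST_VERSION]
    for start in range(0, len(stale), batch_size):
        for i in stale[start:start + batch_size]:
            collection[i] = to_latest(collection[i])
    return len(stale)


docs = [
    {"_id": 1, "phone": "555-0100"},                         # legacy, no version field
    {"_id": 2, "schema_version": 1, "phone": "555-0101"},
    {"_id": 3, "schema_version": 2, "contacts": []},          # already latest
]
print(to_latest(docs[0]))         # reads work before any migration
print(migrate_in_batches(docs))   # 2 documents migrated
print(migrate_in_batches(docs))   # 0: nothing left to do
```

Because the read path already handles every version, the batch job is optional and can run at any pace; once it reports zero stale documents, the v1 handler can be deleted.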
Timeline of the migration
Schema Versioning Pattern
Problem
● Avoid downtime while doing schema upgrades
● Upgrading all documents can take hours, days or even weeks when dealing with big data
● Don't want to update all documents
Solution
● Each document gets a "schema_version" field
● The application can handle all versions
● Choose your strategy to migrate the documents
Use Cases Examples
● Every application that uses a database, is deployed in production and is heavily used
● Systems with a lot of legacy data
Benefits and Trade-Offs
✅ No downtime needed
✅ Feel in control of the migration
✅ Less future technical debt
Trade-off: may need 2 indexes for the same field during the migration period
Appendix B
Computed Pattern
Mathematical Operations
"Fan Out" Operations
"Roll Up" Operations
Computed Pattern
Problem
● Costly computation or manipulation of data
● Executed frequently on the same data, producing the same result
Solution
● Perform the operation and store the result in the appropriate document and collection
● If the operations may need to be redone, keep their source data
Use Cases Examples
● Internet of Things (IoT)
● Event Sourcing
● Time Series Data
● Frequent Aggregation Framework queries
Benefits and Trade-Offs
✅ Read queries are faster
✅ Saves resources like CPU and disk
Trade-off: the need may be difficult to identify
Trade-off: avoid applying or overusing it unless needed
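A minimal sketch of the Computed Pattern in plain Python, assuming a hypothetical per-store, per-day summary of extracted coffee weight (`store_id`, `day`, `grams`, `count`, `total_grams` are illustrative names). The summary is updated on every write, so the frequent read returns a stored result instead of rescanning all cups; the source cups are kept so the summary can be recomputed, as the pattern's solution recommends.

```python
class CupStats:
    """Computed Pattern sketch: maintain a summary at write time so reads
    return a stored result instead of re-scanning every cup."""

    def __init__(self):
        self.cups = []      # source documents, kept so the summary can be redone
        self.summary = {}   # (store_id, day) -> {"count": ..., "total_grams": ...}

    def record_cup(self, store_id, day, grams):
        """Write path: store the cup and update the computed summary."""
        self.cups.append({"store_id": store_id, "day": day, "grams": grams})
        s = self.summary.setdefault((store_id, day), {"count": 0, "total_grams": 0.0})
        s["count"] += 1
        s["total_grams"] += grams

    def average_grams(self, store_id, day):
        """Read path: constant time, no scan of self.cups."""
        s = self.summary.get((store_id, day))
        return s["total_grams"] / s["count"] if s else None

    def rebuild(self):
        """Redo the computation from the kept source documents."""
        cups, self.cups, self.summary = self.cups, [], {}
        for c in cups:
            self.record_cup(c["store_id"], c["day"], c["grams"])


stats = CupStats()
stats.record_cup("store-42", "2019-03-14", 20.0)
stats.record_cup("store-42", "2019-03-14", 21.0)
print(stats.average_grams("store-42", "2019-03-14"))  # 20.5
```

The cost moves from every read to every write, which pays off when, as in the problem statement, the same computation would otherwise run frequently on unchanged data.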
MongoDB.local Sydney 2019: Data Modeling for MongoDB

More Related Content

What's hot (20)

PDF
Bài tập thiết kế cơ sở dữ liệu
Lê Minh
 
DOCX
Tìm hiểu và triển khai hệ thống tường lửa OPNSense cho doanh nghiệp.docx
DV Viết Luận văn luanvanmaster.com ZALO 0973287149
 
DOCX
Đồ án tốt nghiệp_ Xây dựng website bán hàng trực tuyến_964063.docx
hongmai178731
 
DOC
Baitapmangmaytinh
Đấy Vợ
 
DOC
Quan li cua hang laptop
kukitaka
 
DOC
BTL Lập trình C#
Lê Hoàng Anh
 
PDF
Quy tắc thiết kế giao diện và viết code C#
An Nguyen
 
DOC
Đồ Án Tốt Nghiệp công Nghệ Thông Tin.doc
sividocz
 
PDF
Do an xay_dung_website_thuong_mai_dien_tu
ThiênĐàng CôngDân
 
DOC
Báo cáo thực tập lập dự án khởi nghiệp với thương mại điện tử, HOT, ĐIỂM 8
Dịch Vụ Viết Bài Trọn Gói ZALO 0917193864
 
PDF
Báo cáo tốt nghiệp - XÂY DỰNG CHƯƠNG TRÌNH QUẢN LÝ NHÀ HÀNG VỪA VÀ NHỎ SỬ DỤ...
Duc Tran
 
PDF
Phân tích thiết kế hệ thống của hàng bán điện thoại di động
Nguyễn Danh Thanh
 
PDF
[Giasunhatrang.edu.vn]cong thuc-giai-nhanh-hop-chat-nhom-kem(hoa-hoc-va-ung-d...
GiaSư NhaTrang
 
PDF
Bài giảng quản lý nhân sự
jackjohn45
 
DOC
Phân tích thiết kế hệ thống quản lý bán nước giải khát
Minh Nguyển
 
DOC
Đề tài: Quản lý cửa hàng điện thoại di động, HAY
Dịch vụ viết thuê Khóa Luận - ZALO 0932091562
 
DOC
Báo cáo đồ án - Xây Dựng Website Bán Quần Áo Trực Tuyến.doc
Dịch vụ viết thuê đề tài trọn gói ☎☎☎ Liên hệ ZALO/TELE: 0973.287.149 👍👍
 
DOC
Bảng câu hỏi về thiết kế web
Alexis Nguyen
 
PDF
Trang nguyen-tieng-anh-lop-1
toantieuhociq
 
PDF
Tài liệu lập trình PHP từ căn bản đến nâng cao
ZendVN
 
Bài tập thiết kế cơ sở dữ liệu
Lê Minh
 
Tìm hiểu và triển khai hệ thống tường lửa OPNSense cho doanh nghiệp.docx
DV Viết Luận văn luanvanmaster.com ZALO 0973287149
 
Đồ án tốt nghiệp_ Xây dựng website bán hàng trực tuyến_964063.docx
hongmai178731
 
Baitapmangmaytinh
Đấy Vợ
 
Quan li cua hang laptop
kukitaka
 
BTL Lập trình C#
Lê Hoàng Anh
 
Quy tắc thiết kế giao diện và viết code C#
An Nguyen
 
Đồ Án Tốt Nghiệp công Nghệ Thông Tin.doc
sividocz
 
Do an xay_dung_website_thuong_mai_dien_tu
ThiênĐàng CôngDân
 
Báo cáo thực tập lập dự án khởi nghiệp với thương mại điện tử, HOT, ĐIỂM 8
Dịch Vụ Viết Bài Trọn Gói ZALO 0917193864
 
Báo cáo tốt nghiệp - XÂY DỰNG CHƯƠNG TRÌNH QUẢN LÝ NHÀ HÀNG VỪA VÀ NHỎ SỬ DỤ...
Duc Tran
 
Phân tích thiết kế hệ thống của hàng bán điện thoại di động
Nguyễn Danh Thanh
 
[Giasunhatrang.edu.vn]cong thuc-giai-nhanh-hop-chat-nhom-kem(hoa-hoc-va-ung-d...
GiaSư NhaTrang
 
Bài giảng quản lý nhân sự
jackjohn45
 
Phân tích thiết kế hệ thống quản lý bán nước giải khát
Minh Nguyển
 
Đề tài: Quản lý cửa hàng điện thoại di động, HAY
Dịch vụ viết thuê Khóa Luận - ZALO 0932091562
 
Báo cáo đồ án - Xây Dựng Website Bán Quần Áo Trực Tuyến.doc
Dịch vụ viết thuê đề tài trọn gói ☎☎☎ Liên hệ ZALO/TELE: 0973.287.149 👍👍
 
Bảng câu hỏi về thiết kế web
Alexis Nguyen
 
Trang nguyen-tieng-anh-lop-1
toantieuhociq
 
Tài liệu lập trình PHP từ căn bản đến nâng cao
ZendVN
 

Similar to MongoDB.local Sydney 2019: Data Modeling for MongoDB (20)

PDF
Data Modeling for MongoDB
MongoDB
 
PDF
MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...
MongoDB
 
PDF
Data Modelling for MongoDB - MongoDB.local Tel Aviv
Norberto Leite
 
PDF
MongoDB World 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB
 
PDF
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
Daniel Coupal
 
PDF
MongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB
 
PDF
MongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
PDF
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
Lisa Roth, PMP
 
PDF
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB
 
PDF
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
PDF
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
PDF
Silicon Valley Code Camp 2014 - Advanced MongoDB
Daniel Coupal
 
PPTX
Bdf16 big-data-warehouse-case-study-data kitchen
Christopher Bergh
 
PPTX
Rapid Development with Schemaless Data Models
MongoDB
 
PDF
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Databricks
 
PPTX
Relational data modeling trends for transactional applications
Ike Ellis
 
PPTX
Hardware Provisioning
MongoDB
 
PPTX
Hardware Provisioning for MongoDB
MongoDB
 
PPTX
Advanced Schema Design Patterns
MongoDB
 
PDF
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Daniel Coupal
 
Data Modeling for MongoDB
MongoDB
 
MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...
MongoDB
 
Data Modelling for MongoDB - MongoDB.local Tel Aviv
Norberto Leite
 
MongoDB World 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB
 
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
Daniel Coupal
 
MongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB
 
MongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
Lisa Roth, PMP
 
MongoDB .local London 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
Silicon Valley Code Camp 2014 - Advanced MongoDB
Daniel Coupal
 
Bdf16 big-data-warehouse-case-study-data kitchen
Christopher Bergh
 
Rapid Development with Schemaless Data Models
MongoDB
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Databricks
 
Relational data modeling trends for transactional applications
Ike Ellis
 
Hardware Provisioning
MongoDB
 
Hardware Provisioning for MongoDB
MongoDB
 
Advanced Schema Design Patterns
MongoDB
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Daniel Coupal
 
Ad

More from MongoDB (20)

PDF
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
PDF
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
PDF
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
PDF
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
PDF
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
PDF
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB
 
PDF
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB
 
PDF
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
PDF
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
PDF
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
PDF
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
PDF
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
PDF
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
PDF
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
PDF
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
PDF
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 
PDF
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB
 
PDF
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
MongoDB
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB
 
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
MongoDB
 
Ad

Recently uploaded (20)

PPTX
From Sci-Fi to Reality: Exploring AI Evolution
Svetlana Meissner
 
PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PDF
What Makes Contify’s News API Stand Out: Key Features at a Glance
Contify
 
PPTX
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
PDF
Jak MŚP w Europie Środkowo-Wschodniej odnajdują się w świecie AI
dominikamizerska1
 
PDF
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 
PDF
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
PDF
Transcript: New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
PDF
CIFDAQ Token Spotlight for 9th July 2025
CIFDAQ
 
PDF
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
PDF
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
DOCX
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
PDF
"AI Transformation: Directions and Challenges", Pavlo Shaternik
Fwdays
 
PPTX
Future Tech Innovations 2025 – A TechLists Insight
TechLists
 
PDF
Empower Inclusion Through Accessible Java Applications
Ana-Maria Mihalceanu
 
PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
PDF
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
From Sci-Fi to Reality: Exploring AI Evolution
Svetlana Meissner
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
What Makes Contify’s News API Stand Out: Key Features at a Glance
Contify
 
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
Jak MŚP w Europie Środkowo-Wschodniej odnajdują się w świecie AI
dominikamizerska1
 
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
Transcript: New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
CIFDAQ Token Spotlight for 9th July 2025
CIFDAQ
 
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
"AI Transformation: Directions and Challenges", Pavlo Shaternik
Fwdays
 
Future Tech Innovations 2025 – A TechLists Insight
TechLists
 
Empower Inclusion Through Accessible Java Applications
Ana-Maria Mihalceanu
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 

MongoDB.local Sydney 2019: Data Modeling for MongoDB

  • 1. Data Modelling for MongoDB Daniel Coupal, Curriculum Team, MongoDB March 14th, 2019 Sydney, Australia
  • 2. Daniel Coupal Curriculum Team, MongoDB [email protected] Palo Alto, CA, USA github.com/dcoupal
  • 3. Goals of the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB Recognize the need and when to apply Schema Design Patterns
  • 4. Goals of the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB Recognize the need and when to apply Schema Design Patterns
  • 5. Goals of the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB Recognize the need and when to apply Schema Design Patterns
  • 6. Differences when Modelling for a Document Database versus a Relational Database
  • 8. Thinking in Documents 1. Polymorphism • different documents may contain different fields 2. Array • represent a "one-to-many" relation • index is on all entries 3. Sub Document • grouping some fields together 4. JSON/BSON • documents are often shown as JSON • BSON is the physical format
  • 10. … 5 tabes become 1 or 2 collections
  • 11. Example: Modelling a Social Network
  • 12. Relationnel MongoDB Steps to create the model 1 – define schema 2 – develop app and queries 1 – identifying the queries 2 – define schema Initial schema 3rd normal form One solution many solutions possible Final schema likely denormalized few changes Schema evolution difficult and not optimal Likely downtime easy and no downtime Performance mediocre optimized Differences: Relational/Tabular vs Document
  • 13. Other Considerations for the Model1. one-to-many relationships where "many" is a humongous number 2. Embed or Reference • Joins via $lookup • Transactions for multi document writes 3. Transactions available for Replica set, and soon for Sharded Clusters 4. Sharding Key 5. Indexes 6. Simple queries, or more complex ones with the Aggregation Framework
  • 18. Methodology 1. Describe the Workload 2. Identify and Model the Relationships
  • 22. Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns
  • 24. Case Study: Cuppa Coffee A. Business: coffee shop franchises B. Name: Cuppa Coffee also considered: Coffee Mate, Crocodile Coffee C. Objective: • 10 000 stores in Australia, New Zealand and South Asia • … then we invade America D. Keys to success: • Best coffee in the world • Technology
  • 25. Make the Best Coffee in the World 23g of ground coffee in, 20g of extracted coffee out, in approximately 20 seconds 1. Fill a small or regular cup with 80% hot water (not boiling but pretty hot). Your cup should be 150ml to 200ml in total volume, 80% of which will be hot water. 2. Grind 23g of coffee into your portafilter using the double basket. We use a scale that you can get here. 3. Draw 20g of coffee over the hot water by placing your cup on a scale, press tare and extract your shot.
  • 26. Technology 1. Measure inventory in real time • Shelves with scales 2. Big Data collection on cups of coffee • weighings, temperature, time to produce, … 3. Data Analysis • Coffee perfection • Rush hours -> staffing needs 4. MongoDB
  • 27. Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns
  • 28. 1 – Workload: List Queries Query Operation Description 1. Coffee weight on the shelves write A shelf send information when coffee bags are added or removed 2. Coffee to deliver to stores read How much coffee do we have to ship to the store in the next days 3. Anomalies in the inventory read Analytics 4. Making a cup of coffee write A coffee machine reporting on the production of a coffee cup 5. Analysis of cups of coffee read Analytics 6. Technical Support read Helping our franchisees
  • 29. Query Quantification Qualification 1. Coffee weight on the shelves 10/day*shelf*store => 1/sec <1s critical write 2. Coffee to deliver to stores 1/day*store => 0.1/sec <60s 3. Anomalies in the inventory 24 reads/day <5mins "collection scan" 4. Making a cup of coffee 10 000 000 writes/day 115 writes/sec <100ms non-critical write … cups of coffee at rush hour 3 000 000 writes/hr 833 writes/sec <100ms non-critical write 5. Analysis of cups of coffee 24 reads/day stale data is fine "collection scan" 6. Technical Support 1000 reads/day <1s 1 – Workload: quantify/qualify the queries
  • 30. 1 – Workload: quantify/qualify the queriesQuery Quantification Qualification 1. Coffee weight on the shelves 10/day*shelf*store => 1/sec <1s critical write 2. Coffee to deliver to stores 1/day*store => 0.1/sec <60s 3. Anomalies in the inventory 24 reads/day <5mins "collection scan" 4. Making a cup of coffee 10 000 000 writes/day 115 writes/sec <100ms non-critical write … cups of coffee at rush hour 3 000 000 writes/hr 833 writes/sec <100ms non-critical write 5. Analysis of cups of coffee 24 reads/day stale data is fine "collection scan" 6. Technical Support 1000 reads/day <1s
  • 31. Disk Space Cups of coffee (one year of data) • 10000 x 1000/day x 365 • 3.7 billions/year • 370 GB (100 bytes/cup of coffee) Weighings • 10000 x 10/day x 365 • 365 billions/year • 3.7 GB (100 bytes/weighings)
  • 32. Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns
  • 33. 2 - Relations are still important Type of Relation -> one-to-one/1-1 one-to-many/1-N many-to-many/N-N Document embedded in the parent document • one read • no joins • one read • no joins • one read • no joins • duplication of information Document referenced in the parent document • smaller reads • many reads • smaller reads • many reads • smaller reads • many reads
  • 34. 2 - Entities for Cuppa Café - Coffee cups - Stores - Coffee machines - Shelves - Weighings - Coffee bags
  • 35. Methodology 1. Describe the Workload 2. Identify and Model the Relationships 3. Apply Patterns
  • 37. Schema Design Patterns RessourcesA. Advanced Schema Design Patterns • MongoDB World 2017 • Webinar B. MongoDB University • university.mongodb.com • M320 – Data Modeling (2019) C. Blogs on Schema Design Patterns D. Appendix to this presentation • Schema Versioning Pattern • Computed Pattern
  • 43. Bucket Pattern { "device_id": 000123456, "type": "2A", "date": ISODate("2018-03-02"), "temp": [ [ 20.0, 20.1, 20.2, ... ], [ 22.1, 22.1, 22.0, ... ], ... ] } { "device_id": 000123456, "type": "2A", "date": ISODate("2018-03-03"), "temp": [ [ 20.1, 20.2, 20.3, ... ], [ 22.4, 22.4, 22.3, ... ], ... ] } { "device_id": 000123456, "type": "2A", "date": ISODate("2018-03-02T13"), "temp": { 1: 20.0, 2: 20.1, 3: 20.2, ... } } { "device_id": 000123456, "type": "2A", "date": ISODate("2018-03-02T14"), "temp": { 1: 22.1, 2: 22.1, 3: 22.0, ... } } Bucket per Day Bucket per Hour
  • 45. Cuppa Coffee - Solution with Patterns• Schema Versioning • Subset • Computed • Bucket • External Reference
  • 47. Takeaways from the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database
  • 48. Takeaways from the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB • Workload • Relationships • Patterns
  • 49. Takeaways from the Presentation Recognize the differences when modelling for a Document Database versus a Relational Database Summarize the steps of a methodology when modelling for MongoDB • Workload • Relationships • Patterns Recognize the need and when to apply Schema Design Patterns
  • 50. Coming Soon … • "Data Modelling" course at: university.mongodb.com
  • 54. This is what your dreams should be when thinking about a schema upgrade !
  • 55. Schema Revision Relational MongoDB Versioned Unit Schema Document Migration Procedure Difficult Easy Service Uptime Interrupted No interruption Rollback Difficult to nightmare-ish Easy
  • 58. Application Lifecycle Modify Application • Can read/process all versions of documents • Have different handler per version • Reshape the document before processing it Update all Application servers • Install updated application • Remove old processes Once migration completed • remove the code to process old versions.
  • 59. Document Lifecycle New Documents: • Application writes them in latest version Existing Documents A) Use updates to documents • to transform to latest version • keep forever documents that never need an update B) or transform all documents in batch • no worry even if process takes days
  • 60. Timeline of the migration
  • 61. Schema Versioning Pattern
    Problem:
      ● Avoid downtime while doing schema upgrades
      ● Upgrading all documents can take hours, days or even weeks when dealing with big data
      ● Don't want to update all documents
    Solution:
      ● Each document gets a "schema_version" field
      ● The application can handle all versions
      ● Choose your strategy to migrate the documents
    Use Cases Examples:
      ● Every application that uses a database, is deployed in production and is heavily used
      ● Systems with a lot of legacy data
    Benefits and Trade-Offs:
      ✅ No downtime needed
      ✅ Feel in control of the migration
      ✅ Less future technical debt
      Trade-off: may need 2 indexes for the same field during the migration period
  • 67. Computed Pattern
    Problem:
      ● Costly computation or manipulation of data
      ● Executed frequently on the same data, producing the same result
    Solution:
      ● Perform the operation and store the result in the appropriate document and collection
      ● If you need to redo the operations, keep their source data
    Use Cases Examples:
      ● Internet of Things (IoT)
      ● Event Sourcing
      ● Time Series Data
      ● Frequent Aggregation Framework queries
    Benefits and Trade-Offs:
      ✅ Read queries are faster
      ✅ Saves resources such as CPU and disk
      Trade-off: the need may be difficult to identify
      Trade-off: avoid applying or overusing it unless needed
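The "Bucket per Hour" layout from the Bucket Pattern slide (43) groups many readings into one document per device and hour. Below is a minimal Python sketch of that grouping. It makes several assumptions. Readings arrive as (timestamp, temperature) pairs. Plain dicts stand in for MongoDB documents. The function name `bucket_by_hour` is hypothetical. Minute keys are stored as strings, because BSON field names are strings.

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_hour(device_id, readings):
    """Hypothetical helper: group (timestamp, temperature) readings into one
    document per hour, with a "temp" sub-document keyed by minute."""
    buckets = defaultdict(dict)
    for ts, value in readings:
        # Truncate the timestamp to the hour that owns this reading.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        # BSON field names are strings, so the minute becomes a string key.
        buckets[hour][str(ts.minute)] = value
    # One document per (device, hour), ordered by time.
    return [
        {"device_id": device_id, "date": hour, "temp": temps}
        for hour, temps in sorted(buckets.items())
    ]


readings = [
    (datetime(2018, 3, 2, 13, 1), 20.0),
    (datetime(2018, 3, 2, 13, 2), 20.1),
    (datetime(2018, 3, 2, 14, 1), 22.1),
]
docs = bucket_by_hour("000123456", readings)
print(len(docs))
```

Three readings become two documents, one for 13:00 and one for 14:00. This illustrates the pattern's point: there are fewer documents, and so fewer index entries, than with one document per reading.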
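The Application Lifecycle slide (58) says the modified application must read every version of a document, either with a handler per version or by reshaping the document before processing it. Below is a minimal Python sketch of the reshaping approach. The `schema_version` field comes from the Schema Versioning Pattern slide. Everything else is hypothetical: the v1/v2 shapes (a single `phone` string becoming a `contacts` array), the function names, and the rule that a document without `schema_version` is v1.

```python
LATEST_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical v1 -> v2 change: the "phone" string becomes a
    "contacts" array of sub-documents. Returns a new dict."""
    new_doc = {k: v for k, v in doc.items() if k != "phone"}
    new_doc["contacts"] = (
        [{"type": "phone", "value": doc["phone"]}] if "phone" in doc else []
    )
    new_doc["schema_version"] = 2
    return new_doc


# One reshaping step per old version, chained until the latest version.
UPGRADES = {1: upgrade_v1_to_v2}


def to_latest(doc):
    # Assumption: documents written before versioning have no field, i.e. v1.
    version = doc.get("schema_version", 1)
    while version < LATEST_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc


def phone_numbers(doc):
    # Business logic only ever sees the latest shape.
    doc = to_latest(doc)
    return [c["value"] for c in doc["contacts"] if c["type"] == "phone"]


old = {"_id": 1, "name": "Ada", "phone": "555-0100"}
new = {"_id": 2, "name": "Grace", "schema_version": 2,
       "contacts": [{"type": "phone", "value": "555-0199"}]}
print(phone_numbers(old), phone_numbers(new))
```

Chaining one small upgrade function per version keeps each step simple. Once the migration is completed, the entries in `UPGRADES` are the code the slide says to remove.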
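The Document Lifecycle slide (59) lists the strategies side by side. New documents are written in the latest version. Existing documents are upgraded either when they are next updated (A) or by a batch job (B). The Python sketch below shows both strategies and rests on several assumptions. A dict keyed by `_id` stands in for a collection. The phone-to-contacts reshaping and all function names are hypothetical. In a real deployment, each write would be a MongoDB update, and the batch job would query the documents whose `schema_version` is below the latest.

```python
LATEST_VERSION = 2


def upgrade(doc):
    """Hypothetical v1 -> v2 reshaping: "phone" string -> "contacts" array."""
    if doc.get("schema_version", 1) >= LATEST_VERSION:
        return doc
    new_doc = {k: v for k, v in doc.items() if k != "phone"}
    new_doc["contacts"] = (
        [{"type": "phone", "value": doc["phone"]}] if "phone" in doc else []
    )
    new_doc["schema_version"] = LATEST_VERSION
    return new_doc


def insert_document(collection, doc):
    # New documents: always written in the latest version.
    collection[doc["_id"]] = dict(doc, schema_version=LATEST_VERSION)


def update_document(collection, doc_id, changes):
    # Strategy A: a document is upgraded only when it is next updated.
    doc = dict(upgrade(collection[doc_id]))
    doc.update(changes)
    collection[doc_id] = doc


def migrate_in_batches(collection, batch_size=1000):
    # Strategy B: upgrade the remaining old documents in batches. Selecting
    # only documents below the latest version lets the job be re-run safely.
    pending = [doc_id for doc_id, doc in collection.items()
               if doc.get("schema_version", 1) < LATEST_VERSION]
    for start in range(0, len(pending), batch_size):
        for doc_id in pending[start:start + batch_size]:
            collection[doc_id] = upgrade(collection[doc_id])
    return len(pending)


collection = {
    1: {"_id": 1, "phone": "555-0100"},
    2: {"_id": 2, "phone": "555-0101"},
}
insert_document(collection, {"_id": 3, "contacts": []})
update_document(collection, 1, {"name": "Ada"})
migrated = migrate_in_batches(collection, batch_size=1)
print(migrated)
```

Both strategies can run at the same time. Documents already upgraded by updates are simply skipped by the batch job.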
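The Computed Pattern slide (67) stores the result of a costly computation and keeps the source data in case it must be redone. The Python sketch below updates a summary on every write, so reads use only pre-computed fields. The class and field names are hypothetical. In MongoDB the same idea could be expressed as a summary document updated with the `$inc`, `$min` and `$max` update operators when each reading is written.

```python
class ReadingsWithSummary:
    """Hypothetical Computed Pattern sketch: raw readings plus a summary
    that is maintained on write instead of recalculated on every read."""

    def __init__(self):
        self.readings = []  # source data, kept so the summary can be rebuilt
        self.summary = {"count": 0, "total": 0.0, "min": None, "max": None}

    def add(self, value):
        self.readings.append(value)
        s = self.summary
        s["count"] += 1
        s["total"] += value
        s["min"] = value if s["min"] is None else min(s["min"], value)
        s["max"] = value if s["max"] is None else max(s["max"], value)

    def average(self):
        # Read path: constant time, no scan over the readings.
        s = self.summary
        return s["total"] / s["count"] if s["count"] else None

    def recompute(self):
        # Redo the computation from the kept source data.
        readings, self.readings = self.readings, []
        self.summary = {"count": 0, "total": 0.0, "min": None, "max": None}
        for value in readings:
            self.add(value)


stats = ReadingsWithSummary()
for v in (20.0, 22.0, 21.0):
    stats.add(v)
print(stats.average(), stats.summary["min"], stats.summary["max"])
```

The trade-off from the slide shows up directly: every write does a little more work so that frequent reads do almost none.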

Editor's Notes

  • #2: Modelling (Aus) vs Modeling (USA)
  • #3: Thanks for attending the conference. The topic is data modeling, more specifically data modeling for MongoDB. Why this presentation? To go beyond showing examples of documents, and to complement the Schema Design Patterns talks.
  • #4: 1. Recognize the differences when modelling for a Document Database vs a Relational Database
  • #5: 2. Summarize the steps of a flexible methodology
  • #6: 3. Recognize the need and when to apply Schema Design Patterns
  • #8: A document is a set of key-value pairs: the key plays the role of the column name and the value is the associated value. A value can be of the usual types (string, number, geolocation), a subdocument, an array, or an array of subdocuments.
  • #9: Polymorphism, Array, Sub Document, JSON/BSON. Arrays model a one-to-many relationship. An array of documents is the equivalent of the result of a join on 2 tables.
  • #10: In a relational database, there is only one solution to represent a relationship between 2 fields.
  • #11: Left solution: simpler, oriented around "articles". Right solution: a little more complex, oriented around "articles" and "users". The question is: how are you going to use the data, and what are the queries?
  • #12: Left: normalized representation. Right: pre-computed; we write every picture/blog post to all the consumers/friends. It takes more space and the writes are slower; however, the reads are faster. The speed of reads may make or break your system: users will navigate away if the pages don't load fast enough.
  • #13: - We also refer to a Relational Database as a Tabular Database
  • #17: Different inputs are available. Migrating from an RDBMS provides logs and statistics on the current system.
  • #20: Units of information for the domain to model. Example from a movie website.
  • #22: - When you think in documents, you may embed the review information directly in the movie documents.
  • #23: - Many patterns are about performance. Apply them only when they are needed.
  • #31: - One query dwarfs the rest; this will help us provision the I/O.
  • #32: - One collection dwarfs the other one; this will help us size the disks.
  • #38: - Let's quickly go over some of them for our use case.
  • #48: TODO - Use 3 columns like in this presentation: https://docs.google.com/presentation/d/1IYlqAk6LtKIP6ZKjqQW6TGPJbD4hJ2w9smQmArhu0e0/edit?ts=5c606c37#slide=id.g4c8e0a0b6f_0_14
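Note #8 lists the kinds of values a document can hold. The hypothetical movie document below contains one value of each kind, written as a Python dict. All field names and values are placeholders. The geolocation uses a GeoJSON Point, whose `coordinates` are ordered as [longitude, latitude].

```python
import json

# Hypothetical movie document: one field per kind of value from note #8.
movie = {
    "title": "Example Movie",                      # string
    "year": 2019,                                  # number
    "theater": {                                   # geolocation (GeoJSON Point)
        "type": "Point",
        "coordinates": [151.2093, -33.8688],       # [longitude, latitude]
    },
    "ratings": {"stars": 4.5, "votes": 120},       # subdocument
    "genres": ["Drama", "Sci-Fi"],                 # array
    "cast": [                                      # array of subdocuments
        {"name": "Actor A", "role": "Lead"},
        {"name": "Actor B", "role": "Support"},
    ],
}

# Documents are usually shown as JSON; MongoDB stores them as BSON.
print(json.dumps(movie, indent=2))
```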
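Note #9 says an array of documents is the equivalent of the result of a join on 2 tables. The Python sketch below starts from two hypothetical relational tables, where the "many" side holds a foreign key. It produces one document per parent, with the matching child rows embedded as an array. The table contents and the `embed_children` helper are illustrative assumptions. At query time, MongoDB's `$lookup` aggregation stage produces a similar array-of-documents shape.

```python
from collections import defaultdict

# Two hypothetical relational tables.
articles = [
    {"id": 1, "title": "Thinking in Documents"},
    {"id": 2, "title": "Schema Design Patterns"},
]
comments = [
    {"id": 10, "article_id": 1, "text": "Great intro"},
    {"id": 11, "article_id": 1, "text": "What about arrays?"},
    {"id": 12, "article_id": 2, "text": "Useful pattern"},
]


def embed_children(parents, children, fk, field):
    """Turn a one-to-many join into one document per parent, with the
    matching child rows embedded as an array of subdocuments."""
    by_parent = defaultdict(list)
    for child in children:
        # The foreign key is dropped: the nesting now expresses the relation.
        by_parent[child[fk]].append({k: v for k, v in child.items() if k != fk})
    return [dict(parent, **{field: by_parent[parent["id"]]}) for parent in parents]


docs = embed_children(articles, comments, fk="article_id", field="comments")
print(docs[0])
```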
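Note #12 compares a normalized design with a pre-computed one, where each post is written to every follower's feed. The Python sketch below implements both read paths. The follower graph, the post texts and all function names are hypothetical, and in-memory lists stand in for collections. The pre-computed side stores 3 feed entries for 2 posts, trading space and slower writes for a single lookup at read time.

```python
from collections import defaultdict

# Hypothetical social graph: author -> list of followers.
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}
posts = []                  # normalized: every post stored once
feeds = defaultdict(list)   # pre-computed: one feed per reader


def publish(author, text):
    post = {"author": author, "text": text}
    posts.append(post)                      # normalized write: one insert
    for reader in followers.get(author, []):
        feeds[reader].append(post)          # pre-computed: one write per follower


def feed_normalized(reader):
    # Read-time work: find who the reader follows, then filter all posts.
    following = {a for a, fs in followers.items() if reader in fs}
    return [p for p in posts if p["author"] in following]


def feed_precomputed(reader):
    # Read-time work: a single lookup.
    return feeds[reader]


publish("alice", "Hello Sydney")
publish("bob", "Schema talk today")
print(len(feed_precomputed("carol")))
```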
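Notes #31 and #32 use the workload to find the query that dwarfs the rest (to provision I/O) and the collection that dwarfs the others (to size the disks). The arithmetic is simple. The Python sketch below converts daily counts into operations per second and estimates collection size as document count times average document size. Every number and name here is made up for illustration and is not taken from the presentation.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical workload: operation -> operations per day.
queries = {
    "device writes reading": 1_000_000_000,
    "dashboard reads latest": 1_000_000,
    "monthly report": 100,
}
# Hypothetical collections: name -> (document count, average size in bytes).
collections = {
    "readings": (10_000_000_000, 300),
    "devices": (10_000_000, 1_000),
}


def ops_per_second(per_day):
    return per_day / SECONDS_PER_DAY


def collection_bytes(count, avg_doc_bytes):
    return count * avg_doc_bytes


# The dominant query drives I/O provisioning; the largest collection drives disk size.
busiest = max(queries, key=queries.get)
largest = max(collections, key=lambda name: collection_bytes(*collections[name]))
print(busiest, round(ops_per_second(queries[busiest])), "ops/s")
print(largest, collection_bytes(*collections[largest]) / 1e12, "TB")
```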