Perl Engineer & Evangelist, 10gen
Mike Friedman
#MongoDBdays
Schema Design
Four Real-World Use
Cases
Agenda
• Why is schema design important
• 4 Real World Schemas
– Inbox
– History
– Indexed Attributes
– Multiple Identities
• Conclusions
Why is Schema Design
important?
• Largest factor for a performant system
• Schema design with MongoDB is different
• RDBMS – "What answers do I have?"
• MongoDB – "What question will I have?"
#1 - Message Inbox
Let’s get
Social
Sending Messages
?
Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox
Reading my Inbox
?
3 Approaches (there are
more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
// Send a message
db.inbox.save( msg )
// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
Fan out on read
Fan out on read – I/O
[Diagram: Shard 1, Shard 2, Shard 3; operations shown: Send Message, Read Inbox]
Considerations
• Write: One document per message sent
• Read: Find all messages with my own name in
the recipient field
• Read: Requires scatter-gather on sharded
cluster
• A lot of random I/O on a shard to find everything
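// Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
// Send a message: save one copy per recipient
for ( recipient in msg.to ) {
// Copy the message so that each save inserts a new document;
// reusing msg would carry over the _id assigned by the first save
copy = { from: msg.from, to: msg.to, sent: msg.sent, message: msg.message }
copy.recipient = msg.to[recipient]
db.inbox.save( copy );
}
// Read my inbox
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )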
Fan out on write
Fan out on write – I/O
[Diagram: Shard 1, Shard 2, Shard 3; operations shown: Send Message, Read Inbox]
Considerations
• Write: One document per recipient
• Read: Find all of the messages with me as the
recipient
• Can shard on recipient, so inbox reads hit one
shard
• But still lots of random I/O on the shard
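One rough way to compare the two designs is to count which shards each operation has to touch. The sketch below is plain JavaScript rather than MongoDB code: `shardFor` is a hypothetical stand-in for the cluster's routing (real clusters route by chunk ranges), and the three-shard count is taken from the diagrams.

```javascript
// Hypothetical stand-in for the cluster's routing: hashes a shard-key
// value onto one of `shardCount` shards. Real routing uses chunk ranges.
function shardFor(keyValue, shardCount) {
  let hash = 0;
  for (const ch of String(keyValue)) {
    hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
  }
  return hash % shardCount;
}

// Fan out on read: sharded on "from", inbox query filters on "to".
// The query carries no shard key, so every shard is consulted.
function shardsReadFanOutOnRead(shardCount) {
  return Array.from({ length: shardCount }, (_, i) => i);
}

// Fan out on write: sharded on "recipient", inbox query filters on
// "recipient", so the read is routed to a single shard.
function shardsReadFanOutOnWrite(recipient, shardCount) {
  return [shardFor(recipient, shardCount)];
}

// Fan out on write pays at send time instead: one document per
// recipient, each landing on that recipient's shard.
function shardsWrittenFanOutOnWrite(recipients, shardCount) {
  return [...new Set(recipients.map((r) => shardFor(r, shardCount)))];
}

console.log(shardsReadFanOutOnRead(3).length);         // 3 (scatter-gather)
console.log(shardsReadFanOutOnWrite("Joe", 3).length); // 1 (routed)
```

The trade is visible in the function signatures: the read cost of the first design grows with the number of shards, while the write cost of the second grows with the number of recipients.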
// Shard on "owner / sequence"
sh.shardCollection( "mongodbdays.inbox",
{ owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
Fan out on write with buckets
// Send a message
for( recipient in msg.to) {
count = db.users.findAndModify({
query: { user_name: msg.to[recipient] },
update: { "$inc": { "msg_count": 1 } },
upsert: true,
new: true }).msg_count;
sequence = Math.floor(count / 50);
db.inbox.update({
owner: msg.to[recipient], sequence: sequence },
{ $push: { "messages": msg } },
{ upsert: true } );
}
// Read my inbox
db.inbox.find( { owner: "Joe" } )
.sort ( { sequence: -1 } ).limit( 2 )
Fan out on write with buckets
Fan out on write with buckets
• Each “inbox” document is an array of messages
• Append a message onto “inbox” of recipient
• Bucket inboxes so there’s not too many
messages per document
• Can shard on recipient, so inbox reads hit one
shard
• 1 or 2 documents to read the whole inbox
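The bucket arithmetic can be tried out without a database. The following is a minimal in-memory sketch, assuming the same bucket size of 50 as the slide code; `deliver`, `readInbox` and the `store` object are hypothetical names that mimic the `findAndModify` counter, the upserting `$push`, and the sorted, limited read.

```javascript
const BUCKET_SIZE = 50; // messages per bucket, as in the slide code

// Which bucket a message lands in, given the recipient's running count
function bucketFor(msgCount) {
  return Math.floor(msgCount / BUCKET_SIZE);
}

// Mimics the send path: increment the counter, then append to the bucket
function deliver(store, owner, msg) {
  const count = (store.counters.get(owner) || 0) + 1; // $inc with new: true
  store.counters.set(owner, count);
  const sequence = bucketFor(count);
  const key = owner + "/" + sequence;
  if (!store.buckets.has(key)) {
    store.buckets.set(key, { owner, sequence, messages: [] }); // upsert
  }
  store.buckets.get(key).messages.push(msg); // $push
  return sequence;
}

// Mimics find({ owner }).sort({ sequence: -1 }).limit(2)
function readInbox(store, owner, limit = 2) {
  return [...store.buckets.values()]
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, limit);
}

const store = { counters: new Map(), buckets: new Map() };
for (let i = 1; i <= 120; i++) {
  deliver(store, "Bob", { from: "Joe", message: "Hi #" + i });
}
// Newest two buckets: sequence 2 holds 21 messages, sequence 1 holds 50
console.log(readInbox(store, "Bob").map((b) => [b.sequence, b.messages.length]));
```

Because the counter is incremented before the division, the first bucket (sequence 0) ends up with 49 messages and every later full bucket with 50.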
Fan out on write with buckets – I/O
[Diagram: Shard 1, Shard 2, Shard 3; operations shown: Send Message, Read Inbox]
#2 – History
Data Modeling for the Real World
Design Goals
• Need to retain a limited amount of history e.g.
– Hours, Days, Weeks
– May be legislative requirement (e.g. HIPAA, SOX, DPA)
• Need to query efficiently by
– match
– ranges
3 Approaches (there are
more)
• Bucket by Number of messages
• Fixed size array
• Bucket by date + TTL collections
db.inbox.find()
{ owner: "Joe", sequence: 25,
messages: [
{ from: "Joe",
to: [ "Bob", "Jane" ],
sent: ISODate("2013-03-01T09:59:42.689Z"),
message: "Hi!"
},
…
] }
// Query with a date range
db.inbox.find ({owner: "friend1",
messages: {
$elemMatch: {sent:{$gte: ISODate("…") }}}})
// Remove elements based on a date
db.inbox.update({owner: "friend1" },
{ $pull: { messages: {
sent: { $gte: ISODate("…") } } } } )
Bucket by number of
messages
Considerations
• Shrinking documents, space can be reclaimed
with
– db.runCommand ( { compact: '<collection>' } )
• Remove the document after the last element in
the array has been removed
– { "_id" : …, "messages" : [ ], "owner" : "friend1",
"sequence" : 0 }
msg = {
from: "Your Boss",
to: [ "Bob" ],
sent: new Date(),
message: "CALL ME NOW!"
}
// 2.4 Introduces $each, $sort and $slice for $push
db.messages.update(
{ _id: 1 },
{ $push: { messages: { $each: [ msg ],
$sort: { sent: 1 },
$slice: -50 }
}
}
)
Fixed Size Array
Considerations
• Need to compute the size of the array based on
retention period
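As a rough illustration of that sizing step, the sketch below derives an array cap from a retention period and then models what `$push` with `$each`, `$sort: { sent: 1 }` and a negative `$slice` does to the array. It is plain JavaScript, not the server's implementation; `arrayCap`, `pushCapped` and the figure of 10 messages per day are assumptions made for the example.

```javascript
// Hypothetical sizing helper: expected messages per day * retention days
function arrayCap(messagesPerDay, retentionDays) {
  return Math.ceil(messagesPerDay * retentionDays);
}

// Models $push with $each + $sort: { sent: 1 } + $slice: -cap
function pushCapped(messages, newMessages, cap) {
  const merged = messages.concat(newMessages);
  merged.sort((a, b) => a.sent - b.sent);     // oldest first
  return cap === 0 ? [] : merged.slice(-cap); // keep the newest `cap` entries
}

const cap = arrayCap(10, 5); // assumed 10 messages/day, kept for 5 days
let inbox = [];
for (let day = 1; day <= 60; day++) {
  const msg = { sent: new Date(Date.UTC(2013, 0, day)), message: "msg " + day };
  inbox = pushCapped(inbox, [msg], cap);
}
console.log(cap, inbox.length, inbox[0].message); // 50 50 msg 11
```

If the real message rate exceeds the estimate, the array silently drops messages that are still inside the retention period, so the cap is worth sizing with some headroom.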
// messages: one doc per user per day
db.inbox.findOne()
{
_id: 1,
to: "Joe",
sequence: ISODate("2013-02-04T00:00:00.392Z"),
messages: [ ]
}
// Auto expires data after 31536000 seconds = 1 year
db.messages.ensureIndex( { sequence: 1 },
{ expireAfterSeconds: 31536000 } )
TTL Collections
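The `expireAfterSeconds` value and the per-day `sequence` key are both simple date arithmetic. The sketch below shows one way to compute them in plain JavaScript; `ttlSeconds` and `dayBucket` are hypothetical helper names, and truncating to midnight UTC is an assumption about how the day bucket is chosen.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Retention period in days -> value for expireAfterSeconds
function ttlSeconds(retentionDays) {
  return retentionDays * SECONDS_PER_DAY;
}

// Truncate a timestamp to midnight UTC so that all messages sent on the
// same day share one bucket document (assumed bucketing rule)
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

console.log(ttlSeconds(365)); // 31536000
console.log(dayBucket(new Date("2013-02-04T15:30:00Z")).toISOString());
// 2013-02-04T00:00:00.000Z
```

Since a TTL index removes whole documents, the bucket granularity (a day here) is also the granularity at which history expires.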
#3 – Indexed Attributes
Design Goal
• Application needs to store a variable number of
attributes e.g.
– User defined Form
– Meta Data tags
• Queries needed
– Equality
– Range based
• Need to be efficient, regardless of the number of
attributes
2 Approaches (there are
more)
• Attributes as Embedded Document
• Attributes as Objects in an Array
db.files.insert( { _id: "local.0",
attr: { type: "text", size: 64,
created: ISODate("...") } } )
db.files.insert( { _id: "local.1",
attr: { type: "text", size: 128} } )
db.files.insert( { _id: "mongod",
attr: { type: "binary", size: 256,
created: ISODate("...") } } )
// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text"} )
// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
Attributes as a Sub-Document
Considerations
• Each attribute needs an Index
• Each time you extend, you add an index
• Lots and lots of indexes
db.files.insert( {_id: "local.0",
attr: [ { type: "text" },
{ size: 64 },
{ created: ISODate("...") } ] } )
db.files.insert( { _id: "local.1",
attr: [ { type: "text" },
{ size: 128 } ] } )
db.files.insert( { _id: "mongod",
attr: [ { type: "binary" },
{ size: 256 },
{ created: ISODate("...") } ] } )
db.files.ensureIndex( { attr: 1 } )
Attributes as Objects in Array
Considerations
• Only one index needed on attr
• Can support range queries, etc.
• Index can be used only once per query
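Moving from the sub-document layout to the array layout is a mechanical transformation. Here is a small sketch of it in plain JavaScript; `toAttrArray` and `hasAttr` are hypothetical helpers, and `hasAttr` only models an equality match on one attribute in application code, not how the server evaluates the index.

```javascript
// { type: "text", size: 64 }  ->  [ { type: "text" }, { size: 64 } ]
function toAttrArray(attrs) {
  return Object.entries(attrs).map(([name, value]) => ({ [name]: value }));
}

// Equality check on a single attribute (primitive values only)
function hasAttr(doc, name, value) {
  return doc.attr.some(
    (entry) => Object.prototype.hasOwnProperty.call(entry, name) && entry[name] === value
  );
}

const file = { _id: "local.0", attr: toAttrArray({ type: "text", size: 64 }) };
console.log(JSON.stringify(file.attr));   // [{"type":"text"},{"size":64}]
console.log(hasAttr(file, "type", "text")); // true
console.log(hasAttr(file, "size", 128));    // false
```

Each array element carries exactly one attribute, which is what lets a single index on `attr` cover attributes that are added later.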
#4 – Multiple Identities
Design Goal
• Ability to look up by a number of different
identities e.g.
• Username
• Email address
• FB Handle
• LinkedIn URL
2 Approaches (there are
more)
• Identifiers in a single document
• Separate Identifiers from Content
db.users.findOne()
{ _id: "joe",
email: "joe@example.com",
fb: "joe.smith", // facebook
li: "joe.e.smith", // linkedin
other: {…}
}
// Shard collection by _id
sh.shardCollection("mongodbdays.users", { _id: 1 } )
// Create indexes on each key
db.users.ensureIndex( { email: 1} )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )
Single Document by User
Read by _id (shard key)
find( { _id: "joe" } )
[Diagram: Shard 1, Shard 2, Shard 3]
Read by email (non-shard key)
find( { email: "joe@example.com" } )
[Diagram: Shard 1, Shard 2, Shard 3]
Considerations
• Lookup by shard key is routed to 1 shard
• Lookup by other identifier is scatter gathered
across all shards
• Secondary keys cannot have a unique index (on a sharded collection,
a unique index must be prefixed by the shard key)
// Create unique index
db.identities.ensureIndex( { identifier : 1} , { unique: true} )
// Create one identity document per identifier, each pointing at the user's document
db.identities.save(
{ identifier : { hndl: "joe" }, user: "1200-42" } )
db.identities.save(
{ identifier : { email: "joe@abc.com" }, user: "1200-42" } )
db.identities.save(
{ identifier : { li: "joe.e.smith" }, user: "1200-42" } )
// Shard collection by identifier
sh.shardCollection( "mydb.identities", { identifier : 1 } )
// Unique index on _id (MongoDB already keeps the _id index unique)
db.users.ensureIndex( { _id: 1} , { unique: true} )
// Shard collection by _id
sh.shardCollection( "mydb.users", { _id: 1 } )
Document per Identity
Read requires 2 reads
[Diagram: Shard 1, Shard 2, Shard 3]
db.identities.find( { "identifier" : { "hndl" : "joe" } } )
db.users.find( { _id: "1200-42" } )
Considerations
• Lookup to Identities is a routed query
• Lookup to Users is a routed query
• Unique indexes available
• Must do two queries per lookup
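The two-step lookup can be sketched with two in-memory maps standing in for the `identities` and `users` collections. This is plain JavaScript with hypothetical helper names (`identityKey`, `findUser`); the identifier values come from the slide, while the contents of the user document are placeholders.

```javascript
// Stand-in for the identities collection: one entry per identifier
function identityKey(kind, value) {
  return kind + ":" + value;
}

const identities = new Map([
  [identityKey("hndl", "joe"), { user: "1200-42" }],
  [identityKey("email", "joe@abc.com"), { user: "1200-42" }],
  [identityKey("li", "joe.e.smith"), { user: "1200-42" }],
]);

// Stand-in for the users collection, keyed by _id (placeholder content)
const users = new Map([["1200-42", { _id: "1200-42", other: {} }]]);

function findUser(kind, value) {
  const identity = identities.get(identityKey(kind, value)); // read 1: routed by identifier
  if (!identity) return null;
  return users.get(identity.user) || null;                   // read 2: routed by _id
}

console.log(findUser("email", "joe@abc.com")._id); // 1200-42
console.log(findUser("fb", "nobody"));             // null
```

Both reads are keyed on a shard key, which is what keeps each of them on a single shard at the price of the extra round trip.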
Conclusion
Summary
• Multiple ways to model a domain problem
• Understand the key use cases of your app
• Balance between ease of query vs. ease of write
• Random I/O should be avoided
Thank You
Next Sessions at 3:40
5th Floor:
West Side Ballroom 3&4: Advanced Replication Internals
West Side Ballroom 1&2: Building a High-Performance Distributed
Task Queue on MongoDB
Juilliard Complex: WhiteBoard Q&A
Lyceum Complex: Ask the Experts
7th Floor:
Empire Complex: Managing a Maturing MongoDB Ecosystem
SoHo Complex: MongoDB Indexing Constraints and Creative
Schemas
