Evolvable Application Development with MongoDB
for .NET developers
Gerd Teniers
Bart Wullems
WARNING – This session is rated as a ‘Grandma session’ (=Level 200)
3 goals of this presentation
When you leave this presentation you should have learned
How easy it is to get started using MongoDB
How using MongoDB changes the way you design and build your applications
How MongoDB’s flexibility supports evolutionary design
That giving speakers beer before a session is never a good idea
What is not cool?
White socks & sandals
What is not cool?
Dancing like Miley Cyrus
What is not cool?
Relational databases
What is cool?
Short pants and very large socks
What is cool?
Dancing like Psy
What is cool?
NO-SQL (=Not Only SQL)
ThoughtWorks Technology Radar
Entity Framework 7 will support No-SQL
Gartner
What is MongoDB?
MongoDB
HuMongous
General purpose database
Document oriented database using JSON document syntax
Features:
- Flexibility
- Power
- Scaling
- Ease of Use
- Built-in JavaScript
Users: Craigslist, eBay, Foursquare, SourceForge, and The New York Times.
Written in C++
Extensive use of memory-mapped files, i.e. read-through, write-through memory caching
Runs nearly everywhere
Data serialized as BSON (fast parsing)
Full support for primary & secondary indexes
Document model = less work
High Performance
MongoDB Database Architecture: Document
{
_id: ObjectId("5099803df3f4948bd2f98391"),
name: { first: "Alan", last: "Turing" },
birth: new Date('Jun 23, 1912'),
death: new Date('Jun 07, 1954'),
contribs: [ "Turing machine", "Turing test", "Turingery" ],
views : NumberLong(1250000)
}
MongoDB Database Architecture: Collection
Logical group of documents
May or may not share same keys
Schema is dynamic/application maintained
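To make "may or may not share same keys" concrete, here is a minimal sketch in plain JavaScript. It uses an array as a stand-in for a collection and does not talk to MongoDB; the second document and the helper name are assumptions made up for illustration.

```javascript
// A plain array standing in for a collection (illustrative assumption).
const peopleDocs = [
  { _id: 1, name: { first: "Alan", last: "Turing" }, contribs: ["Turing machine", "Turing test"] },
  // Hypothetical second document: it has an extra key and no "contribs" key.
  { _id: 2, name: { first: "Ada", last: "Lovelace" }, title: "Countess" },
];

// With a dynamic schema, the application decides how to treat a missing key.
function contributionCount(person) {
  return (person.contribs || []).length;
}

const counts = peopleDocs.map(contributionCount);
console.log(counts);
```

The point of the sketch is that nothing in the "collection" rejects the second document; the tolerance for missing keys lives in application code.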
Why should I use it? (or how do I convince my boss?)
Developer productivity
Avoid ORM pain, no mapping needed
Performance (again)
Scaling out is easy (or at least easier)
Optimized for reads
Flexibility
Dynamic schema
How to run it?
Exe
Windows service
Azure
3rd party commercial hosting
How to talk to it?
Mongo shell
Official and non-official drivers
>12 languages supported
DEMO 1 - PROTOTYPING
Schema design
First step in any application is to determine your domain/entities
In a relational-based app
We would start by doing schema design
In a MongoDB-based app
We start building our app and let the schema evolve
Comparison
Relational:
Album
- id
- artistid
- title
Track
- no
- name
- unitPrice
- popularity
Artist
- id
- name
Document db:
Album
- _id
- title
- artist
- tracks[]
- _id
- name
Modeling
Modeling
Start from application-specific queries
“What questions do I have?” vs “What answers”
“Data like the application wants it”
Base parent documents on
The most common usage
What do I want returned?
Modeling
Embedding vs Linking vs Hybrid
Album
- _id
- artist
- cover
- _id
- name
Artist
- _id
- name
- photo
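The three options can be sketched as plain JavaScript object shapes. This is one reading of the slide, not the speakers' demo code: all values are made up, and the assumption is that the hybrid album embeds only the artist's _id and name while the full artist (with photo) stays in its own collection.

```javascript
// Illustrative shapes only; every value here is a made-up assumption.
const artistDoc = { _id: "artist-1", name: "Some Artist", photo: "artist-1.png" };

// Linking: the album stores only the artist's _id and needs a second lookup.
const linkedAlbum = { _id: "album-1", artist: artistDoc._id, cover: "album-1.png" };

// Embedding: the album carries a full copy of the artist document.
const embeddedAlbum = { _id: "album-1", artist: { ...artistDoc }, cover: "album-1.png" };

// Hybrid: embed the fields shown most often (_id and name) and keep the _id
// so the full artist, including the photo, can still be fetched when needed.
const hybridAlbum = {
  _id: "album-1",
  artist: { _id: artistDoc._id, name: artistDoc.name },
  cover: "album-1.png",
};

// Resolving a link is a second read against the artists "collection".
const artistsById = new Map([[artistDoc._id, artistDoc]]);
const resolvedArtist = artistsById.get(linkedAlbum.artist);

console.log(hybridAlbum.artist.name, resolvedArtist.photo);
```

The trade-off: embedding duplicates data that must be kept in sync, linking costs an extra read, and the hybrid form accepts a small amount of duplication to avoid the extra read in the common case.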
Product
Single collection inheritance
Relational:
Product
- _id
- price
Book
- author
- title
Album
- artist
- title
Jeans
- size
- color
Document db:
- _id
- price
- author
- title

- _id
- price
- size
- color
Product
Single collection inheritance
Relational:
Product
- _id
- price
Book
- author
- title
Album
- artist
- title
Jeans
- size
- color
Document db:
_type: Book
- _id
- price
- author
- title

_type: Jeans
- _id
- price
- size
- color
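A minimal sketch of how an application might use the _type discriminator, written in plain JavaScript against an in-memory array rather than a real collection. The _type values come from the slide; the prices, titles and the describeProduct helper are hypothetical.

```javascript
// One "products" collection holding several product kinds.
// _type values follow the slide; all other values are made up.
const productDocs = [
  { _id: 1, _type: "Book", price: 20, author: "A. Author", title: "Some Book" },
  { _id: 2, _type: "Jeans", price: 60, size: 32, color: "blue" },
];

// The application picks the shape to work with based on the discriminator.
function describeProduct(product) {
  switch (product._type) {
    case "Book":
      return `${product.title} by ${product.author}`;
    case "Jeans":
      return `${product.color} jeans, size ${product.size}`;
    default:
      return `unknown product ${product._id}`;
  }
}

const descriptions = productDocs.map(describeProduct);

// Shared fields (such as price) can be used across all types at once.
const totalPrice = productDocs.reduce((sum, p) => sum + p.price, 0);

console.log(descriptions, totalPrice);
```

In a typed language such as C#, a driver would typically use the same kind of discriminator field to decide which class to deserialize into.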
One-to-many
Embedded array / array keys
Some queries get harder
You can index arrays!
Normalized approach
More flexibility
A lot less performance
BlogPost
- _id
- content
- tags: ["foo", "bar"]
- comments: ["id1", "id2"]
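The two one-to-many approaches can be compared with a small plain JavaScript sketch. The arrays stand in for collections, and the comment texts are assumptions added for illustration.

```javascript
// Embedded approach: tags live inside the post, so one read returns everything.
const embeddedPost = { _id: "post-1", content: "Hello", tags: ["foo", "bar"] };

// Normalized approach: the post stores comment ids; comments live elsewhere.
const normalizedPost = { _id: "post-1", content: "Hello", comments: ["id1", "id2"] };
const commentDocs = [
  { _id: "id1", text: "First" },
  { _id: "id2", text: "Second" },
  { _id: "id3", text: "Belongs to another post" },
];

// Querying an embedded array is a membership test on the document itself.
const hasFoo = embeddedPost.tags.includes("foo");

// Following references needs a second lookup per post; that extra round trip
// is where the performance cost of the normalized approach comes from.
const postComments = commentDocs.filter((c) => normalizedPost.comments.includes(c._id));

console.log(hasFoo, postComments.map((c) => c.text));
```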
Demo 2 – MODELING
CRUD
CRUD operations
Create: insert, save
Read: find, findOne
Update: update, save
Delete: remove, drop
ACID Transactions
No support for multi-document transactions (commit/rollback)
Atomic operations on document level
Multiple actions inside the same document
Incl. embedded documents
By keeping transaction support extremely simple, MongoDB can provide greater performance, especially for partitioned or replicated systems
Demo 3 – CRUD
GridFS
Storing binary documents
Although MongoDB is a document database, it’s not good for documents :-S
Document != .PNG & .PDF files
Document size is limited
Max document size is 16MB
Recommended document size <250KB
Solution is GridFS
Mechanism for storing large binary files in MongoDB
Stores metadata in a single document inside the fs.files collection
Splits files into chunks and stores them inside the fs.chunks collection
GridFS implementation is handled completely by the client driver
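The chunking idea can be sketched in a few lines of plain JavaScript (Node.js, using the global Buffer). This is a simplified stand-in for what a driver does, not driver code: the function names are hypothetical, and the 4-byte chunk size is chosen only to keep the example small, where real drivers use a much larger default.

```javascript
// Build one metadata document (as stored in fs.files) and a list of chunk
// documents (as stored in fs.chunks) from a binary payload.
function toGridFsDocuments(fileId, filename, data, chunkSize) {
  const file = { _id: fileId, filename, length: data.length, chunkSize };
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    // n is the chunk's sequence number; files_id points back to the metadata.
    chunks.push({ files_id: fileId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  return { file, chunks };
}

// Reassemble the file by sorting the chunks on n and concatenating the data.
function fromGridFsDocuments(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const payload = Buffer.from("0123456789");
const { file, chunks } = toGridFsDocuments("file-1", "digits.txt", payload, 4);
const roundTrip = fromGridFsDocuments(chunks);

console.log(file.length, chunks.length, roundTrip.toString());
```

Because every chunk is an ordinary document well below the 16MB limit, the file as a whole can be arbitrarily large.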
Demo 4 – Evolving your domain model & GridFS
Evolving your domain model
Great for small changes!
Hot swapping
Minimal impact on your application and database
Avoid Migrations
Handle changes in your application instead of your database
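One way to handle changes in the application instead of the database is to upgrade old documents as they are read. The sketch below is plain JavaScript under stated assumptions: the version field, the two name shapes and the upgradePerson helper are all hypothetical.

```javascript
// Assumption: version 1 documents store "name" as a single string,
// version 2 documents store it as { first, last }.
function upgradePerson(doc) {
  const version = doc.version || 1; // documents written before versioning
  if (version >= 2) return doc;
  const [first, ...rest] = doc.name.split(" ");
  return { ...doc, version: 2, name: { first, last: rest.join(" ") } };
}

const storedDocs = [
  { _id: 1, name: "Alan Turing" }, // old shape
  { _id: 2, version: 2, name: { first: "Ada", last: "Lovelace" } }, // new shape
];

// The rest of the application only ever sees the newest shape.
const upgradedDocs = storedDocs.map(upgradePerson);

console.log(upgradedDocs);
```

Old documents can then be rewritten in the new shape whenever they are next saved, so no big-bang migration is needed.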
Performance
Creating indexes
Avoid collection scans by using indexes
> db.albums.ensureIndex({title: 1})
Compound indexes
Index on multiple fields
> db.albums.ensureIndex({title: 1, year: 1})
Indexes have their price
Every write takes longer
Max 64 indexes on a collection
Try to limit them
Indexes are useful as long as the number of records you want to return is limited
If you return >30% of a collection, check if a collection scan is faster
Aggregations with the Aggregation Framework
Pipeline stage   LINQ equivalent
$project         Select()
$unwind          SelectMany()
$match           Where()
$group           GroupBy()
$sort            OrderBy()
$skip            Skip()
$limit           Take()
Largely replaces the original Map/Reduce
Much faster!
Implemented in multi-threaded C++
No support in the LINQ provider yet (but in development)
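To show how the stages chain together, the sketch below writes a pipeline as plain data and then performs the same steps with JavaScript array methods on an in-memory array. The pipeline is not executed against MongoDB here, and the album data is made up for illustration.

```javascript
// Hypothetical albums with embedded tracks.
const albumDocs = [
  { title: "A", artist: "X", tracks: [{ name: "t1", unitPrice: 1 }, { name: "t2", unitPrice: 2 }] },
  { title: "B", artist: "Y", tracks: [{ name: "t3", unitPrice: 4 }] },
  { title: "C", artist: "X", tracks: [{ name: "t4", unitPrice: 3 }] },
];

// The pipeline as stage documents (shown as data only, not executed).
const pipeline = [
  { $unwind: "$tracks" },
  { $match: { "tracks.unitPrice": { $gte: 2 } } },
  { $group: { _id: "$artist", total: { $sum: "$tracks.unitPrice" } } },
  { $sort: { total: -1 } },
  { $limit: 1 },
];

// $unwind: one row per (album, track) pair.
const unwound = albumDocs.flatMap((a) => a.tracks.map((t) => ({ artist: a.artist, track: t })));
// $match: keep tracks that cost at least 2.
const matched = unwound.filter((row) => row.track.unitPrice >= 2);
// $group: sum the unit prices per artist.
const totals = new Map();
for (const row of matched) {
  totals.set(row.artist, (totals.get(row.artist) || 0) + row.track.unitPrice);
}
// $sort descending on total, then $limit to the first result.
const topArtist = [...totals]
  .map(([artist, total]) => ({ _id: artist, total }))
  .sort((a, b) => b.total - a.total)
  .slice(0, 1);

console.log(topArtist);
```

Each stage consumes the output of the previous one, which is also why the order of $match, $group and $sort matters.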
Demo 5 – Optimizations
Conclusion
Benefits
Scalable: good for a lot of data & traffic
Horizontal scaling: to more nodes
Good for web-apps
Performance
No joins and constraints
Dev/user friendly
Data is modeled to how the app is going to use it
No conversion from object-oriented to relational
No static schema = agile
Evolvable
Drawbacks
Forget what you have learned
New way of building and designing your application
Can collect garbage
No data integrity checks
Add a clean-up job
Database model is determined by usage
Requires insight into the usage
https://github.com/wullemsb/DemoTechoramaMongoDb
Things we didn’t talk about
• Security
  - HTTPS/SSL
• Compile the code yourself
• Eventual Consistency
• Geospatial features
• Realtime Aggregation
• Many to Many
  - Multiple approaches
    • References on one side
    • References on both sides
• Write Concerns
  - Acknowledged vs Unacknowledged writes
  - Stick with acknowledged writes (= default)
• GridFS disadvantages
  - Slower performance: accessing files from MongoDB will not be as fast as going directly through the filesystem.
  - You can only modify a file by deleting it and re-saving the whole thing.
  - Drivers are required
• Schema Migrations
  - Avoid them
  - Make your app backwards compatible
  - Add a version field to your documents
• Why you should not use regexes
  - Slow!
• Advanced Indexing
  - Indexing objects and arrays
  - Unique vs Sparse Indexes
  - Geospatial Indexes
  - Full Text Indexes
• MapReduce
  - Avoid it
  - Very slow in MongoDB
  - Use the Aggregation Framework instead
• Sharding
  - Based on a shard key (= field)
  - Commands are sent to the shard that includes the relevant range of the data
  - Data is evenly distributed across the shards
  - Automatic reallocation of data when adding or removing servers
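Range-based routing on a shard key can be illustrated with a small plain JavaScript sketch. It is a simplification under stated assumptions: the shard names, the key ranges and the routeByShardKey helper are made up, and real clusters manage and rebalance these ranges automatically.

```javascript
// Each shard owns a half-open range [min, max) of shard key values.
// Shard names and ranges are hypothetical.
const shardRanges = [
  { shard: "shard-a", min: "", max: "h" },
  { shard: "shard-b", min: "h", max: "q" },
  { shard: "shard-c", min: "q", max: "\uffff" },
];

// Route a document to the shard whose range contains its shard key value.
function routeByShardKey(doc, keyField) {
  const key = String(doc[keyField]).toLowerCase();
  const owner = shardRanges.find((r) => key >= r.min && key < r.max);
  return owner ? owner.shard : null;
}

const routedShards = [
  { title: "Abbey Road" },
  { title: "Kind of Blue" },
  { title: "Thriller" },
].map((doc) => routeByShardKey(doc, "title"));

console.log(routedShards);
```

A query that includes the shard key can be sent to a single shard in the same way; a query without it has to be sent to every shard.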

Techorama - Evolvable Application Development with MongoDB


Editor's Notes

  • #27 Documents contain: Values Arrays Embedded docs