MongoDB
A NoSQL Document Oriented Database
Agenda
● RelationalDBs
● NoSQL
– What, Why
– History
– Features
– Types
● MongoDB
– Indexes
– Replication
– Sharding
– Querying
– Mapping
– MapReduce
● Use Case: RealNetworks
Relational DBs
● Born in the 1970s
– storage was expensive
– schemas were simple
● Based on the relational model
– Mathematical model for describing data structure
– Data represented as "tuples", grouped into "relations"
● Queries based on Relational Algebra
– union, intersection, difference, cartesian product, selection,
projection, join, division
● Constraints
– Foreign Keys, Primary Keys, Indexes
– Domain Integrity (DataTypes)
Joins
Relational DBs
● Normalization
– minimize redundancy
– avoid duplication
Normalization
Relational DBs - Transactions
● Atomicity
– If one part of the transaction fails, the whole transaction fails
● Consistency
– Transaction leaves the DB in a valid state
● Isolation
– One transaction doesn't see an intermediate state of the other
● Durability
– Transaction gets persisted
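The all-or-nothing behaviour described under Atomicity and Isolation can be illustrated with a small in-memory sketch. This is not how a relational engine is implemented; `runTransaction`, `transfer`, the account names and the "no negative balance" rule are all hypothetical and exist only for this illustration.

```javascript
// Hypothetical in-memory "database": account name -> balance.
function runTransaction(state, steps) {
  // Work on a private copy so readers of `state` never see intermediate values.
  const draft = { ...state };
  for (const step of steps) {
    step(draft); // any step may throw
  }
  // Assumed business rule standing in for a consistency constraint.
  for (const [name, balance] of Object.entries(draft)) {
    if (balance < 0) throw new Error(`negative balance for ${name}`);
  }
  return draft; // "commit": the caller swaps in the new state
}

function transfer(state, from, to, amount) {
  try {
    return runTransaction(state, [
      (s) => { s[from] -= amount; },
      (s) => { s[to] += amount; },
    ]);
  } catch (err) {
    return state; // "rollback": the original state is returned untouched
  }
}

const accounts = { alice: 100, bob: 20 };
const afterOk = transfer(accounts, "alice", "bob", 30);
const afterFail = transfer(accounts, "alice", "bob", 500);
```

In this sketch the failed transfer returns the very same `accounts` object, so no partial debit is ever visible.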
Relational DBs - Use
NoSQL – Why?
● Web2.0
– Huge DataVolumes
– Need for Speed
– Accessibility
● RDBMS are difficult to scale
● Storage gets cheap
● Commodity machines get cheap
NoSQL – What?
● Simple storage of data
● Looser consistency model (eventual consistency), in
order to achieve:
– higher availability
– horizontal scaling
● No JOINs
● Optimized for big data, when no relational features are
needed
Vertical Scale vs. Horizontal Scale
● Horizontal scale enforces parallel computing
Eventual Consistency
● RDBMS: all users see a consistent view of the data
● ACID gets difficult when distributing data across nodes
● Eventual consistency: inconsistencies are transitory. The DB may be inconsistent at a given point in time, but will eventually become consistent.
● BASE (in contrast to ACID): Basically Available, Soft state, Eventually consistent
CAP Theorem
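● Consistency: all nodes see the same data at the same time
● Availability: requests always get an immediate response
● Partition tolerance: the system continues to work, even if a part of it breaks
● A distributed system can provide at most two of these three guarantees at once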
NoSQL - History
● Term first used in 1998 by C. Strozzi to name his relational DB that didn't use SQL
● Term reused in 2009 by E. Evans to name the distributed DBs that didn't provide ACID
● Some people read it as "Not Only SQL"
● Should actually be called "NoRel" (non-relational)
NoSQL – Some Features
● Auto-Sharding
● Replication
● Caching
● Dynamic Schema
NoSQL - Types
● Document
– Key-value "map" with a "document" (XML, JSON, PDF, ..) as value
– MongoDB, CouchDB
● Key-Value
– Key-value "map" with an "object" (Integer, String, Order, ..) as value
– Cassandra, Dynamo, Voldemort
● Graph
– Data stored in a graph structure: nodes have pointers to adjacent ones
– Neo4j
MongoDB
● Open-source NoSQL document DB written in C++
● Started in 2009
● Commercial support by 10gen
● Name derived from "humongous" (huge)
● http://www.mongodb.org/
MongoDB – Document Oriented
● No fixed document structure: schemaless
● Atomicity: only at document level (no transactions across documents)
● Normalization is not easy to achieve:
– Embed: more duplication, better performance
– Reference: less duplication, more roundtrips
MongoDB
● > db.users.save(
    { name: 'ruben',
      surname: 'inoto',
      age: '36' } )
● > db.users.find()
– { "_id" : ObjectId("519a3dd65f03c7847ca5f560"),
    "name" : "ruben",
    "surname" : "inoto",
    "age" : "36" }
● > db.users.update(
    { name: 'ruben' },
    { $set: { 'age' : '24' } } )
Documents are stored in BSON format
MongoDB - Querying
● find(): returns a cursor over the matching documents
– All users
  db.users.find()
– User with id 42
  db.users.find({ _id: 42 })
– Age between 20 and 30
  db.users.find({ age: { $gt: 20, $lt: 30 } })
– Subdocuments: ZIP 5026 (a dotted field name must be quoted)
  db.users.find({ "address.zip": "5026" })
– OR: ruben or younger than 30
  db.users.find({ $or: [
    { name: "ruben" },
    { age: { $lt: 30 } }
  ]})
– Projection: deliver only name and age (_id is also returned by default)
  db.users.find({ }, { name: 1, age: 1 })
Sample document:
{
  "_id": 42,
  "name": "ruben",
  "surname": "inoto",
  "age": "36",
  "address": {
    "street": "Glaserstraße",
    "zip": "5026" }
}
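To make the dot notation and the `$gt`/`$lt` operators from the queries above more concrete, here is a rough in-memory sketch. `getPath` and `matches` are hypothetical helpers, not MongoDB APIs, and they cover only equality, `$gt` and `$lt`; the real query engine supports far more (arrays, `$or`, type-aware comparison). The sample users are made up, with numeric ages so the range query has something to match.

```javascript
// Resolve a dotted path such as "address.zip" against a nested document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Simplified matcher: equality plus the $gt and $lt operators only.
function matches(doc, query) {
  return Object.entries(query).every(([path, condition]) => {
    const value = getPath(doc, path);
    if (condition !== null && typeof condition === "object") {
      return Object.entries(condition).every(([op, operand]) => {
        if (op === "$gt") return value > operand;
        if (op === "$lt") return value < operand;
        throw new Error(`unsupported operator ${op}`);
      });
    }
    return value === condition; // strict: the string "5026" is not the number 5026
  });
}

// Made-up sample data for the sketch.
const users = [
  { _id: 42, name: "ruben", age: 36, address: { zip: "5026" } },
  { _id: 43, name: "anna", age: 25, address: { zip: "1010" } },
];

const byZip = users.filter((u) => matches(u, { "address.zip": "5026" }));
const twenties = users.filter((u) => matches(u, { age: { $gt: 20, $lt: 30 } }));
const wrongType = users.filter((u) => matches(u, { "address.zip": 5026 }));
```

The last query finds nothing because the stored ZIP is a string; the same kind of type mismatch is a common source of empty results in real queries.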
MongoDB - Saving
● Insert
– db.test.save( { _id: "42", name: "ruben" } )
● Update (without an operator such as $set, the matched document is replaced as a whole)
– db.test.update( { _id : "42" }, { name : "harald" } )
– db.test.update( { _id : "42" }, { name : "harald", age : 39 } )
● Atomic Operators ($inc)
– db.test.update( { _id : "42" }, { $inc: { age : 1 } } )
● Arrays
– { _id : "48", name : "david", hobbies : [ "bike", "judo" ] }
– Add an element to an array atomically ($push)
● db.test.update( { _id : "48" }, { $push: { hobbies : "swimming" } } )
– Related: $each, $pop, $pull, $addToSet...
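The difference between a replacement update and an operator update is easy to miss, so the following sketch models it on plain objects. `applyUpdate` is a hypothetical helper written for this illustration; it handles only `$set`, `$inc` and `$push` and says nothing about how the server applies updates internally.

```javascript
// Apply a MongoDB-style update document to a plain object and return a new
// object. Only $set, $inc and $push are modelled in this sketch.
function applyUpdate(doc, update) {
  const usesOperators = Object.keys(update).some((k) => k.startsWith("$"));
  if (!usesOperators) {
    // Replacement: every field except _id is overwritten.
    return { _id: doc._id, ...update };
  }
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const [field, delta] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + delta;
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), item];
  }
  return result;
}

const david = { _id: "48", name: "david", age: 39, hobbies: ["bike", "judo"] };

const replaced = applyUpdate(david, { name: "harald" });
const older = applyUpdate(david, { $inc: { age: 1 } });
const pushed = applyUpdate(david, { $push: { hobbies: "swimming" } });
```

Here `replaced` keeps only `_id` and `name`, while `older` and `pushed` leave all other fields in place.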
MongoDB - Delete
● db.test.remove( { _id : "42" } )
MongoDB – Indexes
● Indexes on any attribute
– > db.users.ensureIndex( { 'age' : 1 } )
● Compound indexes (all keys go into one document)
– > db.users.ensureIndex( { 'age' : 1, 'name' : 1 } )
● Unique indexes
● From v2.4 on: text indexing (search)
SQL → Mongo Mapping (I)
SQL Statement | Mongo Query Language
CREATE TABLE USERS (a Number, b Number) | implicit
INSERT INTO USERS VALUES(1,1) | db.users.insert({a:1,b:1})
SELECT a,b FROM users | db.users.find({}, {a:1,b:1})
SELECT * FROM users | db.users.find()
SELECT * FROM users WHERE age=33 | db.users.find({age:33})
SELECT * FROM users WHERE age=33 ORDER BY name | db.users.find({age:33}).sort({name:1})
SQL → Mongo Mapping (II)
SQL Statement | Mongo Query Language
SELECT * FROM users WHERE age>33 | db.users.find({'age':{$gt:33}})
CREATE INDEX myindexname ON users(name) | db.users.ensureIndex({name:1})
SELECT * FROM users WHERE a=1 and b='q' | db.users.find({a:1,b:'q'})
SELECT * FROM users LIMIT 10 SKIP 20 | db.users.find().limit(10).skip(20)
SELECT * FROM users LIMIT 1 | db.users.findOne()
EXPLAIN PLAN FOR SELECT * FROM users WHERE z=3 | db.users.find({z:3}).explain()
SELECT DISTINCT last_name FROM users | db.users.distinct('last_name')
SELECT COUNT(*) FROM users WHERE age > 30 | db.users.find({age: {'$gt': 30}}).count()
Embed vs Reference
Relational
Document
user: {
  id: "1",
  name: "ruben"
}
order: {
  id: "a",
  user_id: "1",        // referenced: points to user.id
  items: [ {           // embedded: items live inside the order
    product_id: "x",
    quantity: 10,
    price: 300
  },
  {
    product_id: "y",
    quantity: 5,
    price: 300
  }]
}
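The trade-off between the two styles can be sketched with plain arrays standing in for collections. The arrays and the variable names below are assumptions made for this illustration; against a real database each `find` on a separate collection would be its own roundtrip.

```javascript
// Hypothetical in-memory collections mirroring the documents on the slide.
const users = [{ id: "1", name: "ruben" }];
const orders = [{
  id: "a",
  user_id: "1", // reference: only the key of the user is stored
  items: [      // embedded: line items live inside the order
    { product_id: "x", quantity: 10, price: 300 },
    { product_id: "y", quantity: 5, price: 300 },
  ],
}];

// Embedded data: one lookup returns the order together with its items.
const order = orders.find((o) => o.id === "a");
const itemCount = order.items.length;

// Referenced data: a second lookup is needed to resolve the user
// behind user_id (an extra roundtrip against a real database).
const customer = users.find((u) => u.id === order.user_id);
```

Embedding keeps reads cheap but copies data into each order; referencing avoids the copy at the cost of the second lookup.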
MongoDB – Replication (I)
● Master-slave replication: primary and secondary nodes
● replica set: cluster of mongod instances that replicate amongst one
another and ensure automated failover
WriteConcern
MongoDB – Replication (II)
● adds redundancy
● helps to ensure high availability – automatic
failover
● simplifies backups
WriteConcerns
● Errors Ignored
– even network errors are ignored
● Unacknowledged
– at least network errors are handled
● Acknowledged
– constraints are handled (default)
● Journaled
– persisted to journal log
● Replica ACK
– acknowledged by 1..n replica set members
– or by a 'majority' of them
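How many acknowledgements a "Replica ACK" write has to wait for can be expressed as a small calculation. The helpers `requiredAcks` and `isSatisfied` are hypothetical, and the sketch assumes every member of the replica set counts toward the majority, which is a simplification.

```javascript
// Number of acknowledgements a write needs for a given "w" setting.
// members: number of replica set members (assumption: all of them count
// toward the majority).
function requiredAcks(w, members) {
  if (w === "majority") return Math.floor(members / 2) + 1;
  if (Number.isInteger(w) && w >= 0 && w <= members) return w;
  throw new Error(`unsupported write concern: ${w}`);
}

// True once enough members have confirmed the write.
function isSatisfied(w, members, acksReceived) {
  return acksReceived >= requiredAcks(w, members);
}

const neededOfThree = requiredAcks("majority", 3);
const neededOfFour = requiredAcks("majority", 4);
```

With three members a majority write waits for two acknowledgements; with four it waits for three, so adding an even member does not improve fault tolerance.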
MongoDB – Sharding (I)
● Scale Out
● Distributes data to nodes automatically
● Balances data and load across machines
MongoDB – Sharding (II)
● A sharded cluster is composed of:
– Shards: hold the data
● Either one mongod instance (the primary daemon process, which handles data requests) or a replica set
– Config servers:
● mongod instances holding the cluster metadata
– mongos instances:
● route application calls to the shards
● No single point of failure
MongoDB – Sharding (III)
MongoDB – Sharding (IV)
MongoDB – Sharding (V)
● Collection has a shard key: existing field(s) in all documents
● Documents get distributed according to ranges of the shard key
● In a shard, documents are partitioned into chunks
● Mongo tries to keep all chunks at the same size
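Range-based distribution can be pictured as a lookup in a chunk table. The table below, its bounds and the shard names are invented for this sketch; in a real cluster this metadata lives on the config servers and the routing is done by mongos.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key
// and currently lives on one shard. Infinite bounds close both ends.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard0" },
];

// Find the shard that owns the chunk containing the given shard key value.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

const lowKeyShard = shardFor(42);
const boundaryShard = shardFor(1000);
```

Because a shard can own several chunks, moving a single chunk is enough to shift load without re-partitioning the whole key space.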
MongoDB – Sharding (VI)
● Shard Balancing
– When a shard has too many chunks, mongo moves
chunks to other shards
● Only makes sense with huge amount of data
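The balancing idea, moving chunks away from the fullest shard, can be sketched as a loop over chunk counts. This is only an illustration: `balance` is a hypothetical function, the threshold is an assumed tuning value, and the real balancer's migration rules are more involved.

```javascript
// chunkCounts: shard name -> number of chunks.
// threshold: assumed minimum difference that triggers a move (not
// MongoDB's actual migration threshold). Clamped to 2 so the loop ends.
function balance(chunkCounts, threshold = 2) {
  const limit = Math.max(2, threshold);
  const counts = { ...chunkCounts };
  const moves = [];
  for (;;) {
    const shards = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    const least = shards[0];
    const most = shards[shards.length - 1];
    if (counts[most] - counts[least] < limit) break;
    counts[most] -= 1;  // migrate one chunk away from the fullest shard
    counts[least] += 1; // and place it on the emptiest one
    moves.push({ from: most, to: least });
  }
  return { counts, moves };
}

const result = balance({ shard0: 8, shard1: 2, shard2: 2 });
```

Each move transfers one whole chunk, which is why balancing only pays off once there are many chunks, that is, a large amount of data.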
Object Mappers
● C#, PHP, Scala, Erlang, Perl, Ruby
● Java
– Morphia
– Spring MongoDB
– mongo-jackson-mapper
– jongo
● ..
Jongo - Example
DB db = new MongoClient().getDB("jongo");
Jongo jongo = new Jongo(db);
MongoCollection users = jongo.getCollection("users");

// Save a mapped object, then read it back as a User
User user = new User("ruben", "inoto", new Address("Musterstraße", "5026"));
users.save(user);
User ruben = users.findOne("{name: 'ruben'}").as(User.class);

public class User {
    private String name;
    private String surname;
    private Address address;
    // constructors omitted on the slide
}

public class Address {
    private String street;
    private String zip;
    // constructors omitted on the slide
}

Stored document:
{
  "_id" : ObjectId("51b0e1c4d78a1c14a26ada9e"),
  "name" : "ruben",
  "surname" : "inoto",
  "address" : {
    "street" : "Musterstraße",
    "zip" : "5026"
  }
}
TTL (TimeToLive)
● Data with an expiryDate
● After the specified TimeToLive, the data will be
removed from the DB
● Implemented as an Index
● Useful for logs, sessions, ..
db.broadcastMessages.ensureIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )
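The expiry rule behind a TTL index can be written down as a simple date comparison. `isExpired` and `purgeExpired` are hypothetical helpers for this sketch; in MongoDB the removal is done by a background task on the server, and the timestamps below are made up.

```javascript
// A document is expired once createdAt + expireAfterSeconds lies in the past.
function isExpired(doc, expireAfterSeconds, now = new Date()) {
  const expiresAt = doc.createdAt.getTime() + expireAfterSeconds * 1000;
  return now.getTime() >= expiresAt;
}

// Stand-in for the background task that removes expired documents.
function purgeExpired(docs, expireAfterSeconds, now = new Date()) {
  return docs.filter((doc) => !isExpired(doc, expireAfterSeconds, now));
}

// Made-up data: with expireAfterSeconds = 3600, only messages younger
// than one hour survive.
const now = new Date("2013-06-01T12:00:00Z");
const messages = [
  { text: "old", createdAt: new Date("2013-06-01T10:30:00Z") },
  { text: "fresh", createdAt: new Date("2013-06-01T11:30:00Z") },
];
const remaining = purgeExpired(messages, 3600, now);
```

Because expiry depends on a date field in each document, the indexed field has to hold a date value for the rule to apply.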
MapReduce
● Programming model for processing large data sets with a parallel, distributed algorithm
● Handles complex aggregation tasks
● Problem can be split into smaller tasks, distributed across nodes
● map phase: selects the data
– Emits key-value pairs, associating a value with a key
– Values are grouped by key and passed to the reduce function
● reduce phase: transforms the data
– Accepts two arguments: key and values
– Reduces all the values associated with the key to a single object
MapReduce
MapReduce Use Example
● Problem: compute how much money each customer has paid across all their orders
Solution - Relational
select customer_id, sum(price * quantity)
from orders
group by customer_id
Input (orders):
order_id | customer_id | price | quantity
a | 1 | 350 | 2
b | 2 | 100 | 2
c | 1 | 20 | 1
Result:
customer_id | total
1 | 720
2 | 200
Solution - Sequential
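// Java; needs java.util.Map and java.util.HashMap.
// Order is assumed to expose customerId, price and quantity.
Map<String, Integer> customerTotals = new HashMap<>();
for (Order order : orders) {
    int newTotal = order.price * order.quantity;
    if (customerTotals.containsKey(order.customerId)) {
        newTotal += customerTotals.get(order.customerId);
    }
    customerTotals.put(order.customerId, newTotal);
}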
[{
order_id: "a",
customer_id: "1",
price: 350,
quantity: 2
},
{
order_id: "b",
customer_id: "2",
price: 100,
quantity: 2
},
{
order_id: "c",
customer_id: "1",
price: 20,
quantity: 1
}]
{ "1": 720 }
{ "2": 200 }
Solution - MapReduce
db.orders.insert([
{
order_id: "a",
customer_id: "1",
price: 350,
quantity: 2
},
{
order_id: "b",
customer_id: "2",
price: 100,
quantity: 2
},
{
order_id: "c",
customer_id: "1",
price: 20,
quantity: 1
}
]);
var mapOrders = function() {
var totalPrice = this.price * this.quantity;
emit(this.customer_id, totalPrice);
};
var reduceOrders = function(customerId, tempTotal) {
return Array.sum(tempTotal);
};
db.orders.mapReduce(
mapOrders,
reduceOrders,
{ out: "map_reduce_orders" }
);
> db.map_reduce_orders.find().pretty();
{ "_id" : "1", "value" : 720 }
{ "_id" : "2", "value" : 200 }
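To see what the map and reduce phases do with these three orders, the same computation can be run on a plain array. The `mapReduce` function below is a hypothetical in-memory stand-in, not the MongoDB implementation: it passes `emit` as an argument instead of providing it globally, replaces the shell helper `Array.sum` with a plain reduction, and always calls reduce once per key, whereas the server may skip or repeat reduce calls.

```javascript
// Minimal in-memory stand-in for db.collection.mapReduce().
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  // emit() collects values per key, as the map phase does.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  // reduce phase: collapse each key's values into one result.
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: reduce(key, values),
  }));
}

const orders = [
  { order_id: "a", customer_id: "1", price: 350, quantity: 2 },
  { order_id: "b", customer_id: "2", price: 100, quantity: 2 },
  { order_id: "c", customer_id: "1", price: 20, quantity: 1 },
];

// Same logic as mapOrders / reduceOrders on the slide.
const mapOrders = function (emit) {
  emit(this.customer_id, this.price * this.quantity);
};
const reduceOrders = (customerId, totals) =>
  totals.reduce((sum, t) => sum + t, 0);

const totals = mapReduce(orders, mapOrders, reduceOrders);
```

After the map phase the groups are "1" with [700, 20] and "2" with [200], which reduce to the totals 720 and 200 shown in the shell output.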
MapReduce
Who is using Mongo?
● Craigslist
● SourceForge
● Disney
● TheGuardian
● Forbes
● CERN
● ….
"Real" Use Case – Android Notifications
● App to send "notifications" (messages) to devices with an installed RealNetworks application (Music, RBT)
● Scala, Scalatra, Lift, Jersey, Guice, ProtocolBuffers
● MongoDB, Casbah, Salat
● Mongo Collections
– Devices: deviceId, msisdn, application
– Messages: message, audience
– SentMessages: deviceId, message, status
Criticism
● Loss of data
– Especially in a cluster
Conclusion
● Not a silver bullet
● Makes sense when:
– Eventual consistency is acceptable
– Prototyping
– Performance
– Object model doesn't fit a relational DB
● Easy to learn
