CMPT 842 (Mobile and Cloud Computing)
NoSQL Basics and MongoDB
Shamima Yeasmin
PhD Student, Software Research Lab, Computer Science, University of Saskatchewan.
Contents
NoSQL Basics
 NoSQL Definition
 Why NoSQL?
 RDBMS vs NoSQL
 Types of NoSQL
 NoSQL pros and cons
MongoDB
 MongoDB Features
 MongoDB Nexus Architecture
 MongoDB Data Model
 MongoDB Query Model
 Indexing
 MongoDB Data Management
 Working Example
What is NoSQL?
 A NoSQL ("Not Only SQL") database reflects an approach to data management and
database design that is useful for very large sets of distributed data.
 NoSQL systems are typically non-relational, distributed, often open-source, and
horizontally scalable.
 NoSQL, which encompasses a wide range of technologies and architectures, seeks to
solve the scalability and big data performance issues that relational databases weren’t
designed to address.
 NoSQL does not prohibit structured query language (SQL). Some NoSQL systems are
entirely non-relational, others simply avoid selected relational functionality such as fixed
table schemas and join operations.
 Popular NoSQL databases and related big-data technologies include Apache Cassandra,
SimpleDB, Google BigTable, Apache Hadoop, MapReduce, MemcacheDB, and Voldemort.
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
Why NoSQL?
 In today’s world, the velocity and variety of data
used and generated over the Internet are growing
exponentially.
 In areas like social media, the data has no fixed
structure.
 Unstructured data is non-relational and schema-less in nature, so it is a real
challenge for an RDBMS to provide cost-effective and
fast CRUD operations on it, because the RDBMS has to deal with the overhead
of joins and of maintaining relationships among the
data.
 This is where NoSQL comes into the picture: it handles
unstructured big data efficiently in order to provide
maximum business value and customer satisfaction.
http://www.w3resource.com/mongodb/nosql.php
Brief History of NoSQL
 The name “NoSQL” was in fact first used by Carlo Strozzi in 1998 as the name of a
file-based database he was developing. Ironically, it is a relational database, just one
without a SQL interface. As such, it is not actually part of the NoSQL
movement we see today.
 The term resurfaced in 2009, when Eric Evans used it to name the then-current surge in
non-relational databases, covering a collection of open-source distributed
databases. The name seems to have stuck, for better or for worse.
 Based on 2014 revenue, the NoSQL market leaders are MarkLogic, MongoDB, and
DataStax.
 Based on 2015 popularity rankings, the most popular NoSQL databases are
MongoDB, Apache Cassandra, and Redis.
http://www.w3resource.com/mongodb/nosql.php
RDBMS vs NoSQL
 RDBMS
- Structured and organized data
- Structured query language (SQL)
- Data and its relationships are
stored in separate tables.
- Data Manipulation Language, Data
Definition Language
- Tight consistency
- Follows the ACID properties
 NoSQL
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Key-Value pair storage, Column
Store, Document Store, Graph
databases
- Eventual consistency rather than ACID
properties
- Unstructured and unpredictable
data
- CAP Theorem
- Prioritizes high performance, high
availability and scalability
http://www.w3resource.com/mongodb/nosql.php
ACID Paradigm (RDBMS)
 Atomic: All operations of a transaction are executed, or none is.
 Consistent: At the end of the transaction, all data must be left in a consistent state.
 Isolated: Modifications of data performed by one transaction must be independent of
those performed by any other transaction.
 Durable: Once the user has been notified of success, the transaction will persist and
not be undone.
http://www.w3resource.com/mongodb/nosql.php
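Atomicity can be illustrated with a small sketch. It assumes an in-memory SQLite database from Python's standard library; the accounts table, the names, and the transfer helper are hypothetical and are not part of the original slides. A transfer either applies both updates or neither.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one transaction."""
    # `with conn:` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds: both updates are kept
try:
    transfer(conn, "alice", "bob", 500)  # fails: both updates are rolled back
except ValueError:
    pass
```

After the failed second transfer, neither account has changed, which is the "all or none" behaviour described above.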
CAP Theorem (NoSQL)
 Eric Brewer formulated the CAP theorem, whose properties are used by BASE
systems.
 The CAP theorem states that a distributed computer system cannot guarantee all of
the following three properties at the same time:
 Consistency (C) – once data is written, all future read requests will return that data
 Availability (A) – the database is always available and responsive
 Partition tolerance (P) – the system keeps working when network failures split it into
parts; if one part of the database is unavailable, other parts are unaffected
 Brewer originally described this impossibility result as forcing a choice of “two out
of the three” CAP properties: CP, AP and CA.
http://www.w3resource.com/mongodb/nosql.php
CAP Theorem
http://www.w3resource.com/mongodb/nosql.php
BASE System (NoSQL)
 A BASE system gives up strong consistency in exchange for greater availability and
partition tolerance. BASE stands for:
 Basically Available: the system guarantees availability.
 Soft state: the state of the system may change over time, even without input,
because of the eventual consistency model.
 Eventual consistency: the system will become consistent over time, provided that it
receives no new input during that time.
http://www.w3resource.com/mongodb/nosql.php
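Eventual consistency can be sketched as a toy simulation. The classes below are hypothetical and greatly simplified (no networking, no conflict handling): a write lands on one replica immediately and reaches the others only when queued updates are delivered.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy store: a write hits one replica at once and the others later."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = deque()  # queued (replica_index, key, value) updates

    def write(self, key, value):
        self.replicas[0].data[key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica=0):
        return self.replicas[replica].data.get(key)

    def propagate(self):
        """Deliver every queued update (stands in for background replication)."""
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i].data[key] = value

store = EventuallyConsistentStore()
store.write("user:1", "Ada")
stale = store.read("user:1", replica=2)  # update not delivered to replica 2 yet
store.propagate()
fresh = store.read("user:1", replica=2)  # replica 2 has caught up
```

Between the write and the call to propagate, replica 2 returns stale data (soft state); once no new input arrives and the queue drains, all replicas agree.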
NoSQL Database Types
 Key-value stores
 Column-oriented databases
 Graph databases
 Document Oriented databases
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
1. Key-value stores
 The key-value model is the simplest and easiest to
implement.
 It is a schema-less construct.
 This model contains a key along with a piece of
associated data or object as value.
 Key-value stores typically favor the 'Availability' and 'Partition tolerance'
aspects of the CAP theorem.
 Key-Value stores can be used as collections,
dictionaries, associative arrays etc.
 Pros: Scalable, Simple API (put, get, delete).
 Cons: No way to query based on the content of the
value.
 Example Databases:
• Riak
• Redis
• Amazon’s DynamoDB
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
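The put/get/delete API can be sketched in a few lines. This is a minimal in-memory illustration with hypothetical names, not the API of Riak, Redis, or DynamoDB. The value is an opaque blob, so the store can look items up only by key.

```python
class KeyValueStore:
    """Minimal in-memory key-value store with put/get/delete."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        """Remove the key; return True if it existed, False otherwise."""
        if key in self._items:
            del self._items[key]
            return True
        return False

store = KeyValueStore()
payload = b'{"user": "ada", "cart": [3, 7]}'  # opaque bytes as far as the store knows
store.put("session:42", payload)
value = store.get("session:42")
removed = store.delete("session:42")
missing = store.get("session:42")
```

Finding every session whose cart contains item 7 would require reading and decoding each value in the application, which is the limitation listed under Cons.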
2. Column-oriented databases
 These were created to store and process very
large amounts of data distributed over many
machines.
 There are still keys but they point to multiple
columns.
 The columns are arranged by column family.
 Pros: Good Scale out, Versioning.
 Cons: Row and column designs are critical.
 Example Databases:
 BigTable
 HBase
 Cassandra
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
Key in Column-oriented databases
Spreadsheets
 Spreadsheets use a Row/Column as a key
BigTable
 Bigtable systems use a combination of
row and column information as a part of
their key.
 Keys also include timestamps, which allow
multiple versions of data.
 Values are just ordered bytes.
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
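The row/column/timestamp key can be sketched as follows. The class, row key, and column name are hypothetical and only illustrate the idea; real systems add column families on disk, sorting, and distribution across machines.

```python
class ColumnFamilyTable:
    """Toy table keyed by (row, column, timestamp); values are raw bytes."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), newest first
        self._cells = {}

    def put(self, row, column, timestamp, value):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, at=None):
        """Return the newest value, or the newest one with timestamp <= `at`."""
        for timestamp, value in self._cells.get((row, column), []):
            if at is None or timestamp <= at:
                return value
        return None

table = ColumnFamilyTable()
table.put("com.example/index", "contents:html", 1, b"<html>v1</html>")
table.put("com.example/index", "contents:html", 5, b"<html>v2</html>")
latest = table.get("com.example/index", "contents:html")
older = table.get("com.example/index", "contents:html", at=3)
```

Because the timestamp is part of the key, writing a new value does not overwrite the old one; both versions stay readable.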
3. Graph databases
 A graph database is a collection of nodes and
edges.
 Each node represents an entity (such as a
student or business) and each edge
represents a connection or relationship
between two nodes.
 Queries are really graph traversals.
 Ideal when relationships between data are
key, as in social networks.
 Pros: Fast network search.
 Cons: Poor scalability when graphs do not fit
into RAM.
 Example Databases:
• Neo4j
• OrientDB
• AllegroGraph
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170 and http://www.w3resource.com/mongodb/nosql.php
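A query as a graph traversal can be sketched with a breadth-first search. The social graph below is made up for illustration, and the code is plain Python rather than the query language of Neo4j or any other graph database.

```python
from collections import deque

# Hypothetical social graph: node -> set of neighbours ("knows" edges).
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

# "Friends of friends" of ann: nodes exactly two hops away.
friends_of_friends = {n for n, d in within_hops(graph, "ann", 2).items() if d == 2}
```

In a relational model the same question needs a self-join per hop; here each extra hop is just one more level of traversal.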
Graph Creation in Graph databases
 Nodes are joined by edges to create a graph.
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
Terms Comparison between the classic
relational model and the graph model
http://www.w3resource.com/mongodb/nosql.php
4. Document Oriented Databases
 A document database is a collection of documents; data in this model is stored
inside documents.
 Document databases are essentially the next level of key-value stores,
allowing nested values associated with each key.
 The semi-structured documents are stored in formats like
JSON or XML.
 Because the database can see inside each document, document databases support
querying more efficiently than plain key-value stores.
 Documents are not typically forced to have a schema and
therefore are flexible and easy to change.
 Documents are stored into collections in order to group
different kinds of data.
 Pros: No object-relational mapping, ideal for research.
 Cons: Complex to implement and incompatible with SQL.
 Example Databases:
• MongoDB
• CouchDB
• MarkLogic
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170 http://www.w3resource.com/mongodb/nosql.php and
http://www.3pillarglobal.com/insights/short-history-databases-rdbms-nosql-beyond
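Nested values and content-based querying can be sketched with plain Python dictionaries. The student documents and the helper functions are hypothetical; a real document database would do this matching on the server, usually with the help of indexes.

```python
import json

# Hypothetical collection: documents are nested and their fields may differ.
students = [
    {"_id": 1, "name": "Ada", "address": {"city": "Saskatoon"}, "courses": ["CMPT 842"]},
    {"_id": 2, "name": "Lin", "address": {"city": "Regina"}},
    {"_id": 3, "name": "Omar", "courses": ["CMPT 842", "CMPT 816"]},
]

def get_path(document, dotted_path):
    """Follow a dotted path such as 'address.city'; None if any step is missing."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, dotted_path, expected):
    """Return the documents whose value at `dotted_path` equals `expected`."""
    return [doc for doc in collection if get_path(doc, dotted_path) == expected]

in_saskatoon = [doc["name"] for doc in find(students, "address.city", "Saskatoon")]
as_json = json.dumps(students[0], sort_keys=True)  # the same document as JSON text
```

Unlike the key-value case, the query reaches inside the value, and documents without an address are simply skipped rather than causing a schema error.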
Object Relational Mapping or not
Object Relational Mapping
 T1 – HTML into object
 T2 – Object into SQL table
 T3 – Table into object
 T4 – Object into HTML
Document Store
 Documents in the application
 Documents in the database
 No object middle tier
 No “shredding”
 Simple
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
NoSQL pros and cons
Pros
 High scalability
 Distributed Computing
 Lower cost
 Schema flexibility
 Un/semi-structured data
 No complex relationships
 No join operations
Cons
 No standardization
 Limited query capabilities (so far)
http://www.3pillarglobal.com/insights/short-history-databases-rdbms-nosql-beyond
What is MongoDB?
 Document-oriented database
 Uses JSON (BSON actually)
 Schema-free
 Performant
 Written in C++
 Full index support
 No multi-document transactions in the 3.0 release covered here (single-document operations are atomic)
 Memory-mapped files (delayed writes)
 Scalable
 Replication
 Auto Sharding
 Commercially Supported
http://www.slideshare.net/ChrisEdwards357/updated-introduction-to-mongodb?qid=03b2845e-0dbc-455e-b6aa-02fac97dd646&v=qf1&b=&from_search=10
Other Features of MongoDB
 Fast, Iterative Development: A flexible data model coupled with dynamic schema and idiomatic drivers make
it fast for developers to build and evolve applications.
 Flexible Data Model: MongoDB's document data model makes it easy for you to store and combine data of
any structure, without giving up sophisticated data access and rich indexing functionality.
 Pluggable Storage Architecture: Users can leverage the same MongoDB query language, data model,
scaling, security and operational tooling across different applications, each powered by different pluggable
MongoDB storage engines.
 Multi-Datacenter Scalability: MongoDB can be scaled within and across multiple distributed data centers,
providing new levels of availability and scalability.
 Integrated Feature Set: Analytics, text search, geospatial, in-memory performance and global replication
allow you to deliver a wide variety of real-time applications on one technology, reliably and securely.
 Lower TCO: MongoDB runs on commodity hardware, dramatically lowering costs.
 Long-Term Commitment: MongoDB Inc and the MongoDB ecosystem stand behind the world's fastest-
growing database. 10M+ downloads. 2,000+ customers including more than 1/3rd of the Fortune 100.
1,000+ partners.
https://www.mongodb.com/mongodb-architecture
MongoDB Nexus Architecture
 MongoDB’s design philosophy is focused on combining
 the critical capabilities of relational databases
 the innovations of NoSQL technologies
https://www.mongodb.com/mongodb-architecture
MongoDB Nexus Architecture
Relational Database
 Expressive query language
 Secondary indexes
 Strong consistency
NoSQL
 Flexible Data Model
 Elastic Scalability
 High Performance
https://www.mongodb.com/mongodb-architecture
MongoDB Data Model
 Data As Documents
 MongoDB stores data as documents in a
binary representation called BSON (Binary
JSON).
 BSON documents contain one or more fields,
and each field contains a value of a specific
data type, including arrays, binary data and
sub-documents.
 Documents that tend to share a similar
structure are organized as collections.
 Dynamic Schema
 Fields can vary from document to
document.
 There is no need to declare the structure
of documents to the system; documents
are self-describing.
 When a document's structure changes, MongoDB
simply stores the updated document, with no need
for costly ALTER TABLE operations.
 Schema Design
 Although MongoDB provides schema
flexibility, schema design is still important
https://www.mongodb.com/mongodb-architecture
RDBMS     MongoDB
Table     Collection
Row       Document
Column    Field
An Example Data Model for a Blogging
Application
Relational Data Model MongoDB Data Model
https://www.mongodb.com/mongodb-architecture
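The contrast between the two data models can be sketched as follows. The original figure is not reproduced here, so the field names and table layout below are assumptions chosen for illustration: one embedded document on the MongoDB side, and the rows a normalised relational schema would need for the same post.

```python
# Hypothetical blog post stored as ONE document: author, tags, and comments
# are embedded instead of living in separate tables linked by foreign keys.
post = {
    "_id": 101,
    "title": "NoSQL Basics",
    "author": {"name": "Ada", "email": "ada@example.com"},
    "tags": ["nosql", "mongodb"],
    "comments": [
        {"user": "lin", "text": "Nice overview."},
        {"user": "omar", "text": "What about joins?"},
    ],
}

def to_relational_rows(post):
    """Split the document into the rows a normalised schema would hold."""
    return {
        "posts": [(post["_id"], post["title"], post["author"]["email"])],
        "authors": [(post["author"]["email"], post["author"]["name"])],
        "tags": [(post["_id"], tag) for tag in post["tags"]],
        "comments": [(post["_id"], c["user"], c["text"]) for c in post["comments"]],
    }

rows = to_relational_rows(post)
row_count = sum(len(table) for table in rows.values())
```

Reading the post back takes one document fetch in the embedded model, while the relational layout needs joins across four tables. Whether to embed or reference is the schema design decision mentioned on the data model slide.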
MongoDB Query Model
 Idiomatic Drivers
 Query types
 Indexing
https://www.mongodb.com/mongodb-architecture
Idiomatic Drivers
 MongoDB provides native drivers for all popular programming languages and
frameworks to make development natural.
 Supported drivers include
 Java
 .NET
 Ruby
 PHP
 JavaScript
 node.js
 Python
 Perl
 Scala and others.
https://www.mongodb.com/mongodb-architecture
Query Types
 Key-value queries
 Range queries
 Geospatial queries
 Text Search queries
 Aggregation Framework queries
 MapReduce queries
https://www.mongodb.com/mongodb-architecture
Query Types
 Key-value queries
http://howtodoinjava.com/2014/05/29/mongodb-selectqueryfind-documents-examples/
Query Types
 Range queries
http://howtodoinjava.com/2014/05/29/mongodb-selectqueryfind-documents-examples/
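Key-value and range queries can be sketched with a toy matcher. The filters use MongoDB's comparison operator names ($gt, $gte, $lt, $lte), but the matcher itself is a hypothetical stand-in written in plain Python, not how MongoDB evaluates queries, and the employee documents are made up.

```python
OPERATORS = {
    "$gt": lambda value, bound: value > bound,
    "$gte": lambda value, bound: value >= bound,
    "$lt": lambda value, bound: value < bound,
    "$lte": lambda value, bound: value <= bound,
}

def matches(document, query):
    """True if `document` satisfies every condition in a MongoDB-style filter."""
    for field, condition in query.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):  # range query, e.g. {"$gte": 30, "$lt": 40}
            if not all(OPERATORS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:         # key-value (equality) query
            return False
    return True

def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]

employees = [
    {"name": "Ada", "age": 36},
    {"name": "Lin", "age": 29},
    {"name": "Omar", "age": 41},
]
by_name = find(employees, {"name": "Lin"})
in_range = [d["name"] for d in find(employees, {"age": {"$gte": 30, "$lt": 40}})]
```

The first call is a key-value query on one field; the second combines two bounds on the same field into a range query.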
Indexing
 Indexes are a crucial mechanism for optimizing system performance and scalability while providing
flexible access to the data.
 Unique Indexes: When an index is specified as unique, MongoDB rejects any insert or
update that would duplicate an existing value of the indexed field.
 Compound Indexes: It can be useful to create compound indexes for queries that specify multiple
predicates.
 Array Indexes: For fields that contain an array, each array value is stored as a separate index entry.
 TTL Indexes: Time to Live (TTL) indexes allow the user to specify a period of time after which the data
will automatically be deleted from the database.
 Geospatial Indexes: MongoDB provides geospatial indexes to optimize queries related to location
within a two dimensional space, such as projection systems for the earth.
 Sparse Indexes: These contain entries only for documents that contain the specified field.
 Text Search Indexes: MongoDB provides a specialized index for text search that uses advanced,
language-specific linguistic rules for stemming, tokenization and stop words.
https://www.mongodb.com/mongodb-architecture
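Two of the index types above can be sketched together. The toy collection below keeps one index that is both unique and sparse; the class and field names are hypothetical, and a real index is a B-tree maintained by the database rather than a Python dictionary.

```python
class DuplicateKeyError(Exception):
    pass

class Collection:
    """Toy collection with one unique, sparse index on a single field."""

    def __init__(self, unique_field):
        self.unique_field = unique_field
        self.documents = []
        self.index = {}  # indexed value -> position in self.documents

    def insert(self, document):
        if self.unique_field in document:      # sparse: skip docs without the field
            key = document[self.unique_field]
            if key in self.index:              # unique: reject duplicate values
                raise DuplicateKeyError(key)
            self.index[key] = len(self.documents)
        self.documents.append(document)

    def find_one(self, value):
        """Look the value up in the index instead of scanning every document."""
        position = self.index.get(value)
        return None if position is None else self.documents[position]

users = Collection(unique_field="email")
users.insert({"name": "Ada", "email": "ada@example.com"})
users.insert({"name": "Guest"})                # no email: stored, but not indexed
try:
    users.insert({"name": "Imposter", "email": "ada@example.com"})
    rejected = False
except DuplicateKeyError:
    rejected = True
```

The duplicate email is rejected, while the document with no email is stored without an index entry.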
MongoDB Data Management
 Auto-sharding for linear scalability
 Pluggable storage architecture for application flexibility
 Storage efficiency with compression
https://www.mongodb.com/mongodb-architecture
Auto-sharding for Linear Scalability
 Sharding distributes data across multiple physical partitions called shards.
 Sharding allows MongoDB deployments to address the hardware limitations of a
single server, such as bottlenecks in RAM or disk I/O.
 Unlike in most relational databases, sharding is automatic and built into the database.
 Developers don't face the complexity of building sharding logic into their
application code.
https://www.mongodb.com/mongodb-architecture
Auto-sharding for Linear Scalability
 Multiple sharding policies available – hash-based, range-based and location-based.
 Range-based Sharding. Documents with shard key values close to one another are likely
to be co-located on the same shard. This approach is well suited for applications that
need to optimize range based queries.
 Hash-based Sharding. Documents are distributed according to an MD5 hash of the shard
key value. This approach guarantees a uniform distribution of writes across shards, but is
less optimal for range-based queries.
 Location-based Sharding. Documents are partitioned according to a user-specified
configuration that associates shard key ranges with specific shards and hardware.
https://www.mongodb.com/mongodb-architecture
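The difference between hash-based and range-based placement can be sketched as below. This is a simplification under stated assumptions: the split points are made up, and taking the MD5 digest modulo the shard count is an illustrative shortcut, not MongoDB's actual chunk assignment.

```python
import bisect
import hashlib

def hash_shard(shard_key, n_shards):
    """Hash-based: an MD5 digest of the key picks the shard."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards

def range_shard(shard_key, split_points):
    """Range-based: keys below split_points[0] go to shard 0, and so on."""
    return bisect.bisect_right(split_points, shard_key)

split_points = [1000, 2000]  # hypothetical boundaries, giving 3 shards
range_placement = [range_shard(k, split_points) for k in (998, 999, 1000, 2500)]
hash_placement = [hash_shard(k, 3) for k in range(998, 1002)]
```

With range-based placement, neighbouring keys such as 998 and 999 land on the same shard, which suits range queries. With hash-based placement, the shard for a key depends only on its digest, so neighbouring keys are not kept together.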
Pluggable storage architecture for
application flexibility
 Through the use of a pluggable storage architecture, MongoDB can be extended
with new capabilities, and configured for optimal use of specific hardware
architectures.
https://www.mongodb.com/mongodb-architecture
Storage efficiency with compression
 MongoDB supports native compression when configured with the WiredTiger
storage engine, reducing physical storage footprint by as much as 80%.
 In addition to reduced storage space, compression enables much higher storage
I/O scalability as fewer bits are read from disk.
 Administrators have the flexibility to configure specific compression algorithms for
collections and indexes.
https://www.mongodb.com/mongodb-architecture
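The effect of compression on repetitive documents can be sketched with Python's zlib module. This only illustrates the idea: the documents are made up, JSON stands in for BSON, and the ratio obtained here says nothing about the savings WiredTiger achieves on real data.

```python
import json
import zlib

# Hypothetical documents with repetitive field names, as in a typical collection.
documents = [{"user_id": i, "status": "active", "country": "CA"} for i in range(1000)]

raw = json.dumps(documents).encode("utf-8")
compressed = zlib.compress(raw)
ratio = len(compressed) / len(raw)                  # fraction of the original size
restored = json.loads(zlib.decompress(compressed))  # compression is lossless
```

Field names repeat in every document, so document collections tend to compress well, and fewer bytes on disk also means fewer bytes to read back.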
MongoDB Consistency & Availability
 Transaction model
 The ACID guarantees provided by MongoDB ensure complete isolation as a document is
updated.
 Replica sets
 MongoDB uses native replication to maintain multiple copies of data in replica sets. A
replica set is a fully self-healing shard that helps prevent database downtime.
 In-memory performance with on-disk capacity
 MongoDB makes extensive use of RAM to speed up database operations. With the MMAPv1
storage engine, all data is read and manipulated through memory-mapped files.
 Security
 Authentication, Authorization, Auditing and Encryption.
https://www.mongodb.com/mongodb-architecture
Working Example: Using the MongoDB Java
Driver on Mac OS X
 Instructions to install MongoDB:
 Download the mongodb-osx-x86_64-3.0.7.tgz file and extract it.
 Copy the extracted directory to /usr/local/mongodb.
 Open a terminal in this directory and run the following commands:
 export PATH=<mongodb-install-directory>/bin:$PATH
 sudo mkdir -p /data/db
 sudo chown -R $USER /data/db
 mongod
 Coding in Java
 Download the MongoDB Java driver jar (mongo.jar) from:
https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongo-java-driver/3.1.1/
 Add mongo.jar to your classpath.
http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
Java Code
http://www.tutorialspoint.com/mongodb/mongodb_java.htm
Include these import statements.
Database Connectivity
Insertion
Output
Summary
 NoSQL: its characteristics and its types.
 MongoDB: its characteristics and a working example with the MongoDB Java driver.
43

NoSQL Basics and MongDB

  • 1.
    CMPT 842(Mobile andCloud Computing) NoSQL Basics and MongoDB Shamima Yeasmin PhD Student, Software Research Lab, Computer Science, University of Saskatchewan.
  • 2.
    Contents NoSQL Basics  NoSQLDefinition  Why NoSQL?  RDBMS vs NoSQL  Types of NoSQL  NoSQL pros and cons MongoDB  MongoDB Features  MongoDB Nexus Architecture  MongoDB Data Model  MongoDB Query Model  Indexing  MongoDB Data ManageMent  Working Example 2
  • 3.
    What is NoSQL? NoSQL database, also called Not Only SQL, is an approach to data management and database design that's useful for very large sets of distributed data.  This database system is non-relational, distributed, open-source and horizontally scalable.  NoSQL, which encompasses a wide range of technologies and architectures, seeks to solve the scalability and big data performance issues that relational databases weren’t designed to address.  NoSQL does not prohibit structured query language (SQL). Some NoSQL systems are entirely non-relational, others simply avoid selected relational functionality such as fixed table schemas and join operations.  Popular NoSQL database is Apache Cassandra, SimpleDB, Google BigTable, Apache Hadoop, MapReduce, MemcacheDB, and Voldemort. 3 http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
  • 4.
    Why NoSQL ? In today’s world the velocity and nature of data used/generated over the Internet is growing exponentially.  In areas like social media, the data has no specific structure boundary.  In order to handle unstructured data which is non- relational and schema-less in nature, it becomes a real challenge for RDBMS to provide the cost effective and fast CRUD operation as it has to deal with the overhead of joins and maintaining relationships amongst various data.  This is where NoSQL comes into the picture to handle unstructured BIG data in an efficient way to provide maximum business value and customer satisfaction. 4 http://www.w3resource.com/mongodb/nosql.php
  • 5.
    Brief History ofNoSQL  The name “NoSQL” was in fact first used by Carlo Strozzi in 1998 as the name of file-based database he was developing. Ironically it’s relational database just one without a SQL interface. As such it is not actually a part of the whole NoSQL movement we see today.  The term re-surfaced in 2009 when Eric Evans used it to name the current surge of covering a collection of open-source distributed databases in non-relational databases. It seems like the name has stuck for better or for worse.  Based on 2014 revenue, the NoSQL market leaders are MarkLogic, MongoDB, and Datastax.  Based on 2015 popularity rankings, the most popular NoSQL databases are MongoDB, Apache Cassandra, and Redis. 5 http://www.w3resource.com/mongodb/nosql.php
  • 6.
    RDBMS vs NoSQL RDBMS - Structured and organized data - Structured query language (SQL) - Data and its relationships are stored in separate tables. - Data Manipulation Language, Data Definition Language - Tight Consistency - Follow the ACID property  NoSQL - Stands for Not Only SQL - No declarative query language - No predefined schema - Key-Value pair storage, Column Store, Document Store, Graph databases - Eventual consistency rather ACID property - Unstructured and unpredictable data - CAP Theorem - Prioritizes high performance, high availability and scalability 7 http://www.w3resource.com/mongodb/nosql.php
  • 7.
    ACID Paradigm (RDBMS) Atomic: All operations of a transaction are executed, or none is.  Consistent: At the end of the transaction, all data must be left in a consistent state.  Isolated: Modifications of data performed by a transaction must be independent of another transaction.  Durability: Durability refers to the guarantee that once the user has been notified of success, the transaction will persist and not be undone. 8 http://www.w3resource.com/mongodb/nosql.php
  • 8.
    CAP Theorem (NoSQL) Eric Brewer formulates the CAP theorem whose properties are used by BASE System.  The CAP theorem states that a distributed computer system cannot guarantee all of the following three properties at the same time:  Consistency (C) – once data is written, all future read requests will contain that data  Availability (A)– the database is always available and responsive  Partition tolerance (P) – if one part of the database is unavailable, other parts are unaffected  Brewer originally described this impossibility result as forcing a choice of “two out of the three” CAP properties: CP, AP and CA 9 http://www.w3resource.com/mongodb/nosql.php
  • 9.
  • 10.
    BASE System (NoSQL) A BASE system gives up on consistency so as to have greater Availability and Partition tolerance. A BASE can be defined as following:  Basically Available indicates that the system does guarantee availability.  Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.  Eventual consistency indicates that the system will become consistent over time, given that the system doesn’t receive input during that time. 11 http://www.w3resource.com/mongodb/nosql.php
  • 11.
    NoSQL Database Types Key-value stores  Column-oriented databases  Graph databases  Document Oriented databases 12 http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
  • 12.
    1. Key-value stores The key-value model is the simplest and easiest to implement.  It is a schema-less construct.  This model contains a key along with a piece of associated data or object as value.  Key-Value stores follows the 'Availability' and 'Partition' aspects of CAP theorem.  Key-Value stores can be used as collections, dictionaries, associative arrays etc.  Pros: Scalable, Simple API (put, get, delete).  Cons: No way to query based on the content of the value.  Example Databases: • Riak • Redis • Amazon’s DynamoDB 13 http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
  • 13.
    2. Column-oriented databases These were created to store and process very large amounts of data distributed over many machines.  There are still keys but they point to multiple columns.  The columns are arranged by column family.  Pros: Good Scale out, Versioning.  Cons: Row and column designs are critical.  Example Databases:  BigTable  Hbase  Cassandra http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170 14
  • 14.
    Key in Column-orienteddatabases Spreadsheets  Spreadsheets use a Row/Column as a key BigTable  Bigtable systems use a combination of row and column information as a part of their key.  Key also include timestamps, which allows multiple versions of data.  Values are just ordered bytes. http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170 15
  • 15.
    3. Graph databases A graph database is a collection of nodes and edges.  Each node represents an entity (such as a student or business) and each edge represents a connection or relationship between two nodes.  Query are really graph traversal.  Ideal when relationships between data are keys: Social Networks.  Pros: First network search.  Cons: Poor scalability when graphs do not fit into RAM.  Example Databases: • Neo4j • OrientDB • AllegroGraph 16 http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170 and http://www.w3resource.com/mongodb/nosql.php
  • 16.
    Graph Creation inGraph databases  Nodes are joined to create graph 17 http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
  • 17.
    Terms Comparison betweenthe classic relational model and the graph model http://www.w3resource.com/mongodb/nosql.php 18
  • 18.
    4. Document OrientedDatabases  A collection of documents and data in this model is stored inside documents.  Document databases are essentially the next level of key- value, allowing nested values associated with each key.  The semi-structured documents are stored in formats like JSON or XML.  Document databases support querying more efficiently.  Documents are not typically forced to have a schema and therefore are flexible and easy to change.  Documents are stored into collections in order to group different kinds of data.  Pros: No object-relational mapping, ideal for research.  Cons: Complex to implement and incompatible with SQL.  Example Databases: • MongoDB • CouchDB • MarkLogic http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170 http://www.w3resource.com/mongodb/nosql.php and http://www.3pillarglobal.com/insights/short-history-databases-rdbms-nosql-beyond 19
  • 19.
    Object Relational Mappingor not Object Relational Mapping  T1 – HTML into object  T2 – Object into SQL table  T3 – Table into object  T4 – Object into HTML Document Store  Documents in the application  Documents in the database  No object middle tier  No “shredding”  Simple http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170 20
  • 20.
    NoSQL pros andcons Pros  High scalability  Distributed Computing  Lower cost  Schema flexibility  Un/semi-structured data  No complex relationships  No join operations Cons  No standardization  Limited query capabilities (so far) http://www.3pillarglobal.com/insights/short-history-databases-rdbms-nosql-beyond 21
  • 21.
    What is MongoDB? Document-oriented database  Uses JSON (BSON actually)  Schema-free  Performant  Written in C++  Full index support  No transactions (has atomic operation)  Memory-mapped files(delayed writes)  Scalable  Replication  Auto Sharding  Commercially Supported http://www.slideshare.net/ChrisEdwards357/updated-introduction-to-mongodb?qid=03b2845e-0dbc-455e-b6aa- 02fac97dd646&v=qf1&b=&from_search=10 22
  • 22.
    Other Features ofMongoDB  Fast, Iterative Development: A flexible data model coupled with dynamic schema and idiomatic drivers make it fast for developers to build and evolve applications.  Flexible Data Model: MongoDB's document data model makes it easy for you to store and combine data of any structure, without giving up sophisticated data access and rich indexing functionality.  Pluggable Storage Architecture: Users can leverage the same MongoDB query language, data model, scaling, security and operational tooling across different applications, each powered by different pluggable MongoDB storage engines.  Multi-Datacenter Scalability: MongoDB can be scaled within and across multiple distributed data centers, providing new levels of availability and scalability.  Integrated Feature Set: Analytics, text search, geospatial, in-memory performance and global replication allow you to deliver a wide variety of real-time applications on one technology, reliably and securely.  Lower TCO: MongoDB runs on commodity hardware, dramatically lowering costs.  Long-Term Commitment: MongoDB Inc and the MongoDB ecosystem stand behind the world's fastest- growing database. 10M+ downloads. 2,000+ customers including more than 1/3rd of the Fortune 100. 1,000+ partners. https://www.mongodb.com/mongodb-architecture 23
  • 23.
    MongoDB Nexus Architecture MongoDB’s design philosophy is focused on combining  the critical capabilities of relational databases  the innovations of NoSQL technologies https://www.mongodb.com/mongodb-architecture 24
  • 24.
    MongoDB Nexus Architecture
    Relational Database
     Expressive query language
     Secondary indexes
     Strong consistency
    NoSQL
     Flexible data model
     Elastic scalability
     High performance
    https://www.mongodb.com/mongodb-architecture
  • 25.
    MongoDB Data Model Data As Documents  MongoDB stores data as documents in a binary representation called BSON (Binary JSON).  BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data and sub-documents.  Documents that tend to share a similar structure are organized as collections.  Dynamic Schema  Fields can vary from document to document.  There is no need to declare the structure of documents to the system – documents are self describing.  MongoDB continues to store the updated objects without the need for performing costly ALTER_TABLE operations  Schema Design  Although MongoDB provides schema flexibility, schema design is still important https://www.mongodb.com/mongodb-architecture 26 RDBMS MongoDB Table Collection Row Document Column Field
  • 26.
    An Example Data Model for a Blogging Application
    [Figure: Relational Data Model vs. MongoDB Data Model]
    https://www.mongodb.com/mongodb-architecture
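A minimal sketch of the contrast this slide draws, again with plain Python data instead of a real database. The table names, field names and sample rows are hypothetical; they only illustrate how a blog article might be split across tables in a relational design and embedded in one document in MongoDB.

```python
# Relational style: one "table" per entity, linked by foreign keys (hypothetical rows).
articles = [{"id": 10, "title": "Intro to NoSQL", "author_id": 1}]
authors = [{"id": 1, "name": "Sam"}]
comments = [
    {"id": 100, "article_id": 10, "text": "Nice overview"},
    {"id": 101, "article_id": 10, "text": "Thanks"},
]


def load_article_relational(article_id):
    """Assemble one article by joining three tables in application code."""
    article = next(a for a in articles if a["id"] == article_id)
    author = next(u for u in authors if u["id"] == article["author_id"])
    texts = [c["text"] for c in comments if c["article_id"] == article_id]
    return {"title": article["title"], "author": author["name"], "comments": texts}


# Document style: the related data is embedded in a single document.
article_doc = {
    "_id": 10,
    "title": "Intro to NoSQL",
    "author": {"name": "Sam"},
    "comments": [{"text": "Nice overview"}, {"text": "Thanks"}],
}


def load_article_document(doc):
    """Build the same view from one document, with no joins."""
    return {
        "title": doc["title"],
        "author": doc["author"]["name"],
        "comments": [c["text"] for c in doc["comments"]],
    }


print(load_article_document(article_doc))
```

Both functions return the same view; the document version gets there with a single read, which is the locality argument made in the notes for this slide.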
  • 27.
    MongoDB Query Model Idiomatic Drivers  Query types  Indexing https://www.mongodb.com/mongodb-architecture 28
  • 28.
    Idiomatic Drivers
     MongoDB provides native drivers for all popular programming languages and frameworks to make development natural.
     Supported drivers include
         Java
         .NET
         Ruby
         PHP
         JavaScript
         Node.js
         Python
         Perl
         Scala and others.
    https://www.mongodb.com/mongodb-architecture
  • 29.
    Query Types
     Key-value queries
     Range queries
     Geospatial queries
     Text Search queries
     Aggregation Framework queries
     MapReduce queries
    https://www.mongodb.com/mongodb-architecture
  • 30.
    Query Types
     Key-value queries
    http://howtodoinjava.com/2014/05/29/mongodb-selectqueryfind-documents-examples/
  • 31.
    Query Types
     Range queries
    http://howtodoinjava.com/2014/05/29/mongodb-selectqueryfind-documents-examples/
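To make the difference between key-value and range queries concrete, here is a toy in-memory matcher in Python. The operator names `$gt`, `$gte`, `$lt` and `$lte` mirror MongoDB's comparison operators, but `matches`, `find` and the `employees` data are hypothetical helpers written for this sketch; they do not reproduce MongoDB's full matching rules (arrays, missing fields, type ordering and so on).

```python
def matches(doc, query):
    """Return True if doc satisfies every condition in query.

    A condition is either a plain value (equality, i.e. a key-value query)
    or a dict of comparison operators such as {"$gte": 25, "$lt": 40}
    (a range query).
    """
    ops = {
        "$gt": lambda a, b: a > b,
        "$gte": lambda a, b: a >= b,
        "$lt": lambda a, b: a < b,
        "$lte": lambda a, b: a <= b,
    }
    for field, cond in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(cond, dict):
            if not all(ops[op](value, bound) for op, bound in cond.items()):
                return False
        elif value != cond:
            return False
    return True


def find(collection, query):
    """Linear scan over the collection; a real database would try to use an index."""
    return [doc for doc in collection if matches(doc, query)]


employees = [
    {"_id": 1, "name": "Ada", "age": 36},
    {"_id": 2, "name": "Lin", "age": 24},
    {"_id": 3, "name": "Omar", "age": 41},
]

by_key = find(employees, {"name": "Lin"})                     # key-value query
by_range = find(employees, {"age": {"$gt": 30, "$lte": 41}})  # range query
print(by_key, by_range)
```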
  • 32.
    Indexing
     Indexes are a crucial mechanism for optimizing system performance and scalability while providing flexible access to the data.
     Unique Indexes: When an index is specified as unique, MongoDB rejects any insert or update that would duplicate an existing value of the indexed field.
     Compound Indexes: It can be useful to create compound indexes for queries that specify multiple predicates.
     Array Indexes: For fields that contain an array, each array value is stored as a separate index entry.
     TTL Indexes: Time to Live (TTL) indexes let the user specify a period of time after which the data is automatically deleted from the database.
     Geospatial Indexes: MongoDB provides geospatial indexes to optimize queries related to location within a two-dimensional space, such as projection systems for the earth.
     Sparse Indexes: These contain entries only for documents that contain the specified field.
     Text Search Indexes: MongoDB provides a specialized index for text search that uses advanced, language-specific linguistic rules for stemming, tokenization and stop words.
    https://www.mongodb.com/mongodb-architecture
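The behaviour of unique and sparse indexes can be sketched with a small Python class. `UniqueSparseIndex` is a hypothetical toy that models an index which is both unique and sparse on a single field; it is not how MongoDB implements indexes, and the field name `email` is only an example.

```python
class UniqueSparseIndex:
    """Toy index on one field: rejects duplicate values, skips docs lacking the field."""

    def __init__(self, field):
        self.field = field
        self.entries = {}  # indexed value -> document _id

    def insert(self, doc):
        """Index doc; return True if an entry was added, False if the field is absent."""
        if self.field not in doc:
            return False  # sparse: no entry for documents without the field
        value = doc[self.field]
        if value in self.entries:
            # unique: a second document with the same value is rejected
            raise ValueError(f"duplicate value for {self.field!r}: {value!r}")
        self.entries[value] = doc["_id"]
        return True


index = UniqueSparseIndex("email")
index.insert({"_id": 1, "email": "ada@example.com"})
index.insert({"_id": 2})  # no "email" field, so nothing is indexed

try:
    index.insert({"_id": 3, "email": "ada@example.com"})
    rejected = False
except ValueError:
    rejected = True

print(rejected, index.entries)
```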
  • 33.
    MongoDB Data Management
     Auto-sharding for linear scalability
     Pluggable storage architecture for application flexibility
     Storage efficiency with compression
    https://www.mongodb.com/mongodb-architecture
  • 34.
    Auto-sharding for Linear Scalability
     Sharding distributes data across multiple physical partitions called shards.
     Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O.
     Unlike in most relational databases, sharding is automatic and built into the database.
     Developers don't face the complexity of building sharding logic into their application code.
    https://www.mongodb.com/mongodb-architecture
  • 35.
    Auto-sharding for Linear Scalability
     Multiple sharding policies are available: hash-based, range-based and location-based.
     Range-based Sharding: Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries.
     Hash-based Sharding: Documents are distributed according to an MD5 hash of the shard key value. This approach spreads writes close to uniformly across shards, but is less optimal for range-based queries.
     Location-based Sharding: Documents are partitioned according to a user-specified configuration that associates shard key ranges with specific shards and hardware.
    https://www.mongodb.com/mongodb-architecture
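The difference between range-based and hash-based placement can be sketched in a few lines of Python. This is a simplification under stated assumptions: the shard count, the boundary values in `RANGE_BOUNDS` and the modulo step in `hash_shard` are invented for illustration, whereas MongoDB itself assigns ranges of (hashed) key values to chunks and balances those chunks across shards.

```python
import bisect
import hashlib

NUM_SHARDS = 3

# Range-based: shard 0 holds keys below 1000, shard 1 keys from 1000 up to
# (but not including) 2000, and shard 2 everything from 2000 upwards.
RANGE_BOUNDS = [1000, 2000]


def range_shard(key):
    """Pick a shard by comparing the shard key with the range boundaries."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def hash_shard(key):
    """Pick a shard from an MD5 digest of the shard key value (modulo is a simplification)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


for key in (1500, 1501, 1502):
    print(key, "range ->", range_shard(key), "hash ->", hash_shard(key))
```

Neighbouring keys always land on the same shard under `range_shard`, which helps range scans; under `hash_shard` their placement depends only on the digest, which is what spreads writes out.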
  • 36.
    Pluggable storage architecture for application flexibility
     Through the use of a pluggable storage architecture, MongoDB can be extended with new capabilities, and configured for optimal use of specific hardware architectures.
    https://www.mongodb.com/mongodb-architecture
  • 37.
    Storage efficiency with compression
     MongoDB supports native compression when configured with the WiredTiger storage engine, reducing physical storage footprint by as much as 80%.
     In addition to reduced storage space, compression enables much higher storage I/O scalability as fewer bits are read from disk.
     Administrators have the flexibility to configure specific compression algorithms for collections and indexes.
    https://www.mongodb.com/mongodb-architecture
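To get a feel for why compression shrinks the storage footprint of document data, the sketch below compresses a batch of repetitive JSON documents with Python's standard `zlib` module. This only stands in for what a storage engine does: WiredTiger compresses its own on-disk blocks (snappy and zlib are among its supported algorithms), the sample documents are made up, and the ratio achieved depends entirely on the data.

```python
import json
import zlib

# Hypothetical documents with highly repetitive field names and values.
docs = [{"_id": i, "status": "active", "city": "Saskatoon"} for i in range(1000)]

raw = json.dumps(docs).encode("utf-8")
packed = zlib.compress(raw)

saving = 1 - len(packed) / len(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes ({saving:.0%} smaller)")
```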
  • 38.
    MongoDB Consistency & Availability
     Transaction model
         MongoDB provides ACID guarantees at the document level, so each document update is fully isolated.
     Replica sets
         MongoDB maintains multiple copies of data, called replica sets, using native replication. A replica set is a fully self-healing shard that helps prevent database downtime.
     In-memory performance with on-disk capacity
         MongoDB makes extensive use of RAM to speed up database operations. With the MMAPv1 storage engine, all data is read and manipulated through memory-mapped files.
     Security
         Authentication, authorization, auditing and encryption.
    https://www.mongodb.com/mongodb-architecture
  • 39.
    Working Example: Using the MongoDB Java Driver on Mac OS X
     Instructions to install MongoDB:
         Download the mongodb-osx-x86_64-3.0.7.tgz file and extract it.
         Copy it into /usr/local/mongdb
         In a terminal, go to this directory and run the following commands:
             export PATH=<mongodb-install-directory>/bin:$PATH
             sudo mkdir -p /data/db (only if the default data directory does not exist yet)
             sudo chown -R $USER /data/db
             mongod
     Coding in Java
         Download the MongoDB Java driver jar (mongo.jar) from https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongo-java-driver/3.1.1/
         Include mongo.jar in your classpath.
    http://www.slideshare.net/Dataversity/nosql-now-nosql-architecture-patterns-23589170
  • 40.
  • 41.
  • 42.
    Summary
     NoSQL, its characteristics and its types.
     MongoDB, its characteristics and a working example with the MongoDB Java driver.

Editor's Notes

  • #7 Advantages of Distributed Computing: Reliability (fault tolerance): If some of the machines within the system crash, the rest of the computers remain unaffected and work does not stop. Scalability: The system can easily be expanded by adding more machines as needed. Sharing of Resources: Shared data is essential to many applications, such as banking and reservation systems. Besides data, other resources (e.g. expensive printers) can also be shared. Flexibility: It is easy to install, implement and debug new services. Speed: A distributed computing system can pool more computing power than a single machine. Open system: Every service is equally accessible to every client, local or remote. Performance: The collection of processors in the system can provide higher performance (and a better price/performance ratio) than a centralized computer. Disadvantages of Distributed Computing: Troubleshooting: Diagnosing problems is harder when they span many machines. Software: Limited software support is a main disadvantage of distributed computing systems. Networking: The network infrastructure can create problems such as transmission errors, overloading and loss of messages. Security: Easy access increases security risk, and sharing data raises data-security problems.
  • #9 Relational database systems offer the concept of referential integrity, mechanisms to add semantics to the model by using keys and foreign key relationships to ensure data consistency. Concurrent access to a database is administered by using transactions. The famous ACID paradigm is part of (almost) all relational database systems and guarantees transaction to be
  • #10 CA – data should be consistent between all nodes. As long as all nodes are online, users can read/write from any node and be sure that the data is the same on all nodes. CP – data is consistent between all nodes and maintains partition tolerance by becoming unavailable when a node goes down. AP – nodes remain online even if they can't communicate with each other and will re-sync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same data (either during or after the partition).
  • #17 Graph Databases: Every node and edge is defined by a unique identifier. Each node knows its adjacent nodes. As the number of nodes increases, the cost of a local step (or hop) remains the same. Index for lookups.
  • #18 How do you know two items reference the same object? - By Node identification (URI or similar structure)
  • #24 Fast, Iterative Development: Contrast this against static relational schemas and complex operations that have hindered you in the past. Flexible Data Model: You can dynamically modify the schema without downtime. You spend less time prepping your data for the database, and more time putting your data to work. Pluggable Storage Architecture: With MongoDB, organizations can address diverse application needs with a single database technology. With its pluggable storage architecture, MongoDB can be extended with new capabilities, and configured for optimal use of specific hardware architectures. Integrated Feature Set: RDBMS systems require additional, complex technologies demanding separate integration overhead and expense to do this well.
  • #27 Data As Documents: MongoDB stores data as documents in a binary representation called BSON (Binary JSON). The BSON encoding extends the popular JSON (JavaScript Object Notation) representation to include additional types such as int, long, and floating point. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data and sub-documents. Schema Design: Developers and DBAs should consider a number of topics, including the types of queries the application will need to perform, how objects are managed in the application code, and how documents will change over time.
  • #28 As this example illustrates, MongoDB documents tend to have all data for a given record in a single document, whereas in a relational database information for a given record is usually spread across many tables. With the MongoDB document model, data is more localized, which significantly reduces the need to JOIN separate tables. The result is dramatically higher performance and scalability across commodity hardware as a single read to the database can retrieve the entire document containing all related data. In addition, MongoDB BSON documents are more closely aligned to the structure of objects in the programming language. This makes it simpler and faster for developers to model how data in the application will map to data stored in the database.
  • #31 Key-value queries return results based on any field in the document, often the primary key. Range queries return results based on values defined as inequalities (e.g. greater than, less than or equal to, between). Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon. Text Search queries return results in relevance order based on text arguments using Boolean operators (e.g., AND, OR, NOT). Aggregation Framework queries return aggregations of values returned by the query (e.g., count, min, max, average, similar to a SQL GROUP BY statement). MapReduce queries execute complex data processing that is expressed in JavaScript and executed across data in the database.
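The note above compares Aggregation Framework queries to a SQL GROUP BY. A minimal Python sketch of that grouping idea follows; `group_stats` and the `orders` data are hypothetical, and the function only imitates the count/min/max/average style of result, not MongoDB's aggregation pipeline syntax.

```python
from collections import defaultdict

# Hypothetical order documents.
orders = [
    {"customer": "ada", "amount": 30},
    {"customer": "lin", "amount": 12},
    {"customer": "ada", "amount": 20},
]


def group_stats(docs, key, value):
    """Group docs by `key` and summarise `value` per group, like a SQL GROUP BY."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[key]].append(doc[value])
    return {
        group: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for group, values in buckets.items()
    }


stats = group_stats(orders, "customer", "amount")
print(stats)
```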
  • #34 Compound Indexes: For example, consider an application that stores data about customers. The application may need to find customers based on last name, first name, and city of residence. With a compound index on last name, first name, and city of residence, queries could efficiently locate people with all three of these values specified. TTL Indexes: A common use of TTL indexes is applications that maintain a rolling window of history (e.g., most recent 100 days) for user actions such as clickstreams. Geospatial Indexes: These indexes allow MongoDB to optimize queries for documents that contain points or a polygon that are closest to a given point or line; that are within a circle, rectangle, or polygon; or that intersect with a circle, rectangle, or polygon. Sparse Indexes: Because the document data model of MongoDB allows for flexibility in the data model from document to document, it is common for some fields to be present only in a subset of all documents. Sparse indexes allow for smaller, more efficient indexes when fields are not present in all documents. Text Search Indexes: Queries that use the text search index will return documents in relevance order. One or more fields can be included in the text index.
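The customer example in the compound-index note can be sketched with a sorted list of tuples and Python's `bisect` module. This is an illustrative model only: the sample customers and the `lookup` helper are invented, and a real index is a B-tree maintained by the database rather than a list sorted in application code.

```python
import bisect

# Hypothetical customer documents.
customers = [
    {"_id": 1, "last": "Smith", "first": "Ann", "city": "Regina"},
    {"_id": 2, "last": "Smith", "first": "Bob", "city": "Saskatoon"},
    {"_id": 3, "last": "Jones", "first": "Ann", "city": "Regina"},
    {"_id": 4, "last": "Smith", "first": "Ann", "city": "Saskatoon"},
]

# Compound index: entries sorted by (last, first, city), each ending with the document _id.
index = sorted((c["last"], c["first"], c["city"], c["_id"]) for c in customers)


def lookup(prefix):
    """Return _ids whose index key starts with prefix, e.g. ("Smith",) or ("Smith", "Ann")."""
    start = bisect.bisect_left(index, prefix)  # jump to the first candidate entry
    ids = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break  # entries are sorted, so no later entry can match
        ids.append(entry[-1])
    return ids


print(lookup(("Smith",)))
print(lookup(("Smith", "Ann", "Regina")))
```

Because the entries are ordered by last name first, a lookup that supplies only a first name, such as `lookup(("Ann",))`, finds nothing through this prefix search; that is the usual reason field order matters in a compound index.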
  • #38 MongoDB 3.0 ships with two supported storage engines: the MMAPv1 (Memory Mapped Version 1) engine, an improved version of the engine used in prior MongoDB releases; and the new WiredTiger storage engine, which brings higher concurrency and compression. The similarly named Pluggable Storage Architecture (PSA) framework in VMware vSphere is a separate technology: it allows partners to deliver performance-enhancing multipathing and load-balancing behaviors optimized per array.
  • #40 Authentication. Simplifying access control to the database, MongoDB offers integration with external security mechanisms including LDAP, Windows Active Directory, Kerberos and x.509 PKI certificates. Authorization. User-defined roles enable administrators to configure granular permissions for a user or application, based on the privileges they need to do their job. Additionally, field-level redaction can work with trusted middleware to manage access to individual fields within a document, allowing the co-location of data with multiple security levels for ease of development and operation. Auditing. For regulatory compliance, security administrators can use MongoDB's native audit log to track access and operations performed against the database. Encryption. MongoDB data can be encrypted on the network and on disk. Support for SSL allows clients to connect to MongoDB over an encrypted channel.