NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison
A B M Moniruzzaman and Syed Akhter Hossain

03/04/14

CSC 8710

Contents
• NoSQL databases definition
• Why NoSQL databases?
• Characteristics of NoSQL Databases
• Primary Uses of NoSQL Databases
• Key-Value databases
• Document databases
• Column-Family databases
• Graph databases
• Adoption of NoSQL Databases
• Conclusion

NoSQL Database
• NoSQL, short for "Not Only SQL", refers to an eclectic and increasingly familiar group of non-relational data management systems.
• These databases are not built primarily on tables and generally don't use SQL for data manipulation.
• NoSQL systems are distributed, non-relational databases designed for large-scale data storage and for massively parallel data processing across a large number of commodity servers.

NoSQL Database
• They also use non-SQL languages and mechanisms to interact with data.
• NoSQL database systems arose alongside major Internet companies such as Google, Amazon, and Facebook, which faced challenges in dealing with huge quantities of data.
• These systems are designed to scale to thousands or millions of users doing updates as well as reads, in contrast to traditional DBMSs and data warehouses.

Why NoSQL?
• Relational DBMSs have been a successful technology for many years, providing persistence, concurrency control and integration mechanisms.
• The need to process large amounts of data shifts the direction from scaling vertically (bigger servers) to scaling horizontally on clusters (more servers).

Why NoSQL?
• NoSQL databases focus on analytical processing of large-scale datasets, offering increased scalability over commodity hardware.
• Organizations that collect large amounts of unstructured data are increasingly turning to non-relational (NoSQL) databases.

[Figure: Big Data]

Characteristics of NoSQL Databases
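• Strong Consistency: all clients see the same version of the data.
• High Availability: data is always available; at least one copy of the requested data can be reached even if one of the nodes is down.
• Partition Tolerance: the system as a whole keeps working even when network failures split the servers into groups that cannot communicate with one another.
• Together these properties form Brewer's CAP theorem: during a network partition a distributed system cannot guarantee both consistency and availability, so NoSQL products typically choose which of the two to favor.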
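To make the consistency/availability trade-off concrete, here is a toy Python sketch of two replicas of one key-value dataset that lose contact with each other. It is not how any real product works; the class `ToyReplicatedStore`, its `mode` parameter and the `PartitionError` exception are hypothetical. In `"CP"` mode a write during a partition is refused so both copies stay identical; in `"AP"` mode the write is accepted locally, so the store stays available but the other replica serves a stale value.

```python
from typing import Dict, List, Optional


class PartitionError(Exception):
    """Raised when a consistency-first ("CP") store refuses a write during a partition."""


class ToyReplicatedStore:
    """Two in-memory replicas of the same key-value data (hypothetical, illustration only)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas: List[Dict[str, str]] = [{}, {}]
        self.partitioned = False  # True while the two replicas cannot communicate

    def write(self, replica: int, key: str, value: str) -> None:
        if not self.partitioned:
            # Normal operation: the write reaches both copies, so they stay identical.
            for copy in self.replicas:
                copy[key] = value
        elif self.mode == "CP":
            # Consistency first: give up availability rather than let the copies diverge.
            raise PartitionError("write rejected: replicas unreachable")
        else:
            # Availability first: accept the write on the reachable replica only.
            self.replicas[replica][key] = value

    def read(self, replica: int, key: str) -> Optional[str]:
        return self.replicas[replica].get(key)


ap = ToyReplicatedStore("AP")
ap.write(0, "cart", "book")
ap.partitioned = True
ap.write(0, "cart", "book, pen")  # accepted, but only replica 0 sees it
print(ap.read(0, "cart"), "|", ap.read(1, "cart"))  # book, pen | book

cp = ToyReplicatedStore("CP")
cp.write(0, "cart", "book")
cp.partitioned = True
try:
    cp.write(0, "cart", "book, pen")
except PartitionError as err:
    print(err)  # write rejected: replicas unreachable
print(cp.read(0, "cart"), "|", cp.read(1, "cart"))  # book | book
```

Reconciling the diverged copies after the partition heals is deliberately left out of the sketch.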

[Figure: Characteristics of NoSQL Databases]

Primary Uses of NoSQL Databases
1. Large-scale data processing
2. Exploratory analytics on semi-structured data (expert level)
3. Large-volume data storage

Classification of NoSQL Databases
• Key-Value databases
• Document databases
• Column-Family databases
• Graph databases

Key-Value Databases
• These data management systems store each item under an alphanumeric identifier, its key; each key has an associated value.
• The values can be simple text strings or more complex lists and sets.
• Searches can only be performed against keys and are limited to exact matches.
• Searches cannot be performed against values.
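A minimal Python sketch of this access model, assuming a plain in-memory dictionary stands in for a real key-value store such as Redis or Riak. The class `ToyKeyValueStore`, its methods and the sample keys are hypothetical. Values are opaque to the store, so the only query is an exact-match lookup by key, and that lookup returns the whole value.

```python
from typing import Any, Dict, Optional


class ToyKeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store: opaque values, exact-key access only."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The value may be a simple string or a more complex list or set.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Exact key match only: no wildcards, ranges, or searches on the value.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = ToyKeyValueStore()
store.put("user:1001:name", "Alice")
store.put("user:1001:cart", ["book", "pen"])
store.put("tags:1001", {"nosql", "big-data"})

print(store.get("user:1001:cart"))  # ['book', 'pen']  (the whole value comes back)
print(store.get("user:1001"))       # None: a partial key is not a match
```

Finding every key whose value contains "pen" would require scanning all entries in application code, which is exactly the limitation the slide describes.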

[Figure: Key-Value Databases]

Key-Value Characteristics
• The simplicity of key-value stores makes them very fast and lightweight.
• They offer highly scalable retrieval of the values needed for application tasks, such as retrieving product names.
• This is why Amazon uses a key-value system, Dynamo, for its shopping cart; Dynamo is a highly available key-value storage system.
• Examples: Dynamo (Amazon), Voldemort (LinkedIn), Redis, BerkeleyDB, Riak

Pros and Cons

• Pros: anything can be stored in an aggregate (the value associated with a key).
• Cons: the only access path is a key lookup that returns the entire aggregate; there is no query mechanism and no way to retrieve part of an aggregate.

Document Database

• Designed to manage and store documents.
• These documents are encoded in a standard data exchange format such as XML, JSON (JavaScript Object Notation) or BSON (Binary JSON).

[Figure: Document Database]

Primary Uses
• Document databases are good for storing and managing Big Data-size collections of literal documents, such as text documents and email messages.

Pros and Cons
• Pros: allow structured queries and partial aggregate retrieval based on the fields in the aggregate.
• Cons: impose limits on what can be placed in the database.
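The contrast with a key-value store can be sketched in Python with JSON documents, assuming a toy in-memory collection. The `ToyDocumentCollection` class, its `find` method and the sample e-mail documents are hypothetical, only loosely in the style of query-by-field document stores such as MongoDB. Because the store can read each document's fields, it can filter on a field's value and return only selected fields (partial retrieval) instead of the whole aggregate.

```python
import json
from typing import Any, Dict, List, Optional


class ToyDocumentCollection:
    """Hypothetical in-memory collection of JSON documents with field-based queries."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}  # document id -> document stored as JSON text

    def insert(self, doc_id: str, doc: Dict[str, Any]) -> None:
        # Encode the document in a standard exchange format (JSON here).
        self._docs[doc_id] = json.dumps(doc)

    def find(self, query: Dict[str, Any],
             fields: Optional[List[str]] = None) -> List[Dict[str, Any]]:
        """Return documents whose fields equal every value in `query`.

        If `fields` is given, only those fields are returned (a projection).
        """
        results = []
        for text in self._docs.values():
            doc = json.loads(text)
            if all(doc.get(name) == value for name, value in query.items()):
                if fields is not None:
                    doc = {name: doc[name] for name in fields if name in doc}
                results.append(doc)
        return results


emails = ToyDocumentCollection()
emails.insert("m1", {"from": "alice@example.com", "subject": "Report",
                     "body": "Quarterly numbers attached."})
emails.insert("m2", {"from": "bob@example.com", "subject": "Lunch",
                     "body": "Noon today?"})

# Structured query on one field, returning only the "subject" field.
print(emails.find({"from": "alice@example.com"}, fields=["subject"]))
# [{'subject': 'Report'}]
```

A real document database would use indexes rather than scanning every document; the linear scan keeps the sketch short.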

Column-Family Databases
• Each row is a key-value pair whose value consists of a set of columns.
• Column-family databases are represented as tables, with each key-value pair forming a row.
• Related columns can be grouped together as one column family.
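A rough Python sketch of that layout using nested dictionaries. The table `users`, the column-family names `profile` and `activity`, and the helpers `get_column_family` and `put_column` are all hypothetical. Each row key maps to its column families, each family groups related columns, and different rows may hold different columns.

```python
from typing import Dict

# row key -> column family -> column name -> value
ColumnFamilyTable = Dict[str, Dict[str, Dict[str, str]]]

users: ColumnFamilyTable = {
    "user:1001": {
        "profile": {"name": "Alice", "email": "alice@example.com"},
        "activity": {"last_login": "2014-03-04"},
    },
    "user:1002": {
        # Rows need not share the same columns or families.
        "profile": {"name": "Bob"},
    },
}


def get_column_family(table: ColumnFamilyTable, row_key: str, family: str) -> Dict[str, str]:
    """Read one group of related columns for one row; empty if absent."""
    return table.get(row_key, {}).get(family, {})


def put_column(table: ColumnFamilyTable, row_key: str, family: str,
               column: str, value: str) -> None:
    """Write a single column, creating the row and the family on demand."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value


put_column(users, "user:1002", "activity", "last_login", "2014-03-03")
print(get_column_family(users, "user:1001", "profile"))
# {'name': 'Alice', 'email': 'alice@example.com'}
```

Grouping columns by family means a read that only needs profile data never touches the activity columns.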

Primary Uses
1. Large-scale, batch-oriented data processing: sorting, parsing and conversion (e.g. conversions between hexadecimal, binary and decimal code values).
2. Exploratory and predictive analytics performed by expert statisticians and programmers.

[Figure: Column-Family]

Graph Databases
• Graph databases replace relational tables with structured relational graphs of interconnected key-value pairings.
• Graph databases are useful when you are more interested in the relationships between data than in the data itself, which makes them a natural fit for social networks.
• They are optimized for traversing relationships rather than for general-purpose querying.
• Examples: Neo4j, InfoGrid, Sones GraphDB, AllegroGraph, InfiniteGraph

[Figure: Graph Databases]
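Relationship traversal is the operation a graph database is built to make cheap. The Python sketch below models a small social network as an adjacency list and uses breadth-first search to find everyone within a given number of hops of a person (friends, friends of friends, and so on). The people and the `friends_within` function are hypothetical; a product such as Neo4j would express the same traversal in its own query language rather than in application code.

```python
from collections import deque
from typing import Dict, List, Set

# Each person (node) maps to the people they are connected to (edges).
social_graph: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}


def friends_within(graph: Dict[str, List[str]], start: str, max_hops: int) -> Set[str]:
    """Breadth-first traversal: people reachable from `start` in 1..max_hops steps."""
    seen = {start}
    found: Set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow edges past the hop limit
        for neighbour in graph.get(person, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.add(neighbour)
                queue.append((neighbour, hops + 1))
    return found


print(sorted(friends_within(social_graph, "alice", 2)))  # ['bob', 'carol', 'dave']
```

In a relational schema the same question needs one self-join per hop, whereas here each hop simply follows stored edges.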

Adoption of NoSQL Databases
• Organizations with massive data storage needs are looking seriously at NoSQL.
• NoSQL database experts are in high demand at many growing organizations.
• The next graph shows job trends for five NoSQL databases from Indeed.com.

[Figure: Job Trends of Five NoSQL Databases]

Adoption of NoSQL Databases
• MongoDB's growth means that it has cemented its place as the most popular NoSQL database.
• According to LinkedIn profile mentions, NoSQL technologies make up 45% of the mentions in LinkedIn profiles.

[Figure: LinkedIn statistics]

Conclusion
• The computational and storage requirements of applications such as Big Data analytics, Business Intelligence and social networking over petabyte-scale datasets drove the shift from SQL to NoSQL databases.
• This led to the development of horizontally scalable, distributed, non-relational NoSQL databases.
• MongoDB is the most in-demand of them.

Resources
• http://arxiv.org/ftp/arxiv/papers/1307/1307.0191.pdf
• http://en.wikipedia.org/wiki/Column_family
• http://en.wikipedia.org/wiki/NoSQL

Editor's Notes

  • #8 Enterprise Resource Planning; Customer Relationship Management