
NoSQL Databases
By:
Muluken Sholaye
(mulesho2490@gmail.com)
Sept. 2021
CAP Theorem

Consistency, Availability, Partition Tolerance (CAP)

A distributed system cannot guarantee perfect consistency,
availability, and partition tolerance all at the same time.

The three CAP properties are defined as follows:

Consistency: all nodes see the same data at the same time

Availability: a guarantee that every request receives a response about whether it was successful or failed

Partition tolerance: the system continues to operate despite
arbitrary message loss

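A distributed system can satisfy at most two of these three
guarantees at the same time. Because network partitions cannot be
ruled out in practice, a system must choose between consistency and
availability whenever a partition occurs.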
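The following toy sketch makes the trade-off concrete. It does not model any real database. It uses a hypothetical two-replica store: during a partition, the store either rejects requests (choosing consistency, CP) or keeps answering and lets the replicas diverge (choosing availability, AP). The class and function names are made up for illustration.

```python
class Unavailable(Exception):
    """Raised when the CP store refuses a request during a partition."""


class TwoReplicaStore:
    """Hypothetical key-value store with two replicas, 'a' and 'b'.

    mode="CP": during a partition, refuse requests (consistent, not available).
    mode="AP": during a partition, keep answering (available, may be inconsistent).
    """

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.replicas = {"a": {}, "b": {}}
        self.partitioned = False

    def write(self, node, key, value):
        if self.partitioned and self.mode == "CP":
            raise Unavailable(f"{node}: peer unreachable, write rejected")
        if self.partitioned:
            # AP: accept the write locally; the other replica is now stale.
            self.replicas[node][key] = value
        else:
            # No partition: replicate synchronously to both replicas.
            for data in self.replicas.values():
                data[key] = value

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable(f"{node}: peer unreachable, read rejected")
        return self.replicas[node].get(key)


def scenario(mode):
    """Write x=1, cut the network, then try to write x=2 through replica 'a'."""
    store = TwoReplicaStore(mode)
    store.write("a", "x", 1)
    store.partitioned = True
    try:
        store.write("a", "x", 2)
    except Unavailable:
        return "unavailable"
    return store.read("a", "x"), store.read("b", "x")


print(scenario("CP"))  # unavailable
print(scenario("AP"))  # (2, 1): both replicas answer, but they disagree
```

Real CP systems usually keep serving on the side of the partition that holds a majority of replicas. With only two replicas neither side has a majority, which is why this sketch rejects everything.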


NoSQL databases are next-generation databases that mostly address
some of the following points:

Being non-relational,

distributed,

open-source, and

horizontally scalable

Often, more characteristics apply to NoSQL databases, such as:
schema-free, easy replication support, simple API, and eventually
consistent/BASE (basically available, soft state, eventual consistency).

Not ACID but BASE
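Eventual consistency, the "E" in BASE, can be illustrated with a minimal sketch. It assumes a last-writer-wins rule in which every write carries a timestamp and the newer value survives when replicas exchange data. This is only one of several conflict-resolution strategies and is not taken from any particular product. Timestamps are assumed to be unique; real systems break ties, for example by node id.

```python
from dataclasses import dataclass, field


@dataclass
class LWWReplica:
    """Replica storing key -> (timestamp, value); the newest write wins."""
    data: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge(self, other):
        """Anti-entropy step: pull every entry from `other`, keeping newer ones."""
        for key, (ts, value) in other.data.items():
            self.put(key, value, ts)


a, b = LWWReplica(), LWWReplica()
a.put("cart", ["book"], ts=1)
b.put("cart", ["book", "pen"], ts=2)  # a later write lands on the other replica
print(a.get("cart"), b.get("cart"))   # ['book'] ['book', 'pen']  (soft state)
a.merge(b)
b.merge(a)
print(a.get("cart"), b.get("cart"))   # ['book', 'pen'] ['book', 'pen']  (converged)
```

Both replicas stay available and may disagree for a while. Once they exchange updates, they converge on the same value.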
Properties of NoSQL Databases

Non-relational

Distributed

Open-source

Horizontally scalable

Schema-free

Easy replication support

Simple API

BASE not ACID
At the time of writing (2021), there were more than 225 NoSQL databases.
NoSQL databases are widely used in many well-known enterprises, such as
Google, Yahoo, Facebook, Twitter, Taobao, and Amazon.
Categories of NoSQL Databases
●
Here are the four main types of NoSQL databases:
●
Document databases
●
Key-value stores
●
Column-oriented databases
●
Graph databases
●
According to the statistics of the DB-Engines
Ranking website, Apache Cassandra and Apache
HBase are among the most popular wide-column
store (column-oriented) databases.
Document Databases
●
A document database stores data in JSON, BSON,
or XML documents.
●
In a document database, documents can be nested.
Particular elements can be indexed for faster
querying.
●
The most widely adopted document databases are
usually implemented with a scale-out architecture,
providing a clear path to scalability of both data
volumes and traffic.
●
Examples of document stores are MongoDB and
CouchDB.
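The next sketch shows nesting and indexing. It is not MongoDB's or CouchDB's API. Plain Python dicts stand in for JSON documents in a hypothetical "employees" collection, and a dictionary serves as a secondary index on the nested field `address.city`. Note that the documents need not share the same fields (schema-free).

```python
from collections import defaultdict

# Hypothetical "employees" collection: each document is a nested dict.
employees = [
    {"_id": 1, "name": "Alice", "address": {"city": "Addis Ababa"}, "skills": ["sql"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Adama"}},  # no "skills" field
    {"_id": 3, "name": "Chen", "address": {"city": "Addis Ababa"}, "skills": ["hadoop"]},
]


def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested documents."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def find_scan(collection, path, value):
    """Query without an index: examine every document."""
    return [doc["_id"] for doc in collection if get_path(doc, path) == value]


def build_index(collection, path):
    """Secondary index: value at `path` -> list of matching document ids."""
    index = defaultdict(list)
    for doc in collection:
        value = get_path(doc, path)
        if value is not None:
            index[value].append(doc["_id"])
    return index


city_index = build_index(employees, "address.city")
print(find_scan(employees, "address.city", "Addis Ababa"))  # [1, 3]
print(city_index.get("Addis Ababa", []))                     # [1, 3], via one lookup
```

The scan cost grows with the size of the collection. The indexed lookup goes straight to the matching ids, which is why indexing particular elements speeds up queries.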
Cont’d
●
A collection is a group of documents. The
documents within a collection are usually related
to the same subject, such as employees, products,
and so on.
●
A document is a set of ordered key-value pairs,
where the key is a string used to reference a
particular value, and the value can be a scalar
(such as a string or a number), an array, or
another (nested) document.
●
JSON (JavaScript Object Notation), BSON (Binary
JSON), and XML (eXtensible Markup Language) are
formats commonly used to define documents.
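The sketch below illustrates the points above. It assumes a hypothetical product document and serializes it with Python's standard `json` and `xml.etree.ElementTree` modules. BSON needs a third-party driver library, so it is not shown. The small `to_xml` helper is made up for this example and handles only nested dicts and scalar values.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document: ordered key-value pairs, one value is a nested document.
product = {
    "sku": "P-100",
    "name": "Laptop",
    "price": 750,
    "specs": {"ram_gb": 16, "cpu": "4-core"},
}

# JSON: Python dicts keep insertion order, so keys are written in order.
as_json = json.dumps(product)


def to_xml(tag, value):
    """Turn a (possibly nested) dict into an XML element; scalars become text."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    else:
        elem.text = str(value)
    return elem


as_xml = ET.tostring(to_xml("product", product), encoding="unicode")
print(as_json)
print(as_xml)
```

The same key-value structure survives both formats. JSON keeps the value types (the number 750 stays a number), while in XML every leaf becomes text.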
Key-Value Stores
●
Key-value stores are the least complex of the NoSQL databases.
They are, as the name suggests, a collection of key-value pairs.
●
The data in this category of NoSQL databases is stored in the form
“Key → Value”, where:
– Key is a string used to identify a unique value;
– Value is an object that can be a simple string, a numeric
value, or a complex object such as a JSON document or a BLOB
(image, audio, and so on).
●
According to the statistics of the DB-Engines Ranking website,
Redis and DynamoDB are among the most popular key-value stores.
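As a minimal sketch of the "Key → Value" model, here is a hypothetical in-memory store. It is not the API of Redis, DynamoDB, or any other product. Keys are strings, and values are opaque bytes that the store never interprets: a plain string, a JSON document, or binary data all look the same to it. The `user:42:...` key names are invented for the example.

```python
import json


class KeyValueStore:
    """Hypothetical in-memory key-value store: str keys, opaque bytes values."""

    def __init__(self):
        self._data = {}  # key (str) -> value (bytes)

    def put(self, key, value):
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("values are stored as opaque bytes")
        self._data[key] = bytes(value)

    def get(self, key):
        return self._data.get(key)  # None if the key is absent

    def delete(self, key):
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("user:42:name", "Alice".encode("utf-8"))            # simple string
store.put("user:42:profile",
          json.dumps({"age": 30, "city": "Addis Ababa"}).encode("utf-8"))  # JSON
store.put("user:42:avatar", bytes([0x89, 0x50, 0x4E, 0x47]))  # first bytes of a PNG, standing in for a BLOB

# Only the application knows how to decode each value.
profile = json.loads(store.get("user:42:profile"))
print(profile["city"])
```

The only operations are lookups by exact key. This is what makes key-value stores the least complex category, and also why they cannot query the contents of a value.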
Cont’d
Graph Databases
●
Graph databases are the most complex category, geared toward
storing relationships between entities efficiently.
●
The graph database model (GDM) is composed of
vertices and edges [5], where
– A vertex is an entity instance, which is equivalent to a
tuple (row) in the relational data model (RDM);
– An edge defines the relationship between
vertices;
– Each vertex and edge can contain any number of attributes
that store the actual data values.
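The following small sketch shows the vertex/edge model described above. It is not the API of Neo4j or any other graph database. Both vertices and edges carry attribute dictionaries, and each vertex keeps a list of its outgoing edges, so a traversal follows relationships directly. The class name, vertex ids, and edge labels are hypothetical.

```python
from collections import deque


class PropertyGraph:
    """Vertices and edges both carry attribute dictionaries."""

    def __init__(self):
        self.vertices = {}  # vertex id -> attributes
        self.edges = []     # (source id, label, target id, attributes)
        self.adjacent = {}  # vertex id -> list of (label, target id)

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = attrs
        self.adjacent.setdefault(vid, [])

    def add_edge(self, src, label, dst, **attrs):
        self.edges.append((src, label, dst, attrs))
        self.adjacent.setdefault(src, []).append((label, dst))

    def reachable(self, start, label):
        """Breadth-first search that follows only edges with the given label."""
        seen, queue = {start}, deque([start])
        while queue:
            vertex = queue.popleft()
            for edge_label, neighbor in self.adjacent.get(vertex, []):
                if edge_label == label and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        seen.discard(start)
        return seen


g = PropertyGraph()
for vid, name in [("alice", "Alice"), ("bob", "Bob"), ("carol", "Carol"), ("acme", "Acme Ltd")]:
    g.add_vertex(vid, name=name)
g.add_edge("alice", "KNOWS", "bob", since=2019)
g.add_edge("bob", "KNOWS", "carol", since=2021)
g.add_edge("alice", "WORKS_AT", "acme", role="engineer")
print(sorted(g.reachable("alice", "KNOWS")))  # ['bob', 'carol']
```

In the relational model, the same "friends of friends" question would typically need a self-join on a relationship table for every hop.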
Assignment
●
HBase
●
CouchDB
●
Cassandra
●
Redis
●
MongoDB
●
Note: Choose one database from the list and study:
– The basics of the database
– Installation and usage
– Demo
●
ETA: 5 days
Columnar Databases
●
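They store data by column (in wide-column stores
such as HBase, by column family) rather than by row,
so a query can read only the columns it needs.
●
HBase is one of the most commonly used, alongside Cassandra.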
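The difference in layout can be sketched in plain Python with a hypothetical sales table. In the row layout each record's values sit together. In the column layout each column's values sit together, so summing one column touches nothing else. The last part shows a simplified wide-column shape (row key → column family → column → value) similar to HBase's data model. It ignores HBase's cell versions and is not HBase's client API.

```python
# Hypothetical sales table.
rows = [
    {"id": 1, "city": "Addis Ababa", "sales": 120},
    {"id": 2, "city": "Adama", "sales": 80},
    {"id": 3, "city": "Hawassa", "sales": 45},
]


def total_sales_row_layout(rows):
    """Row layout: every record is visited to pull out one field."""
    return sum(row["sales"] for row in rows)


# Column layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_sales_column_layout(columns):
    """Column layout: only the 'sales' column is read."""
    return sum(columns["sales"])


# Simplified wide-column shape: row key -> column family -> column -> value.
# Rows may have different columns; absent cells are simply not stored.
wide = {
    "emp#1": {"info": {"name": "Alice"}, "stats": {"visits": 3}},
    "emp#2": {"info": {"name": "Bob", "email": "bob@example.com"}},
}

print(total_sales_row_layout(rows), total_sales_column_layout(columns))
```

In memory the two layouts perform about the same. On disk, a column store can skip the bytes of unneeded columns entirely, which is where the advantage comes from.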
Big Data Frameworks
Basics
●
The major challenges associated with big data are as follows:
– Capturing data
– Curation
– Storage
– Searching
– Sharing
– Transfer
– Analysis
– Presentation
●
To address these challenges, organizations normally rely on
enterprise solutions built from layered frameworks.
Hadoop Ecosystem
●
Apache Hadoop is an open-source framework.
●
Hadoop enables businesses to distribute data storage and parallel
processing, and to process data with high volume, velocity,
variety, value, and veracity.
●
The Hadoop ecosystem is a platform, or suite, that provides various
services to solve big data problems. It includes many projects, most of them from Apache:
– HDFS: Hadoop Distributed File System
– YARN: Yet Another Resource Negotiator
– MapReduce: programming-model-based data processing
– Spark: in-memory data processing
– Pig, Hive: query-based data processing services
– HBase: NoSQL database
– Mahout, Spark MLlib: machine learning algorithm libraries
– Solr, Lucene: searching and indexing
– ZooKeeper: cluster management
– Flume, Chukwa, Scribe, Kafka, Sqoop: data collection
Cont’d
●
All of these tools and components revolve around one thing:
data.
●
That is the beauty of Hadoop: because it revolves around data,
it makes synthesizing that data easier.
●
Hadoop has four major elements:
– HDFS,
– MapReduce,
– YARN, and
– Hadoop Common.
●
Let’s study each in more detail.
HDFS
●
HDFS is responsible for storing large data sets of structured
or unstructured data across various nodes while maintaining
the metadata in the form of log files.
●
HDFS consists of two core components:
– NameNode
– DataNode
●
The NameNode is the master node; it holds the metadata (data
about data) and requires comparatively fewer resources than the
DataNodes, which store the actual data.
●
DataNodes run on commodity hardware in the distributed
environment, which makes Hadoop cost-effective.
●
HDFS coordinates data storage across the cluster and its
hardware, which places it at the heart of the system.
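The following simplified sketch shows the NameNode/DataNode split. The NameNode object keeps only metadata (file → blocks → DataNodes holding each block), while DataNode objects hold the bytes. Files are split into fixed-size blocks and each block is replicated. Several choices here are assumptions for illustration: the tiny 4-byte block size (HDFS defaults to 128 MB), round-robin placement instead of HDFS's rack-aware policy, and the NameNode copying bytes itself. In real HDFS, the client streams blocks to DataNodes and the NameNode only hands out locations.

```python
import itertools
from dataclasses import dataclass, field

BLOCK_SIZE = 4   # bytes; tiny for the demo (real HDFS blocks default to 128 MB)
REPLICATION = 3  # HDFS's default replication factor


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # (file, block id) -> bytes


class NameNode:
    """Holds only metadata: file -> list of (block id, DataNode names)."""

    def __init__(self, datanodes):
        self.datanodes = {dn.name: dn for dn in datanodes}
        self._order = list(self.datanodes)
        self._start = itertools.cycle(range(len(self._order)))
        self.metadata = {}

    def write(self, filename, data):
        placements = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = offset // BLOCK_SIZE
            start = next(self._start)
            # Assumed policy: REPLICATION consecutive DataNodes, round-robin start.
            targets = [self._order[(start + k) % len(self._order)]
                       for k in range(REPLICATION)]
            for name in targets:
                self.datanodes[name].blocks[(filename, block_id)] = data[offset:offset + BLOCK_SIZE]
            placements.append((block_id, targets))
        self.metadata[filename] = placements

    def read(self, filename):
        parts = []
        for block_id, targets in self.metadata[filename]:
            for name in targets:  # use the first replica that still has the block
                block = self.datanodes[name].blocks.get((filename, block_id))
                if block is not None:
                    parts.append(block)
                    break
            else:
                raise IOError(f"all replicas of block {block_id} are lost")
        return b"".join(parts)


nodes = [DataNode(f"dn{i}") for i in range(4)]
namenode = NameNode(nodes)
namenode.write("example.txt", b"Dear Bear River Car")
nodes[0].blocks.clear()              # simulate losing one DataNode's data
print(namenode.read("example.txt"))  # b'Dear Bear River Car'
```

Because every block has three copies on different DataNodes, losing one cheap commodity machine does not lose data. This is the reasoning behind running DataNodes on commodity hardware.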
MapReduce
●
By using distributed and parallel algorithms, MapReduce moves
the processing logic to the nodes that hold the data and helps
developers write applications that reduce big data sets to
manageable ones.
●
MapReduce uses two functions, Map() and Reduce():
– Map() filters and transforms the input data and emits the results
as intermediate key-value pairs; the framework then sorts and groups
these pairs by key before they are processed by the Reduce()
method.
– Reduce(), as the name suggests, summarizes the mapped data by
aggregating it. Simply put, Reduce() takes the output generated
by Map() as input and combines those tuples into a smaller set
of tuples.
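The three phases can be simulated on a single machine in a few lines. In the sketch below, the `run_mapreduce` helper and the "store,amount" sales records are hypothetical. The map function filters out malformed lines and emits `(store, amount)` pairs. Sorting plus `itertools.groupby` plays the role of the shuffle/sort phase. The reduce function sums each group.

```python
from itertools import groupby
from operator import itemgetter


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process simulation of map -> shuffle/sort -> reduce."""
    # Map phase: each record yields zero or more (key, value) pairs.
    intermediate = [pair for record in records for pair in map_fn(record)]
    # Shuffle/sort phase: bring pairs with equal keys together.
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: one call per key with all of that key's values.
    return {key: reduce_fn(key, [value for _, value in group])
            for key, group in groupby(intermediate, key=itemgetter(0))}


def map_sales(line):
    """Filter malformed lines; emit (store, amount) for valid ones."""
    parts = line.split(",")
    if len(parts) == 2 and parts[1].strip().isdigit():
        yield parts[0].strip(), int(parts[1])


def reduce_sum(key, values):
    return sum(values)


records = ["north,10", "south,5", "north,7", "bad line", "south,3"]
print(run_mapreduce(records, map_sales, reduce_sum))  # {'north': 17, 'south': 8}
```

In Hadoop, the map calls run in parallel on different nodes and the shuffle moves data across the network. The division of work between the phases is the same.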
●
A Word Count Example of MapReduce
●
Let us understand how MapReduce works through an
example: suppose we have a text file called
example.txt with the following contents:
●
Dear, Bear, River, Car, Car, River, Deer, Car
and Bear
●
Now, suppose we have to perform a word count
on example.txt using MapReduce; that is, we
will find the unique words and the number of
occurrences of each unique word.
●
Example
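The word count can be sketched as a single-process Python simulation, not Hadoop code. The contents of example.txt are inlined as a string so the sketch runs without the file. The mapper emits `(word, 1)` for each word, sorting groups identical words (the shuffle), and the reducer adds up the 1s. Matching words with the regular expression `[A-Za-z]+` is an assumption. It drops the commas, is case-sensitive, and counts "and" as a word.

```python
import re
from itertools import groupby
from operator import itemgetter

# Contents of example.txt from the slide, inlined so no file is needed.
TEXT = "Dear, Bear, River, Car, Car, River, Deer, Car and Bear"


def mapper(line):
    """Map: emit (word, 1) for every word in the line."""
    for word in re.findall(r"[A-Za-z]+", line):
        yield word, 1


def reducer(word, counts):
    """Reduce: add up the 1s emitted for one word."""
    return sum(counts)


def word_count(text):
    pairs = [pair for line in text.splitlines() for pair in mapper(line)]
    pairs.sort(key=itemgetter(0))  # shuffle/sort: identical words become adjacent
    return {word: reducer(word, [count for _, count in group])
            for word, group in groupby(pairs, key=itemgetter(0))}


print(word_count(TEXT))
# {'Bear': 2, 'Car': 3, 'Dear': 1, 'Deer': 1, 'River': 2, 'and': 1}
```

In a real Hadoop job, several mappers would process different input splits of example.txt in parallel, and the framework would route all pairs for the same word to the same reducer.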
