Slide 1
HBase Vs Cassandra Vs MongoDB - choose the right NoSQL database
View NoSQL database Courses at: www.edureka.in
Slide 2
Objectives of this Session
For Queries during the session and class recording:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
• Traditional databases
• Challenges with traditional databases
• CAP Theorem
• NoSQL to the rescue
• A BASE system
• Choose the right NoSQL database
Slide 3
Database Categories
• RDBMS/OLTP/Real Time: Oracle, MySQL, MS SQL, DB2
• DSS/OLAP/DW: Netezza, SAP Hana, Oracle Express
• NoSQL/New SQL/BigData: MongoDB, HBase, Cassandra, CouchDB
Slide 4
A Traditional database solution
[Diagram: a web application served by a caching layer and RDBMS instances. Labels: 5000 TPS, 1000 TPS, 300 ~ 500 SQL Transaction, 100 ~ 200 SQL Transaction, Applications Changing Data, Elastic Scale]
Slide 5
A NoSQL database solution
[Diagram: a web application served by Cassandra. Labels: 5000 TPS, 1000 TPS, 300 ~ 500 SQL Transaction, 100 ~ 200 SQL Transaction, Applications Changing Data, Elastic Scale]
Slide 6
Challenges with traditional databases
• Not a good fit for large data volumes (petabytes of data) with varying data types, e.g. images, videos, text etc.
• Can't scale for large data volumes, e.g. the 15 - 20 petabytes of data in the Govt. of India “AADHAR” project
• Scale-up - limited by memory and processing (CPU) capabilities
• Scale-out - cache-dependent read and write operations
• Complex RDBMS model – parsing, locking, logging, buffer pool, threads etc.
• Sharding causes operational problems, e.g. managing a shard failure
• Consistency – a bottleneck for scalability in RDBMS
• Satisfying ACID is a hindrance for scaling
• NoSQL databases relax consistency in order to scale out
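The sharding point above can be made concrete with a small sketch. It assumes simple modulo sharding over a stable hash, which is only one of several possible schemes; `shard_for`, the key names and the shard counts are hypothetical and chosen for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash (md5 is used only for determinism)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


keys = [f"user-{i}" for i in range(1000)]

# A failed shard makes every key that hashes to it unreachable.
failed_shard = 2
unreachable = [k for k in keys if shard_for(k, 4) == failed_shard]

# Adding a fifth shard changes the modulus, so most keys change owner
# and their data has to be moved.
moved = [k for k in keys if shard_for(k, 4) != shard_for(k, 5)]

print(f"unreachable after shard {failed_shard} fails: {len(unreachable)} of {len(keys)}")
print(f"keys that move when going from 4 to 5 shards: {len(moved)} of {len(keys)}")
```

With manual sharding, both the outage and the data movement have to be handled by the operations team or the application.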
Slide 7
CAP Theorem
We must understand the CAP theorem when we talk about NoSQL databases, or in fact when designing any distributed system. The CAP theorem states that there are 3 basic requirements which exist in a special relation when designing applications for a distributed architecture.
• Consistency: the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
• Availability: the system is always on (guaranteed service availability), with no downtime.
• Partition Tolerance: the system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
Slide 8
CAP Theorem and NoSQL databases
• Under the CAP theorem, a distributed system can satisfy only 2 of the 3 requirements.
• In theory, it is impossible to fulfill all 3 requirements at once.
• Therefore the current NoSQL databases follow different combinations of C, A and P from the CAP theorem.
• CA - Single-site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.
• CP - Some data may not be accessible, but the rest is still consistent/accurate.
• AP - System is still available under partitioning, but some of the data returned may be inaccurate.
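The CP and AP behaviours described above can be sketched with a toy model. This is a minimal illustration, assuming a hypothetical store with two replicas of a single value; `Replica`, `TwoNodeStore` and `Unavailable` are made-up names and do not correspond to any real database API.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses to serve a request during a partition."""


class Replica:
    """One copy of a single value."""

    def __init__(self, value):
        self.value = value


class TwoNodeStore:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = Replica("v1")
        self.b = Replica("v1")
        self.partitioned = False  # True when the replicas cannot talk to each other

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: update both copies.
            self.a.value = value
            self.b.value = value
        elif self.mode == "CP":
            # Refuse the write rather than let the replicas diverge.
            raise Unavailable("cannot reach peer replica")
        else:
            # AP: accept the write locally; the replicas now disagree.
            node.value = value

    def read(self, node):
        return node.value


cp = TwoNodeStore("CP")
cp.partitioned = True
try:
    cp.write(cp.a, "v2")
    cp_write_accepted = True
except Unavailable:
    cp_write_accepted = False

ap = TwoNodeStore("AP")
ap.partitioned = True
ap.write(ap.a, "v2")
ap_reads = (ap.read(ap.a), ap.read(ap.b))

print(cp_write_accepted, ap_reads)
```

In the CP store the data stays consistent but the write is rejected; in the AP store the write succeeds but a client reading from the other replica still sees the old value.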
Slide 9
NoSQL to the rescue
• A scale-out, shared-nothing architecture, capable of running on a large number of nodes
• A non-locking concurrency control mechanism, so real-time reads will not conflict with writes
• Scalable replication and distribution
• Thousands of machines with distributed data
• An architecture providing much higher per-node performance than available from traditional SQL-based databases
• Schema-less data model
• Mostly queries and few updates
Slide 10
NoSQL database - A BASE not ACID system
• Basically Available: the system does guarantee availability, in terms of the CAP theorem.
• Soft State: the state of the system may change over time, even without input. This is because of the eventual consistency model.
• Eventual Consistency: the system will become consistent over time, given that the system doesn't receive input during that time.
A BASE system gives up on immediate consistency.
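The three BASE properties can be traced in a small sketch. It assumes a last-write-wins merge keyed on a timestamp, which is only one possible reconciliation strategy and is not attributed to any particular database; `merge`, the replica dictionaries and the timestamps are hypothetical.

```python
def merge(local, remote):
    """Last-write-wins merge of two replicas shaped as {key: (timestamp, value)}."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


replica_a = {"status": (1, "offline")}
replica_b = {"status": (1, "offline")}

# Each replica accepts a write on its own (basically available).
replica_a["status"] = (2, "online")
replica_b["mood"] = (3, "happy")

# Before the replicas sync, a reader of replica_b sees the old status (soft state).
stale_read = replica_b["status"][1]

# One sync round in each direction, with no new writes arriving meanwhile.
replica_a = merge(replica_a, replica_b)
replica_b = merge(replica_b, replica_a)

print(stale_read, replica_a == replica_b)
```

Once writes stop and the replicas have exchanged state, both hold the same data, which is the eventual consistency guarantee.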
Slide 11
NoSQL database – Not a Panacea
There are about 150 NoSQL databases on the market.
Slide 12
NoSQL Database – Storage Architecture
NoSQL Database Types

Document Data Store Databases
• Data Model: Collection of key-value collections
• Example: CouchDB, MongoDB
• Strength: Tolerant of incomplete data
• Weakness: Query performance, no standard query syntax

Columnar NoSQL Databases
• Data Model: Column families
• Example: HBase, Cassandra
• Strength: Fast look-ups
• Weakness: Very low-level API

Key Value Databases
• Data Model: Collection of key-value pairs
• Example: Amazon SimpleDB, Redis
• Strength: Fast look-ups
• Weakness: Stored data has no schema

Graph NoSQL Databases
• Data Model: “Property Graph” - nodes
• Example: InfoGrid, Infinite Graph
• Strength: Graph algorithms – shortest path, connectedness, etc.
• Weakness: Not easy to cluster; must traverse the whole graph to get an answer
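To make the four data models easier to compare, the sketch below stores similar user information in each shape using plain Python structures. It is an illustration only: the field names, keys and the `followers_of` helper are hypothetical, and real databases add storage, indexing and query layers on top of these shapes.

```python
import json

# Key-value: an opaque value looked up by key; the store knows nothing about its schema.
key_value = {"user:42": '{"name": "Asha", "city": "Pune"}'}

# Document: a nested, self-describing record; fields may differ between documents.
document = {"_id": 42, "name": "Asha", "address": {"city": "Pune"}, "tags": ["admin"]}

# Column family: row key -> column family -> sparse set of columns.
column_family = {
    "user42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2014-01-01"},
    }
}

# Property graph: nodes and edges, each carrying properties.
nodes = {42: {"name": "Asha"}, 7: {"name": "Ravi"}}
edges = [(42, 7, {"type": "follows"})]


def followers_of(node_id):
    """Return the ids of nodes that have a 'follows' edge pointing at node_id."""
    return [src for src, dst, props in edges if dst == node_id and props["type"] == "follows"]


# The key-value store needs the application to parse the value.
city_from_kv = json.loads(key_value["user:42"])["city"]
city_from_doc = document["address"]["city"]
city_from_cf = column_family["user42"]["profile"]["city"]

print(city_from_kv, city_from_doc, city_from_cf, followers_of(7))
```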
Slide 13
Selecting a NoSQL database
• Step 1: Right Data Model
• Step 2: Pros and Cons of Consistency
• Step 3: Compromising Features of RDBMS
Slide 14
Where to Use Cassandra?
• If looking for simple setup, maintenance and code
• Very high velocity random reads & writes
• Flexible sparse / wide column requirements
• No multiple secondary index needs
Slide 15
Cassandra Use Case - Twitter
Massive Scale, High Availability
Slide 16
Where NOT to Use Cassandra?
Do not use Cassandra if your application needs:
• Secondary indexes
• Relational data
• Transactional operations (rollback, commit)
• Primary & financial records
• Stringent security & authorization on data
• Dynamic queries on columns
• Searching column data
• Low latency
Slide 17
Where to Use HBase
• Optimized for reads
• Well suited for range-based scans
• Applications with strict consistency requirements
• Applications with fast reads and writes with scalability
• Facebook uses it to manage its user statuses, photos, chat messages etc.
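HBase keeps rows sorted by row key, which is why range-based scans suit it. The sketch below imitates that idea with an in-memory sorted key list; `SortedTable`, the row-key layout and the sample rows are hypothetical and are not the HBase client API.

```python
import bisect


class SortedTable:
    """Rows kept in row-key order, so a range scan is one contiguous slice."""

    def __init__(self):
        self._keys = []   # row keys, always sorted
        self._rows = {}   # row key -> row data

    def put(self, row_key, row):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = row

    def scan(self, start_key, stop_key):
        """Yield (key, row) for start_key <= key < stop_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = SortedTable()
table.put("user1#2014-01-03", {"msg": "c"})
table.put("user1#2014-01-01", {"msg": "a"})
table.put("user2#2014-01-02", {"msg": "x"})
table.put("user1#2014-01-02", {"msg": "b"})

# "$" is the ASCII character right after "#", so this range covers every "user1#..." key.
user1_msgs = [row["msg"] for _, row in table.scan("user1#", "user1$")]
print(user1_msgs)
```

Choosing a row key that puts related rows next to each other (here, user id followed by date) is what turns a query into a short contiguous scan.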
Slide 18
HBase Use Case - Facebook Messenger
Consistency and Scale
Slide 19
Where Not to use HBase
• It is not optimized for classic transactional applications or even relational analytics
• Applications that need:
  • full table scans
  • data to be aggregated, rolled up, analysed across rows
Slide 20
Where to Use MongoDB
• RDBMS replacement for web applications
• Semi-structured content management
• Real-time analytics & high-speed logging
• Caching and high scalability
• Web 2.0, Media, SaaS, Gaming
http://www.mongodb.org/about/production-deployments/
Slide 21
MongoDB Use Cases
High-performance and Schema-free
• MySQL for active posts
• MongoDB for archived posts
• Migrated two billion plus posts to MongoDB
• Migrated from RDBMS to MongoDB
• Storage of venues and check-ins
Slide 22
Where Not to use MongoDB
• Highly transactional applications
• Applications with traditional database system requirements such as foreign-key constraints etc.
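The foreign-key point can be illustrated with a sketch: when the database does not enforce references between records, the check has to live in application code. The example uses plain Python dictionaries as a stand-in for collections; `users`, `posts` and `insert_post` are hypothetical names and not the MongoDB driver API.

```python
users = {1: {"name": "Asha"}}   # stand-in for a users collection, keyed by id
posts = []                      # stand-in for a posts collection


def insert_post(post):
    """Reject posts whose author_id does not reference an existing user."""
    if post["author_id"] not in users:
        raise ValueError(f"unknown author_id {post['author_id']}")
    posts.append(post)


insert_post({"title": "hello", "author_id": 1})

try:
    insert_post({"title": "orphan", "author_id": 99})
    orphan_rejected = False
except ValueError:
    orphan_rejected = True

print(len(posts), orphan_rejected)
```

Every code path that writes posts must repeat this check, which is the cost an RDBMS foreign-key constraint would otherwise absorb.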
Slide 23
HBase Vs Cassandra Vs MongoDB

HBase
• Distributed and scalable big data store
• Strong consistency
• Built on top of the Hadoop Distributed File System (HDFS)
• CP on CAP

Cassandra
• High availability
• Incremental scalability
• Eventually consistent
• Trade-offs between consistency and latency
• Minimal administration
• No SPOF (Single Point of Failure)
• AP on CAP

MongoDB
• Schemas can change as applications evolve (schema-free)
• Full index support for high performance
• Replication and failover for high availability
• Auto-sharding for easy scalability
• Rich document-based queries for easy readability
• CP on CAP
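The "trade-offs between consistency and latency" item for Cassandra is commonly explained with the quorum-overlap rule used in Dynamo-style replication: with N replicas, a read that contacts R replicas is guaranteed to see the latest acknowledged write made to W replicas when R + W > N. The symbols N, R, W and the helper names below are not from the slides; they are assumptions introduced for this sketch.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set of size r must overlap every write set of size w."""
    return r + w > n


N = 3
settings = {
    "W=1, R=1 (fastest, eventually consistent)": (1, 1),
    "W=quorum, R=quorum": (quorum(N), quorum(N)),
    "W=all, R=1 (slow writes, fast reads)": (N, 1),
}

for label, (w, r) in settings.items():
    print(f"{label}: overlap guaranteed = {read_sees_latest_write(N, w, r)}")
```

Waiting for fewer replicas lowers latency but allows stale reads; waiting for more replicas does the opposite.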
Slide 24
Questions?
Buy NoSQL database Courses at: www.edureka.in