Distributed Data Stores and
NoSQL Databases
S. Sudarshan
IIT Bombay
Parallel Databases and Data Stores
 Relational Databases – mainstay of business
 Web-based applications caused spikes in load
 Especially true for public-facing e-Commerce sites
 Many application servers, one database
 Easy to parallelize application servers to 1000s of
servers, harder to parallelize databases to same scale
 First solution: memcached or other caching
mechanisms to reduce database access
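A minimal sketch of the caching idea, assuming a cache-aside pattern: reads check the cache first and only go to the database on a miss, and writes invalidate the cached copy. Plain dictionaries stand in for memcached and for the database; the names CacheAside, db, ttl_seconds and db_reads are hypothetical and chosen only for illustration.

```python
import time


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self.db = db                  # stand-in for the real database (a dict here)
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.cache = {}               # key -> (expiry_time, value); stand-in for memcached
        self.db_reads = 0             # counts how often the database is actually read

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]           # cache hit: no database access
        self.db_reads += 1            # cache miss or expired entry
        value = self.db[key]
        self.cache[key] = (self.clock() + self.ttl_seconds, value)
        return value

    def put(self, key, value):
        self.db[key] = value          # write to the database ...
        self.cache.pop(key, None)     # ... and invalidate the cached copy
```

Repeated reads of the same key are then served from the cache until the entry expires or is invalidated by a write, which is what reduces load on the single database.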
Scaling Up
 What if the dataset is huge and the number of
transactions per second is very high?
 Use multiple servers to host database
 Parallel databases have been around for a
while
 But expensive, and designed for decision
support not OLTP
Scaling RDBMS – Master/Slave
 Master-Slave
 All writes go to the master; all reads are
performed against the replicated slave
databases
 Good for read-mostly applications with very
few updates
 Critical reads may return stale data, as writes
may not yet have propagated to the slaves
 Large data sets can pose problems as
master needs to duplicate data to slaves
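The stale-read problem can be seen in a toy model. This is only a sketch: dictionaries stand in for the databases, replication is triggered by hand, and the critical flag (sending important reads to the master) is one possible workaround assumed here, not something the slides prescribe. All names are hypothetical.

```python
class MasterSlaveStore:
    """Toy master/slave replication: writes hit the master, reads hit a slave."""

    def __init__(self, num_slaves=2):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]
        self.pending = []             # writes not yet propagated to the slaves
        self.next_slave = 0

    def write(self, key, value):
        self.master[key] = value
        self.pending.append((key, value))

    def read(self, key, critical=False):
        if critical:
            return self.master.get(key)   # always current, but adds load on the master
        slave = self.slaves[self.next_slave]
        self.next_slave = (self.next_slave + 1) % len(self.slaves)  # round-robin
        return slave.get(key)             # may be stale

    def replicate(self):
        """Apply all pending writes to every slave, in order."""
        for key, value in self.pending:
            for slave in self.slaves:
                slave[key] = value
        self.pending.clear()
```

Between write() and replicate(), an ordinary read returns the old value (or nothing), while a critical read sees the new one.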
Scaling RDBMS - Partitioning
 Partitioning
 Divide the database across many machines
 E.g. hash or range partitioning
 Handled transparently by parallel databases
 but they are expensive
 “Sharding”
 Divide data amongst many cheap databases
(MySQL/PostgreSQL)
 Manage parallel access in the application
 Scales well for both reads and writes
 Not transparent, application needs to be partition-aware
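As a rough illustration of the two partitioning schemes named above, the following sketch maps a key to a partition number by hash and by range. The function names and the split points are hypothetical; a sharded application would use the returned number to pick which database server to talk to.

```python
import bisect
import hashlib


def hash_partition(key, num_partitions):
    """Map a string key to a partition using a stable hash of the key.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so is not stable across servers.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def range_partition(key, split_points):
    """Map a key to a partition using sorted split points.

    Partition 0 holds keys below split_points[0]; partition i holds keys k
    with split_points[i-1] <= k < split_points[i]; the last partition holds
    everything from the final split point upward.
    """
    return bisect.bisect_right(split_points, key)
```

Hash partitioning spreads keys evenly but scatters neighbouring keys, whereas range partitioning keeps adjacent keys together, which helps range scans.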
What is NoSQL?
 Stands for Not Only SQL
 Class of non-relational data storage systems
 E.g. BigTable, Dynamo, PNUTS/Sherpa, ..
 Usually do not require a fixed table schema nor
do they use the concept of joins
 All NoSQL offerings relax one or more of the
ACID properties (will talk about the CAP
theorem)
 Not a backlash/rebellion against RDBMS
 SQL is a rich query language that cannot be
rivaled by the current list of NoSQL offerings
Why Now?
 Explosion of social media sites (Facebook,
Twitter) with large data needs
 Explosion of storage needs in large web
sites such as Google, Yahoo
 Much of the data is not files
 Rise of cloud-based solutions such as
Amazon S3 (Simple Storage Service)
 Shift to dynamically-typed data with frequent
schema changes
 Open-source community
Distributed Key-Value Data Stores
 Distributed key-value data storage systems allow
key-value pairs to be stored (and retrieved on key)
in a massively parallel system
 E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon
Dynamo, ..
 Partitioning, high availability, etc. are completely
transparent to the application
 Sharding systems and key-value stores don’t
support many relational features
 No join operations (except within partition)
 No referential integrity constraints across partitions
 etc.
Typical NoSQL API
 Basic API access:
 get(key) -- Extract the value given a key
 put(key, value) -- Create or update the value
given its key
 delete(key) -- Remove the key and its
associated value
 execute(key, operation, parameters) --
Invoke an operation on the value stored under
the key, where the value is a special data
structure (e.g. list, set, map)
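The four operations above can be sketched as a single-process, in-memory class. This is an illustrative assumption only: it is not the API of any particular product, and a real store would partition and replicate the data across servers. Here execute looks up a method by name on the stored Python object.

```python
class KeyValueStore:
    """In-memory stand-in for the get/put/delete/execute interface."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        """Return the value for key, or None if the key is absent."""
        return self._data.get(key)

    def put(self, key, value):
        """Create or overwrite the value stored under key."""
        self._data[key] = value

    def delete(self, key):
        """Remove key and its value; do nothing if the key is absent."""
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        """Invoke a named operation on the structured value stored under key."""
        value = self._data[key]
        return getattr(value, operation)(*parameters)
```

For example, after kv.put("cart:7", []), the call kv.execute("cart:7", "append", "skates") updates the list in place without the client fetching and rewriting the whole value.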
Flexible Data Model
ColumnFamily: Rockets
Key 1
  name: Rocket-Powered Roller Skates
  toon: Ready, Set, Zoom
  inventoryQty: 5
  brakes: false
Key 2
  name: Little Giant Do-It-Yourself Rocket-Sled Kit
  toon: Beep Prepared
  inventoryQty: 4
  brakes: false
Key 3
  name: Acme Jet Propelled Unicycle
  toon: Hot Rod and Reel
  inventoryQty: 1
  wheels: 1
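The Rockets column family can be modelled, as a sketch, by a dictionary of rows in which each row is its own dictionary of column name to value. The point is that rows need not share the same columns: row 3 has wheels but no brakes. The Python types (int, bool) and the helper column_names are assumptions made for illustration.

```python
# Each row key maps to its own set of name/value columns.
rockets = {
    "1": {"name": "Rocket-Powered Roller Skates", "toon": "Ready, Set, Zoom",
          "inventoryQty": 5, "brakes": False},
    "2": {"name": "Little Giant Do-It-Yourself Rocket-Sled Kit", "toon": "Beep Prepared",
          "inventoryQty": 4, "brakes": False},
    "3": {"name": "Acme Jet Propelled Unicycle", "toon": "Hot Rod and Reel",
          "inventoryQty": 1, "wheels": 1},
}


def column_names(column_family):
    """Return every column name used by at least one row, sorted."""
    names = set()
    for columns in column_family.values():
        names.update(columns)
    return sorted(names)
```

Adding a new column to one row requires no schema change; other rows simply do not have it.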
NoSQL Data Storage: Classification
 Uninterpreted key/value or ‘the big hash
table’.
 Amazon Dynamo, Amazon S3
 Flexible schema
 BigTable, Cassandra, HBase (ordered keys,
semi-structured data)
 Sherpa/PNUTS (unordered keys, JSON)
 MongoDB (based on JSON)
 CouchDB (name/value in text)
PNUTS Data Storage Architecture
CAP Theorem
 Three properties of a system
 Consistency (all copies have same value)
 Availability (system can run even if parts have failed)
 Partitions (network can break into two or more parts,
each with active systems that can’t talk to other parts)
 Brewer’s CAP “Theorem”: You can have at most
two of these three properties for any system
 Very large systems will partition at some point
 Choose one of consistency or availability
 Traditional databases choose consistency
 Most Web applications choose availability
 Except for specific parts such as order processing
Availability
 Traditionally thought of as the server/process
being available 99.999% of the time
("five nines").
 However, in a system with a large number of
nodes, at almost any point in time there is a good
chance that a node is down or that there is a
network disruption among the nodes.
 Want a system that is resilient in the face of
network disruption
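A back-of-the-envelope calculation makes the point about large systems concrete. Assuming, purely for illustration, that nodes fail independently and each node is up with the same probability, the chance that at least one node is down is 1 minus the probability that all are up. The function name and the sample figures are hypothetical.

```python
def prob_some_node_down(num_nodes, node_availability):
    """Probability that at least one of num_nodes is down at a given instant.

    Assumes independent failures and identical per-node availability.
    """
    return 1.0 - node_availability ** num_nodes
```

Under these assumptions, 1,000 nodes that are each available 99.9% of the time have roughly a 63% chance that some node is down at any given moment, even though each node on its own looks highly reliable.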
Eventual Consistency
 When no updates occur for a long period of time, eventually
all updates will propagate through the system and all the
nodes will be consistent
 For a given accepted update and a given node, eventually
either the update reaches the node or the node is removed
from service
 Known as BASE (Basically Available, Soft state, Eventual
consistency), as opposed to ACID
 Soft state: copies of a data item may be inconsistent
 Eventually consistent: copies become consistent at
some later time if there are no more updates to that data
item
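One simple way to obtain this behaviour, shown here only as a sketch, is last-writer-wins reconciliation: each value carries a timestamp, and replicas periodically exchange state and keep the newer version. This is an assumed technique for illustration, not a description of how any particular system named in these slides works, and it assumes timestamps are unique per update. The names Replica and anti_entropy_round are hypothetical.

```python
class Replica:
    """One copy of the data; each key maps to a (timestamp, value) pair."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        """Accept the update only if it is newer than what is stored."""
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Pull every entry from another replica, keeping the newer version."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


def anti_entropy_round(replicas):
    """Every replica pulls from every other replica once."""
    for target in replicas:
        for source in replicas:
            if source is not target:
                target.sync_from(source)
```

Before a synchronization round, replicas may return different values for the same key (soft state); once updates stop and a full round completes, all replicas return the same value.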
Common Advantages
 Cheap, easy to implement (open source)
 Data are replicated to multiple nodes (therefore
identical and fault-tolerant) and can be
partitioned
 When data is written, the latest version is on at least
one node and then replicated to other nodes
 Down nodes easily replaced
 No single point of failure
 Easy to distribute
 Don't require a schema
What does NoSQL Not Provide?
 Joins
 Group by
 But PNUTS provides an interesting
materialized-view approach to
joins/aggregation.
 ACID transactions
 SQL
 Integration with applications that are based
on SQL
Should I be using NoSQL Databases?
 NoSQL data storage systems make sense for
applications that need to deal with very large
volumes of semi-structured data
 Log Analysis
 Social Networking Feeds
 Most of us work on organizational databases,
which are not that large and have low
update/query rates
 Regular relational databases are THE correct
solution for such applications
Further Reading
 Lots of material on the Web
 E.g. nice presentation on NoSQL by Perry Hoekstra
(Perficient)
 Some material in this talk is from the above
presentation
 Use a search engine to find information on data
storage systems such as
 BigTable (Google), Dynamo (Amazon), Cassandra
(Facebook/Apache), PNUTS/Sherpa (Yahoo),
CouchDB, MongoDB, …
 Several of the above are open source
