Chapter 4
NoSQL
Dr G Sudha Sadasivam
Mrs R Thirumahal
Agenda
• SQL vs NoSQL
• Limitations and advantages of NoSQL
• Types of NoSQL Stores with example
– KV store
– Column family
– Document
– Graph
• Comparison of NoSQL stores
• Principles of NoSQL models
• CAP
• BASE
• Polyglot persistence in ecommerce application
Introduction
• The term NoSQL was coined by Carlo Strozzi in 1998
• Relational systems have
– ACID properties and transactional guarantees, which can degrade performance
– centralised control
– a rigid schema, resulting in a lack of flexibility and scalability
• NoSQL – Not only SQL
• Schemaless, and hence
» have simple and fast data access
» can store voluminous data
» can store unstructured data from multiple sources
• Work with large volumes of distributed data
• Have high operational speed, great flexibility and horizontal scalability
• Follow BASE properties with eventual consistency
• Possess a shared-nothing architecture
• Support auto-sharding and replication
• Support parallelism and distributed querying
NoSQL systems are complementary to SQL systems
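The sharding and replication points above can be illustrated with a minimal sketch. It assumes simple hash-based placement, which is only one possible scheme; real stores may use consistent hashing or range partitioning instead. The function names `shard_for` and `replicas_for` are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process and would place keys differently on every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(key, num_shards, replication_factor):
    """Return the shard holding the key plus the next shards as replicas."""
    first = shard_for(key, num_shards)
    return [(first + i) % num_shards for i in range(replication_factor)]


if __name__ == "__main__":
    # Hypothetical keys; each one lands on a primary shard and two replicas.
    for key in ("customer:1", "customer:2", "order:17"):
        print(key, "->", replicas_for(key, num_shards=4, replication_factor=3))
```

Because each shard owns its own slice of the keys and shares no state with the others, shards can be added to scale horizontally, although this naive modulo scheme would move many keys when the shard count changes.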
Limitations
• Generally unsuitable for transactional applications that have constraints and strict consistency requirements
• Because the stores are schemaless, constraints must be enforced by the application developer
• Using multiple data stores makes interoperability difficult
• Eventual consistency: changes in data are propagated to all copies with a time lag
• Vendor lock-in: each NoSQL data store exists as a silo, resulting in high coupling between the data store and the application
• Lack of expertise in the usage of NoSQL stores
• NoSQL databases can suffer from security issues related to authentication, authorization and storage security
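The point about constraints moving into the application can be sketched as follows. This is an illustration only: a plain dictionary stands in for a schemaless collection, and the field names and the helpers `validate_customer` and `insert_customer` are hypothetical.

```python
# Hypothetical required fields and their types; a schemaless store would
# not check any of this, so the application has to.
REQUIRED_FIELDS = {"cust_id": str, "name": str}


def validate_customer(doc):
    """Return a list of constraint violations for one customer document."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append("missing field: " + field)
        elif not isinstance(doc[field], expected_type):
            errors.append("wrong type for field: " + field)
    return errors


def insert_customer(collection, doc):
    """Insert a document only if it passes the application-level checks."""
    errors = validate_customer(doc)
    if errors:
        raise ValueError("; ".join(errors))
    if doc["cust_id"] in collection:
        # Uniqueness is also enforced here, not by the store.
        raise ValueError("duplicate key: " + doc["cust_id"])
    collection[doc["cust_id"]] = doc


if __name__ == "__main__":
    customers = {}
    insert_customer(customers, {"cust_id": "C1", "name": "Asha"})
    print(validate_customer({"cust_id": 42}))
```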
TYPES OF NoSQL STORES
• Key-value (KV) stores
– Associative arrays (dictionaries)
– Key-value pairs with a unique key for every value
– Good performance, so used for session management and caching
– Data is held in RAM, as in Memcached, or in secondary memory, as in MemcacheDB
• Document stores
– Organise data as a collection of documents with unique keys
– Information can be retrieved based on the contents of the document
– Collections are analogous to tables, and documents to records in a table
– Every document can have different fields
– Suitable for managing content and mobile data
– Examples: MongoDB and CouchDB
• Column family stores
– Data is stored in columns instead of rows
– Columns with different types of data can be aggregated as a column family for querying
– Examples: HBase and BigTable
• Graph data stores
– Entities, such as those in social networks, are connected by relationships represented as graphs
– Example: Neo4j
EXAMPLE
KV Store:
• Each record is stored in a row and read using RecordReader in HDFS
• Attributes are separated by commas and extracted by splitting on the comma
Column Family Store:
• The Customer table has 2 column families, Name and Address, along with orders with timestamps (TS)
• The Order table has Price and Item column families
Document Store:
• Two collections, namely Customer and Order
• Customer has 2 documents (rows) while Order has 3 documents
Graph Store:
• Entities are CustID with Name and Address, and OrderID with Price and Items
[Figure: relational tables]
[Figure: logical organization and physical organization in a KV store]
[Figure: column family store]
[Figure: document store, with document collection 1 (documents 1 and 2) and document collection 2 (documents 1 to 3)]
[Figure: graph data store]
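The four layouts described in this example can be sketched with plain Python structures. All sample values (names, cities, prices, items) are hypothetical and are not taken from the slide figures; only the shape follows the slide: 2 customers and 3 orders, Name and Address column families, and timestamped values.

```python
# KV store: each value is one opaque comma-separated record.
kv_store = {
    "C1": "C1,Asha,Chennai",
    "C2": "C2,Ravi,Madurai",
}

def kv_name(key):
    """Extract the name attribute by splitting the record on commas."""
    return kv_store[key].split(",")[1]

# Column family store: row key -> column family -> column -> {timestamp: value}.
customer_cf = {
    "C1": {
        "Name": {"first": {1: "Asha"}},
        "Address": {"city": {1: "Chennai", 2: "Coimbatore"}},
    },
}

def latest(row, family, column):
    """Return the value with the highest timestamp for one column."""
    versions = customer_cf[row][family][column]
    return versions[max(versions)]

# Document store: two collections; documents may have different fields.
customers = [
    {"_id": "C1", "name": "Asha", "address": {"city": "Coimbatore"}},
    {"_id": "C2", "name": "Ravi"},
]
orders = [
    {"_id": "O1", "cust_id": "C1", "price": 250, "items": ["pen", "book"]},
    {"_id": "O2", "cust_id": "C1", "price": 90, "items": ["ink"]},
    {"_id": "O3", "cust_id": "C2", "price": 400, "items": ["bag"]},
]

# Graph store: nodes with properties, plus labelled edges between them.
nodes = {
    "C1": {"name": "Asha"},
    "C2": {"name": "Ravi"},
    "O1": {"price": 250},
    "O2": {"price": 90},
    "O3": {"price": 400},
}
edges = [("C1", "PLACED", "O1"), ("C1", "PLACED", "O2"), ("C2", "PLACED", "O3")]

def orders_of(cust_id):
    """Follow PLACED edges from a customer node to its order nodes."""
    return [dst for src, rel, dst in edges if src == cust_id and rel == "PLACED"]

if __name__ == "__main__":
    print(kv_name("C1"), latest("C1", "Address", "city"), orders_of("C1"))
```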
• KV stores are simple and powerful, but cannot process a range of keys.
• Ordered KV stores support key ranges, but still cannot model the structure of values.
• Column family stores model values as a map of maps of maps: column families are aggregated from columns, which in turn are aggregated from timestamped values.
• Document stores can model values not only as aggregates but also as schemas of arbitrary complexity. They also provide indexing based on field names/keys.
• Graph data stores extend ordered KV systems by linking various keys as a graph rather than as a hierarchical model.
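The difference between a plain KV store and an ordered KV store can be sketched as follows. This is a minimal in-memory illustration, not how any particular product implements it; the class `OrderedKVStore` and the sample keys are hypothetical.

```python
import bisect


class OrderedKVStore:
    """A KV store that keeps its keys sorted so that key ranges can be scanned."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._data = {}   # key -> opaque value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = OrderedKVStore()
    for key in ("O3", "O1", "O5", "O2"):
        store.put(key, "order " + key)
    print(store.range("O2", "O5"))
```

A hash-based KV store would have to inspect every key to answer the same range query, and in both cases the values remain opaque to the store.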
Comparison
CAP
• Eric Brewer proposed the Consistency, Availability, Partition tolerance (CAP) theorem in 2000.
• Consistency is the ability to obtain the same data from multiple replicas: all cluster nodes see the same data.
• Availability is the ability of a system to continue operating even when some hardware/software components fail.
• Partition tolerance is the ability of the system to continue operating when network failures partition the network. It guarantees independence of the various data partitions.
Replication facilitates the availability of data. Eventual consistency ensures that replicas converge rather than remain stale. Partitioning ensures load distribution and scalability.
• Only two of the three properties can be guaranteed at a time:
– AP: follows BASE properties with eventual consistency, e.g. Amazon's DynamoDB without strict consistency.
– CP: follows ACID properties with strict consistency. Pessimistic locking ensures consistency, e.g. MongoDB and Memcache.
– CA: a CA system cannot operate under network partitions, and hence it is neither ACID nor BASE. The two-phase commit protocol is used, e.g. relational databases and BigTable.
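The AP behaviour described above, where replicas answer immediately and converge later, can be sketched with a toy simulation. It assumes a single primary replica and a manual `sync` step standing in for background replication; the class `EventuallyConsistentStore` is hypothetical and is not modelled on any specific product.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on replica 0 and reach the others only on sync()."""

    def __init__(self, num_replicas=3):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # updates not yet propagated to the other replicas

    def write(self, key, value):
        # The write is acknowledged as soon as one replica has it (availability first).
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def read(self, key, replica=0):
        # Any replica answers straight away, even if its copy is stale.
        return self.replicas[replica].get(key)

    def sync(self):
        # Propagate queued updates in order; afterwards all replicas agree.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.write("cart:C1", ["pen"])
    print("before sync:", store.read("cart:C1", replica=1))
    store.sync()
    print("after sync: ", store.read("cart:C1", replica=1))
```

A CP design would instead delay or reject the read on the lagging replica until the update had arrived, trading availability for consistency.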
BASE
• Used in Web 2.0 applications
• Basically Available, Soft state and Eventually consistent
• The system works basically all the time
• Because of eventual consistency, the system maintains soft state
ACID | BASE
Atomicity, Consistency, Isolation, Durability | Basically Available, Soft state, Eventually consistent
Strong consistency | Weak consistency
Consistency and Isolation first | Availability first
Nested transactions | Approximate answers
Conservative | Simple
Schema | Schema-less
Case Study
• Polyglot persistence applies multiple data storage technologies to meet the needs of an application.
• Consider an e-commerce application with shopping cart, inventory, orders, catalogue and customer details.
1. User sessions and activity logs require efficient read/write operations: KV stores.
2. Point of sale data has a high ingestion rate with a high volume of write operations: KV stores (storage); column family stores (analytics).
3. The shopping cart requires high availability and aggregates information: document store.
4. The product catalogue has frequent reads and infrequent writes, and must also support aggregation: document store.
5. Product recommendations are made based on similar products or users: graph store.
6. Financial data is relational and requires transactional updates: RDBMS.
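The mapping in this case study can be written down as a small lookup table that an application might consult when choosing a backend. This is only a sketch: the workload names and the function `store_for` are hypothetical labels for the six cases listed above.

```python
# Workload -> type of store, following the six cases in the case study.
STORE_FOR_WORKLOAD = {
    "user_sessions": "key-value",
    "activity_logs": "key-value",
    "point_of_sale_storage": "key-value",
    "point_of_sale_analytics": "column-family",
    "shopping_cart": "document",
    "product_catalogue": "document",
    "recommendations": "graph",
    "financial_data": "rdbms",
}


def store_for(workload):
    """Return the type of store suggested for a workload."""
    try:
        return STORE_FOR_WORKLOAD[workload]
    except KeyError:
        raise ValueError("no store configured for workload: " + workload)


def workloads_by_store():
    """Group workloads by store type, e.g. to see what each backend serves."""
    grouped = {}
    for workload, store in STORE_FOR_WORKLOAD.items():
        grouped.setdefault(store, []).append(workload)
    return grouped


if __name__ == "__main__":
    print(store_for("shopping_cart"))
    print(workloads_by_store())
```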
Exercises to be completed
• Consider the case study of AAA coffee shop in test 1. Identify the type of NoSQL store that can be used for each requirement, and justify the choice.
• Consider a table with student details (Roll No, First Name, Last Name, Department, Programme, Year, Semester) and faculty details (FacultyId, First Name, Last Name, Department, Course Handled 1, Course Handled 2, Course Handled 3). Design key-value, column family, document and graph databases for the same.
• Exercises in MongoDB. Create a database in MongoDB for storing patient and doctor details. Insert patient details and doctor details. Establish a connection between doctor and patient. Modify the doctor details for a patient. Add 2 or more doctors for a patient named XXX. Identify the count of patients under a doctor. If the patient count > 4, allot a new doctor to the patient. Allot a doctor to a patient based on specialisation. If the number of patients of a doctor becomes 0, generate an alert message. If a doctor leaves the hospital, delete the doctor from the database and allot a new doctor, based on speciality, to his/her patients.
Neo4j
• Create a Neo4j database with 5 people, giving their attributes and friendship relations. Create new persons with attributes. Create relationships and modify relationships. Identify how many friends a person has. Identify friend-of-friend relationships.
Conclusion
• SQL vs NoSQL
• Limitations and advantages of NoSQL
• Types of NoSQL Stores with example
– KV store
– Column family
– Document
– Graph
• Comparison of NoSQL stores
• CAP
• BASE
• Polyglot persistence in ecommerce application
• Exercises in MongoDB & Neo4j
