NoSQL Overview
PolicyBazaar.com
Ranjeet Kumar Jha
Contact:
• ranjeet@policybazaar.com
• Cell: +91 9811006657
Experience:
• Java/JEE: 13+ years
• NoSQL/Big Data: 4+ years
LinkedIn: https://in.linkedin.com/in/jharanjeet
(Oracle Certified Enterprise Architect)
Agenda
• Before NoSQL and after NoSQL
• The NoSQL universe
• Trends driving NoSQL
• Characteristics of Big Data: the 3 Vs
• Where to use NoSQL
• What NoSQL must deliver
• Classification of NoSQL databases
• Size vs. complexity
• Visual guide to the CAP theorem
• Overview of key-value stores
• Overview of document stores
• Overview of column-family stores
• Overview of graph stores
• Use case: Twitter
Three Eras of Databases
Note: The era of using an RDBMS for every problem is over. Instead, we should use the database best suited to the problem at hand.
Before NoSQL, DB Selection Was Easy!
Big Data Definition
• Very large volumes of data
• Unstructured
• Semi-structured
• Not suited to relational databases
• Often processed with MapReduce frameworks
Databases Universe
Source: http://arxiv.org/ftp/arxiv/papers/1307/1307.0191.pdf
The NoSQL Universe
Before NoSQL
Pressures on Single-Node RDBMS Architectures
After NoSQL
RDBMS vs. NoSQL
Source: http://www.google.com/trends/explore#q=nosql%2C%20rdbms&date=1%2F2009%2051m&cmpt=q
NoSQL or SQL?
• That is the wrong question
• What is your problem?
– Transactions
– Amount of data
– Data structure
– Scale-out vs. scale-up
– OLTP or OLAP
What is your problem…
• Key evaluation requirements
– Transactions, durability and consistency
– Response time
– Functionality
– Data characteristics
– Scalability, clustering
– Failover
– Maintenance, online changes, node management
– Maturity
– Community, support
– Hosted or managed
– Cost, open source
Why NoSQL Now?
• Trend 1: Size
• Trend 2: Connectedness
• Trend 3: Semi-structured data
• Trend 4: Architecture
Characteristics of Big Data: the 3 Vs
• Volume: large volumes of data
– Today, Facebook ingests 500 terabytes of new data every day; a Boeing 737 will generate 240 terabytes of flight data during a single flight across the US
• Velocity: the rate at which data arrives and must be processed
– E.g. clickstreams and ad impressions capture user behavior at millions of events per second
• Variety: structured, semi-structured and unstructured data, images, etc.
– Big Data isn't just numbers, dates and strings. It also includes geospatial data, 3D data, audio and video, and unstructured text, including log files and social media
Source: http://www-01.ibm.com/software/data/bigdata/
Many Uses of Data
• Transactions (OLTP)
• Analysis (OLAP)
• Search and Findability
• Enterprise Agility
• Speed and Reliability
• Consistency and Availability
• Or anything else…
Where to use NoSQL?
• Social data
• Data processing (Hadoop)
• Search (Lucene)
• Caching (Memcached, ...)
• Data Warehousing
• Logging
• ...
What NoSQL must deliver
• Massive scalability
– No application-level sharding
• Performance
• High availability/fault tolerance
• Ease of use
– Simple operations/administration
– Simple APIs
– Ability to evolve the application and schema quickly
Classification of NoSQL Databases
• Key-Value
– Very popular for simple key-value lookups, on disk or in memory, e.g. Dynamo, Redis, Voldemort, MemcacheDB, Berkeley DB, Hazelcast
• Document
– Popular for document storage, e.g. MongoDB, OrientDB, CouchDB, Riak
• Column Family
– Key-value with a fixed set of column families; columns within a family can be added dynamically, e.g. Cassandra, Bigtable, HBase, Hypertable
• Graph
– Entities connected by relationships in a graph, e.g. Titan, Neo4j, InfiniteGraph
NoSQL Stores
• Key-Value Stores
  – Dynamo clones
    • Redis
    • Membase
    • Riak
    • Tokyo Cabinet
    • Voldemort
• Document Stores
  – MongoDB
  – CouchDB
  – SimpleDB
• Column Family
  – Bigtable clones
    • Cassandra
    • HBase
    • Hypertable
• Graph Databases
  – Neo4j
  – Titan
  – InfoGrid
  – AllegroGraph
NoSQL: Size vs. Complexity
Source: http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html
Visual Guide to NoSQL
Source: http://blog.nahurst.com/visual-guide-to-nosql-systems
Key-Value Store
• Focus on scaling to huge amounts of data
• Designed to handle massive load
• Based on Amazon’s Dynamo paper
• Data model: a (global) collection of key-value pairs
• Dynamo-style ring partitioning and replication
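Dynamo-style ring partitioning is usually described as consistent hashing. Each node is placed at several points (virtual nodes) on a hash ring. A key belongs to the first node found clockwise from the key's hash, and its replicas go to the next distinct nodes around the ring. The following is a minimal Python sketch under stated assumptions: MD5 is used only to spread positions evenly, and the names `HashRing`, `vnodes` and `preference_list` are hypothetical. Real systems such as Dynamo or Riak differ in many details.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a ring position (assumption: MD5, used only for spread)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes, vnodes=8):
        self._points = []  # sorted (position, node) pairs
        for node in nodes:
            for i in range(vnodes):
                self._points.append((ring_hash(f"{node}#{i}"), node))
        self._points.sort()
        self._positions = [pos for pos, _ in self._points]

    def preference_list(self, key, replicas=3):
        """First `replicas` distinct nodes clockwise from the key's position."""
        distinct = len({node for _, node in self._points})
        wanted = min(replicas, distinct)
        i = bisect.bisect_right(self._positions, ring_hash(key))
        owners = []
        while len(owners) < wanted:
            _, node = self._points[i % len(self._points)]  # wrap around the ring
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.preference_list("user:42")  # first entry acts as the coordinator
print(owners)
```

With virtual nodes, adding or removing one machine moves only the keys next to its points on the ring. A modulo-N scheme would instead reshuffle almost every key.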
Types of Key-Value Stores
• Eventually consistent key-value stores
• Hierarchical key-value stores
• Key-value stores in RAM
• Key-value stores on disk
• High-availability key-value stores
• Ordered key-value stores
• Stores whose values allow simple list operations
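A hash-based store and an ordered key-value store behave differently when you run a range scan. If the keys are kept sorted, the store can return every key in an interval without touching the others. The following is a minimal in-memory Python sketch that uses the standard `bisect` module. The class `OrderedKV` and its methods are hypothetical. Real ordered stores use structures such as B-trees or LSM trees rather than a Python list.

```python
import bisect


class OrderedKV:
    """Hypothetical in-memory ordered key-value store (keys kept sorted)."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # O(n) insert; fine for a sketch
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = OrderedKV()
for day in ["2015-03-02", "2015-03-01", "2015-03-05", "2015-04-01"]:
    store.put(f"log:{day}", f"entries for {day}")
march = [k for k, _ in store.scan("log:2015-03", "log:2015-04")]
print(march)  # ['log:2015-03-01', 'log:2015-03-02', 'log:2015-03-05']
```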
Key-Value Stores (Opaque Values)
• Keys are mapped to values
• Values are treated as BLOBs (opaque data)
• No type information is stored
• Values can be heterogeneous
• Example values:
{ "name": "ranjeet", "age": 35, "city": "DL" } => JSON, but the store does not care
\xde\xad\xb0\x0b => binary, but the store does not care
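The opaque-value idea fits in a few lines. The store below accepts and returns only raw bytes, so any structure, such as the JSON document in the example, lives entirely in the client: the client serializes before `put` and deserializes after `get`. This is a minimal illustration built on a hypothetical `OpaqueStore` class, not any particular product's API.

```python
import json
from typing import Dict, Optional


class OpaqueStore:
    """Hypothetical key-value store that treats every value as an opaque blob."""

    def __init__(self):
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("values must be bytes; the store does not interpret them")
        self._data[key] = bytes(value)

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


store = OpaqueStore()

# JSON value: the client encodes it, and the store only sees bytes.
user = {"name": "ranjeet", "age": 35, "city": "DL"}
store.put("user:1", json.dumps(user).encode("utf-8"))

# Binary value: stored the same way, with no type information attached.
store.put("blob:1", b"\xde\xad\xb0\x0b")

# Only the client knows how to decode each value.
print(json.loads(store.get("user:1"))["city"])  # DL
```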
Redis
• Open-source, in-memory key-value store with optional durability
• Focus on high-speed reads and writes of common data structures in RAM
• Allows simple lists, sets and hashes to be stored within the value and manipulated
• Many features that developers like
– Expiration, transactions, pub/sub, partitioning
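To make the points about expiration and about lists, sets and hashes inside the value concrete, here is a toy in-memory store that mimics a small subset of Redis behavior in pure Python. Values may be lists, sets or dicts, manipulated in place, and each key can carry a time-to-live that is checked lazily on access. This is not Redis and does not use a Redis client. The method names echo Redis commands (SET, EXPIRE, LPUSH, SADD, HSET) only for readability. The injectable `clock` parameter is an assumption added so the demo is deterministic.

```python
import time
from typing import Any, Callable, Dict, Optional


class TinyStore:
    """Toy in-memory store: typed values plus lazily enforced key expiration."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock  # injectable for testing (assumption)
        self._data: Dict[str, Any] = {}
        self._expires_at: Dict[str, float] = {}

    def _alive(self, key: str) -> bool:
        """Evict the key if its deadline has passed; report whether it still exists."""
        deadline = self._expires_at.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._expires_at.pop(key, None)
        return key in self._data

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._expires_at.pop(key, None)  # overwriting a key clears its old TTL

    def get(self, key: str) -> Optional[Any]:
        return self._data[key] if self._alive(key) else None

    def expire(self, key: str, seconds: float) -> bool:
        if not self._alive(key):
            return False
        self._expires_at[key] = self._clock() + seconds
        return True

    # Values can be lists, sets or hashes manipulated in place (type checks omitted).
    def lpush(self, key: str, *items: Any) -> int:
        self._alive(key)
        lst = self._data.setdefault(key, [])
        for item in items:
            lst.insert(0, item)  # each item goes to the head of the list
        return len(lst)

    def sadd(self, key: str, *members: Any) -> int:
        self._alive(key)
        members_set = self._data.setdefault(key, set())
        before = len(members_set)
        members_set.update(members)
        return len(members_set) - before  # number of newly added members

    def hset(self, key: str, field: str, value: Any) -> int:
        self._alive(key)
        fields = self._data.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value
        return int(is_new)


now = [0.0]  # fake clock for a deterministic demo
store = TinyStore(clock=lambda: now[0])
store.set("session:abc", "user:42")
store.expire("session:abc", 30)
store.lpush("timeline:42", "tweet:1", "tweet:2")
now[0] = 31.0
print(store.get("session:abc"))  # None (expired)
print(store.get("timeline:42"))  # ['tweet:2', 'tweet:1']
```

Checking expiry lazily keeps reads cheap, but expired keys that are never read again stay in memory. A real server also needs some form of background cleanup.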
Bigtable Clones
• Like column-oriented relational databases, but with a twist: rows are sparse, and columns can be added dynamically within a column family
• Tables resemble those of an RDBMS, but handle semi-structured data
• Based on Google’s Bigtable paper
Document Store
• Data stored in nested hierarchies
• Logical data remains stored together as a unit
• Any item in the document can be queried
• Similar to key-value stores, but the database knows what the value is
• Inspired by Lotus Notes
• Documents are often versioned
Document Store …
• Data model: collections of key-value collections
• Pros: no object-relational mapping layer, ideal for search, schemaless
• Cons: complex to implement, incompatible with SQL
• Examples: MongoDB, Couchbase, CouchDB
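A document store knows what the value is, so it can answer queries on any field, including nested ones. The following is a minimal Python sketch of that idea. It uses dotted paths such as `holder.address.city`, in the spirit of MongoDB-style field paths. The `find` and `get_path` helpers and the sample `policies` collection are hypothetical. Real document databases add indexes, query operators and array semantics, all omitted here.

```python
from typing import Any, Dict, Iterable, List

_MISSING = object()  # sentinel: path not present in the document


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path like 'holder.address.city' through nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(collection: Iterable[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return documents whose fields equal every value in `query` (exact match only)."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == expected for path, expected in query.items())
    ]


policies = [
    {"_id": 1, "holder": {"name": "Asha", "address": {"city": "Delhi"}}, "type": "health"},
    {"_id": 2, "holder": {"name": "Ravi", "address": {"city": "Mumbai"}}, "type": "motor"},
    {"_id": 3, "holder": {"name": "Meera"}, "type": "health"},  # no address: schemaless
]

print([d["_id"] for d in find(policies, {"holder.address.city": "Delhi", "type": "health"})])  # [1]
```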
MongoDB (Document Store)
• Open-source JSON (BSON) document store created by 10gen
• Master-slave scale-out model
• Strong developer community
• Built-in, automatic sharding
• Implemented in C++, with drivers for many languages (C++, JavaScript, Java, .NET, Perl, Python, etc.)
Column-Family
• Key includes a row, column family and column name
• Stores versioned blobs in one large table
• Queries can be done on rows, column families and column names
• Pros: great scale-out, high performance, versioning
• Cons: cannot query blob content; row and column designs are critical
• Examples: Cassandra, Bigtable, HBase, Hypertable, Apache Accumulo
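A key made of row, column family and column name, plus versioning, can be pictured as a sparse, sorted, multi-level map: row key → column family → column qualifier → timestamp → value. The following Python sketch models that map. It includes reading the latest version, reading as of a timestamp, and a column slice within a family, which these slides list as a Cassandra feature. The class and method names are hypothetical and the storage is a toy. The sketch shows the data model only, not how Bigtable, HBase or Cassandra lay data out on disk.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ColumnFamilyTable:
    """Toy table: row key -> column family -> qualifier -> {timestamp: value}."""

    def __init__(self, families: List[str]):
        self._families = set(families)  # families are declared up front
        # Qualifiers (column names) inside a family are created on first write.
        self._rows: Dict[str, Dict[str, Dict[str, Dict[int, bytes]]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: str, family: str, qualifier: str, value: bytes, ts: int) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        # Older versions are kept; each write adds a new timestamped cell.
        self._rows[row][family].setdefault(qualifier, {})[ts] = value

    def get(self, row: str, family: str, qualifier: str, ts: Optional[int] = None) -> Optional[bytes]:
        """Newest version, or the newest version at or before `ts`."""
        versions = self._rows.get(row, {}).get(family, {}).get(qualifier, {})
        eligible = [t for t in versions if ts is None or t <= ts]
        return versions[max(eligible)] if eligible else None

    def slice(self, row: str, family: str, start: str, end: str) -> List[Tuple[str, bytes]]:
        """Latest value of each qualifier with start <= qualifier < end, in sorted order."""
        columns = self._rows.get(row, {}).get(family, {})
        return [
            (qualifier, versions[max(versions)])
            for qualifier, versions in sorted(columns.items())
            if start <= qualifier < end
        ]


table = ColumnFamilyTable(["profile", "events"])
table.put("user:42", "profile", "name", b"Asha", ts=1)
table.put("user:42", "profile", "city", b"Delhi", ts=1)
table.put("user:42", "profile", "city", b"Gurgaon", ts=2)  # new version; old one kept
table.put("user:42", "events", "2015-03-01", b"login", ts=3)
table.put("user:42", "events", "2015-03-07", b"quote", ts=4)
table.put("user:42", "events", "2015-04-02", b"buy", ts=5)

print(table.get("user:42", "profile", "city"))        # b'Gurgaon'
print(table.get("user:42", "profile", "city", ts=1))  # b'Delhi'
print(table.slice("user:42", "events", "2015-03", "2015-04"))
```

The slice returns `[('2015-03-01', b'login'), ('2015-03-07', b'quote')]`. Choosing the qualifier as a sortable date is one way a column design becomes critical: it is what makes the range query cheap.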
The Evolution of Cassandra
Cassandra
• Apache open-source column-family database supported by DataStax
• Peer-to-peer distribution model
• Strong reputation for linear scale-out (millions of writes/second)
• Database-side security
• Written in Java; works well with HDFS and MapReduce
Cassandra: Feature Headlines
• Elastic
– Read and write throughput increases linearly as new machines are added
• Decentralized
– Fault tolerant, with no single point of failure; no “master” node
• Rich data model
– Column-based, range slices, column slices, secondary indexes, counters, expiring columns
Source: http://cassandra.apache.org/
Hadoop Origins
• Apache Hadoop is a framework for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each providing computation and storage.
• Hadoop is an open-source implementation of Google's MapReduce and GFS (distributed file system) designs.
• Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library.
• Hadoop fulfills the need for a common infrastructure that is
– Efficient, reliable and easy to use
– Open source (Apache License)
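The "simple programming model" is MapReduce. A map function emits key-value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. Below is the classic word-count example, simulated in a single Python process to show the data flow only. Hadoop itself runs these phases in Java across a cluster, and the function names here are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


docs = ["NoSQL scales out", "SQL scales up", "NoSQL and SQL"]
print(word_count(docs))
# {'and': 1, 'nosql': 2, 'out': 1, 'scales': 2, 'sql': 2, 'up': 1}
```

On a cluster, each mapper and reducer runs on a different machine, close to its input blocks. The shuffle is the network-heavy step that moves all values for one key to a single reducer.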
HBase / Hadoop
• Open-source implementation of the MapReduce algorithm, written in Java (Hadoop)
• Initially created by Yahoo
• Column-oriented data store (HBase)
• Java interface
• HBase is designed specifically to work with Hadoop
• High-level query language (Pig)
• Strong support from many vendors
Graph Store
• Focus on modeling the structure of data: interconnectivity
• Scales to the complexity of the data
• Inspired by mathematical graph theory, G = (V, E)
• Data is stored as a series of nodes, relationships and properties
• Queries are really graph traversals
• Ideal when the relationships between data are key:
– e.g. social networks
Graph Store (cont.)
• Ideal when the relationships between data are key:
– e.g. social networks
• Data model: “property graph”
– Nodes
– Relationships (edges) between nodes
– Key-value pairs on both
– Possibly edge labels and/or node/edge types
• Pros: fast network search, works with public linked data sets
• Cons: specialized query languages (e.g. SPARQL for RDF, Gremlin, Cypher)
• Examples: Neo4j, Titan, AllegroGraph, InfiniteGraph
Graph Stores (cont.)
• Used when relationships, and the types of relationships, between items are critical
• Used for
– Social networking queries: "friends of my friends"
– Inference and rules engines
– Pattern recognition
– Working with open linked data
• Automating "joins" of public data
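The "friends of my friends" query is a two-hop graph traversal. With adjacency sets, you follow relationships outward from one node instead of joining a table to itself twice. The following is a minimal breadth-first sketch in Python. The sample graph and the function names are hypothetical, and a graph database would express the same query in its own query language.

```python
from collections import deque
from typing import Dict, Set


def nodes_at_distance(graph: Dict[str, Set[str]], start: str, hops: int) -> Set[str]:
    """Breadth-first traversal; return nodes exactly `hops` edges from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # no need to expand beyond the requested depth
        for neighbour in graph.get(node, set()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {n for n, d in distance.items() if d == hops}


def friends_of_friends(graph: Dict[str, Set[str]], person: str) -> Set[str]:
    """Two hops away: friends of my friends who are not me or already my friends."""
    return nodes_at_distance(graph, person, 2)


# Undirected friendship graph stored as adjacency sets (both directions listed).
friends = {
    "asha": {"ravi", "meera"},
    "ravi": {"asha", "karan"},
    "meera": {"asha", "karan", "zoya"},
    "karan": {"ravi", "meera"},
    "zoya": {"meera"},
}

print(sorted(friends_of_friends(friends, "asha")))  # ['karan', 'zoya']
```

BFS records the shortest distance to each node, so self and direct friends are excluded automatically. The traversal only touches the neighborhood of the start node, however large the rest of the graph is.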
Property Graph Model
• Nodes (vertices)
• Relationships (edges) between nodes
• Relationships have labels
• Relationships are directed, but can be traversed at equal speed in both directions
• The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)
• Nodes have key-value properties
• Relationships have key-value properties
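A property graph can be sketched with two record types plus an index of relationships in each direction. Indexing both directions lets a directed relationship be followed from either end with the same kind of lookup. The following Python is a minimal illustration under that assumption. The class names and the `neighbours` helper are hypothetical, the relationship types `LIVES_WITH` and `LOVES` come from the slide's example, and none of this describes Neo4j's internal design.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    id: int
    labels: List[str]
    props: Dict[str, Any] = field(default_factory=dict)  # key-value properties


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    props: Dict[str, Any] = field(default_factory=dict)  # key-value properties


class PropertyGraph:
    def __init__(self):
        self.nodes: Dict[int, Node] = {}
        self._out = defaultdict(list)  # node id -> outgoing relationships
        self._in = defaultdict(list)   # node id -> incoming relationships

    def add_node(self, node_id: int, *labels: str, **props: Any) -> None:
        self.nodes[node_id] = Node(node_id, list(labels), props)

    def relate(self, start: int, end: int, rel_type: str, **props: Any) -> Relationship:
        rel = Relationship(start, end, rel_type, props)
        self._out[start].append(rel)  # indexed at both ends so either
        self._in[end].append(rel)     # direction is a single lookup
        return rel

    def neighbours(self, node_id: int, rel_type: str, direction: str = "out") -> List[int]:
        """Follow relationships of one type: 'out', 'in' or 'both'."""
        result = []
        if direction in ("out", "both"):
            result += [r.end for r in self._out[node_id] if r.rel_type == rel_type]
        if direction in ("in", "both"):
            result += [r.start for r in self._in[node_id] if r.rel_type == rel_type]
        return result


g = PropertyGraph()
g.add_node(1, "Person", name="Asha")
g.add_node(2, "Person", name="Ravi")
g.relate(1, 2, "LIVES_WITH", since=2012)
g.relate(1, 2, "LOVES")

# LIVES_WITH is symmetric, so the application ignores direction...
print(g.neighbours(2, "LIVES_WITH", direction="both"))  # [1]
# ...while LOVES is not: Ravi has no outgoing LOVES relationship.
print(g.neighbours(2, "LOVES", direction="out"))        # []
```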
Neo4j
• Graph database designed to be easy to use for Java developers
• Dual license (community edition is GPL)
• Can run as an embedded Java library in your application
• Disk-based (not just RAM)
• Full ACID transactions
Decide What You Need
• SQL
– Relational, transactional processing
• NoSQL
– Non-relational, distributed, high-performance and highly scalable
• Analytics, warehouse, Big Data
– Data warehousing, analytics, data science and reporting
• Combination of all three
– Begin with SQL and NoSQL, and eventually add a Big Data/analytics platform
Finally… in One Line Each
• SQL
– Works great, but can't easily scale out.
• NoSQL
– Works great, but isn't a fit for everything.
• Analytics, Big Data
– Every business needs it.
Use Case: Twitter
• Twitter's challenges
– Needs to store many graphs
• Who you are following
• Who’s following you
• Who you receive phone notifications from, etc.
– Delivering a tweet requires rapid paging through followers
– Heavy write load as followers are added and removed
– Set arithmetic for @mentions (intersection of users)
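Two of the operations listed above are easy to sketch in memory: set intersection and cursor-based paging through a large follower list. One plausible reading of the @mention case is showing a reply only to people who follow both the author and the mentioned user. The following Python is a toy under those assumptions. The data, the function names and the exact intersection semantics are hypothetical, and Twitter's production systems are far more involved.

```python
import bisect
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical follower graph: user -> set of users who follow them.
followers: Dict[str, Set[str]] = {
    "alice": {"u1", "u2", "u3", "u4", "u5"},
    "bob": {"u2", "u4", "u6"},
}


def mention_audience(author: str, mentioned: str) -> Set[str]:
    """Users who follow both the author and the mentioned user (set intersection)."""
    return followers.get(author, set()) & followers.get(mentioned, set())


def page_followers(user: str, cursor: Optional[str], page_size: int) -> Tuple[List[str], Optional[str]]:
    """One page of followers in a stable order, plus a cursor for the next page.

    The cursor is the last follower id returned; the next page resumes after it.
    A real system would keep the list pre-sorted instead of sorting per call.
    """
    ordered = sorted(followers.get(user, set()))
    start = 0 if cursor is None else bisect.bisect_right(ordered, cursor)
    page = ordered[start:start + page_size]
    next_cursor = page[-1] if start + page_size < len(ordered) else None
    return page, next_cursor


print(sorted(mention_audience("alice", "bob")))  # ['u2', 'u4']

cursor = None
while True:
    page, cursor = page_followers("alice", cursor, page_size=2)
    print(page)  # ['u1', 'u2'], then ['u3', 'u4'], then ['u5']
    if cursor is None:
        break
```

Paging with a "last id seen" cursor rather than a numeric offset means new followers added mid-iteration do not shift later pages. This matters under the heavy write load described above.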
Use Case: Twitter …
• What did they try?
– Started with relational databases
– Tried key-value storage of denormalized lists
• Did it work?
– Nope
– Each approach was good at either handling the write load or paging through large amounts of data, but not both
Open Source Implementations to Play With!
• MongoDB - http://www.mongodb.org/
• Cassandra - http://cassandra.apache.org/
• Neo4j - http://neo4j.org/
• Hadoop + HBase - http://hadoop.apache.org/
• Redis - http://code.google.com/p/redis/
• Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db/
• … and many more…
Thank You
For any questions or feedback, write to me:
ranjeet@policyBazaar.com
ranjeet.kr@gmail.com
