Data Analytics with NOSQL
Mukundan Agaram
Chris Weiss
Some initial thoughts about data...
Continual issues with large scale web apps
– Data growth + query response time
● Data growth => performance degradation
● Explosion of big data “analytics” use cases
– Increase in unstructured data
● More interconnectivity, more formats, lack of structure...
● Document-oriented data (XML/JSON) is difficult to
manage and search
– Distributed server configurations
● Large systems, more distribution and HA
Cloud services have aggravated these issues
Agenda for the night
● What is NOSQL?
● Varieties of NOSQL
● Key Industry Use Cases
● Applications for Data Analytics
● Landscape
● Demos/Walkthroughs
● Closing Discussions
What is NOSQL?
● “...mechanism for storage and retrieval of data
that is modeled in means other than tabular
relations used in relational databases.”
Wikipedia
● Non SQL or Non-relational
● Not Only SQL
● Technically around since the late 1960s...
– E.g. IDMS, IMS, MUMPS, Caché, BerkeleyDB
What is NOSQL?
● Drivers for modern day NOSQL
– Web 2.0
– Big Data
– Facebook, Google, Amazon, Expedia etc.
– Horizontal scaling to clusters of computers
● Achilles heel for RDBMS
– Cost
– Difficulty providing
● HA
● Partitioning of data across nodes (a.k.a. sharding)
● Speed
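As a rough illustration of horizontal scaling through sharding, the sketch below routes each key to one of several in-memory shards using a stable hash. The names `shard_for`, `put` and `get`, and the choice of four shards, are assumptions made for this example and do not describe any particular product.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size for the sketch
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one server


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index with a hash that is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def put(key: str, value) -> None:
    """Store the value on the shard that owns the key."""
    shards[shard_for(key)][key] = value


def get(key: str):
    """Read the value back from the shard that owns the key."""
    return shards[shard_for(key)].get(key)


put("user:42", {"name": "Ada"})
put("user:43", {"name": "Grace"})
```

Simple modulo routing like this reshuffles most keys when the shard count changes; real systems typically use consistent hashing or range partitioning to limit that movement.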
NOSQL - Drawbacks and Barriers
● Compromise on consistency (CAP Theorem)
● Custom query languages vs. SQL
● Lack of standardized interfaces
● Existing investments in RDBMS
● Most lack true ACID transactions.
– Use an “eventually” consistent model
– Data is replicated with a conflict resolution algorithm
– Methods for conflict resolution and distribution vary
significantly
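One common conflict resolution strategy is last-write-wins. The sketch below is a minimal illustration, assuming each replica stores a `(timestamp, value)` pair per key; the function name `merge_lww` and the sample data are hypothetical, and real stores may use vector clocks or other schemes instead.

```python
def merge_lww(replica_a: dict, replica_b: dict) -> dict:
    """Merge two replicas where each key maps to (timestamp, value).

    For every key, the entry with the larger timestamp wins.
    On a timestamp tie the entry from replica_a is kept.
    """
    merged = dict(replica_a)
    for key, (timestamp, value) in replica_b.items():
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, value)
    return merged


# Two replicas that diverged while they could not talk to each other.
replica_a = {"cart:7": (1, ["book"]), "name:7": (5, "Ada")}
replica_b = {"cart:7": (2, ["book", "pen"]), "name:7": (3, "A.")}

converged = merge_lww(replica_a, replica_b)
```

Once both replicas apply the same merge they hold the same data, which is the "eventual" part of the model. The older write to each key is silently discarded, which is one reason the choice of resolution method matters.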
CAP Theorem
● a.k.a. Brewer's theorem
● Impossible for a distributed computer system to
simultaneously provide all three of
– Consistency
● All nodes see the same data at the same time
– Availability
● Every request receives a response
– Partition Tolerance
● The system keeps operating when network failures split the nodes into isolated groups
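The trade-off can be seen in a toy two-node cluster. This is a sketch under simplifying assumptions, not a model of any real database: the `Cluster` class and its `mode` flag are invented for the example. During a partition, a "CP" cluster refuses writes to stay consistent, while an "AP" cluster accepts them and lets the replicas diverge.

```python
class Node:
    def __init__(self):
        self.value = None


class Cluster:
    """Two replicas of a single value, with a switchable network partition."""

    def __init__(self, mode: str):
        self.mode = mode            # "CP" or "AP" (assumed labels for this sketch)
        self.a, self.b = Node(), Node()
        self.partitioned = False

    def write(self, node: Node, value) -> bool:
        other = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                return False        # reject: consistency kept, availability lost
            node.value = value      # accept locally: available, but replicas diverge
            return True
        node.value = value          # healthy network: replicate to both nodes
        other.value = value
        return True


cp, ap = Cluster("CP"), Cluster("AP")
for cluster in (cp, ap):
    cluster.write(cluster.a, "v1")
    cluster.partitioned = True

cp_accepted = cp.write(cp.a, "v2")
ap_accepted = ap.write(ap.a, "v2")
```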
CAP alignment for NOSQL
Source: http://blog.nahurst.com/visual-guide-to-nosql-systems
NOSQL direction
The landscape is morphing...
● Current NOSQL industry focus
– Addressing large distributed systems in reaction to the
CAP theorem
● The newer breed of NOSQL addresses important
aspects such as ACID
● There is a new buzzword...
– NewSQL
Database Evolution
NOSQL Model Classification
● Key-Value Stores & Caches
– Data is represented as a collection of (K,V) pairs. In-memory,
persistent or eventually persistent.
● Document Databases
– Data is stored in JSON document structures.
● RDF, OWL & Triple Stores
– A meaningful way to connect information. Supports inference over
triples (S,P,O). Can be represented graphically. Queried with SPARQL.
● Wide Column Databases
– Extensible record set. Stores data tables as sections of
columns. Great for EDW.
● Graph Databases
– Stores data as a graph G(V,E). Great for correlation analysis,
recommendation engines and fraud detection.
● Multi-model Databases
– Combination of two or more of the varieties above.
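To make the classification concrete, the sketch below holds the same fact (a voter named Ada who lives in Ingham county) in plain Python structures shaped like each model, and reads the county back from each one. All keys, field names and predicates are made up for illustration; real products add their own storage formats and query languages.

```python
import json

# Key-value: an opaque value looked up by key.
kv_store = {"voter:1001": json.dumps({"name": "Ada", "county": "Ingham"})}

# Document: a nested, self-describing record.
document = {"_id": 1001, "name": "Ada", "address": {"county": "Ingham"}}

# Triple store: (subject, predicate, object) statements.
triples = [("voter:1001", "hasName", "Ada"), ("voter:1001", "livesIn", "Ingham")]

# Wide column: row key -> column family -> columns.
wide_rows = {"1001": {"profile": {"name": "Ada"}, "geo": {"county": "Ingham"}}}

# Graph: vertices plus labelled edges, G(V, E).
vertices = {"v1": {"label": "Voter", "name": "Ada"},
            "c1": {"label": "County", "name": "Ingham"}}
edges = [("v1", "LIVES_IN", "c1")]


def county_from_kv(store, key):
    return json.loads(store[key])["county"]


def county_from_document(doc):
    return doc["address"]["county"]


def county_from_triples(statements, subject):
    for s, p, o in statements:
        if s == subject and p == "livesIn":
            return o
    return None


def county_from_wide(rows, row_key):
    return rows[row_key]["geo"]["county"]


def county_from_graph(vs, es, vertex_id):
    for src, label, dst in es:
        if src == vertex_id and label == "LIVES_IN":
            return vs[dst]["name"]
    return None
```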
NOSQL Models
● Key-Value
– Cache (Ehcache, BigMemory, Coherence, Memcached)
– Store (Redis, Riak, Aerospike, Oracle NoSQL)
● Document (MongoDB, CouchDB, Amazon DynamoDB)
● Wide Column (Cassandra, HBase, Vertica)
● Graph (Neo4j, Titan, Giraph)
● Multi-model (OrientDB, ArangoDB, Sqrrl)
Source: www.db-engines.com
Consider NOSQL for...
● Enabling “big data” and “web” scale
– Massive distribution through horizontal scaling
● Performant queries (alternatives to RDBMS)
– Denormalization and large horizontal scalability
● Massive write volumes (Facebook, Twitter)
● Fast and dynamic access to key data
● Flexible schemas and data types
● Data/Schema Migration
● Developer centric environments
Consider NOSQL for...
● Diverse data organization options
– Hierarchical correlation
– Graph correlation
– Semantic relationships
– Set based analytics
● Caching in end usage format
● Data Archival
● Big Data Analytics
– Cumulative metrics and insights
– Correlation
Where RDBMS/SQL is better...
● OLTP
● Data Integrity
● SQL centricity
● Complex relationships
– With the exception of graph NOSQL
● Maturity, stability and standardization
Use Cases
● Log management (unstructured data)
● Data synchronization (online vs. offline sources)
– Shopping cart, Field sales/services, PoS, Gaming,
Transportation/telemetry
● User profile management
● Customer 360 degree view
● Fraud detection
● Medical/Healthcare diagnosis
● Data Archival
● Recommendation Engines
Applications for Data Analytics
● Complements (part of) Hadoop and Big Data
● Acts as the persistence infrastructure for larger
machine learning use cases
– Predictive Analytics
– Fraud/Anomaly/Outlier Detection
– Recommendation engines
● Provides a backdrop for interesting data
visualization initiatives
– Integrate with visualization packages such as
Tableau
Interesting links
● Redis in Practice: Who's online?
www.lukemelia.com/blog/archives/2010/01/17/redis-in-practice-whos-online/
● Inventory list of NOSQL systems
www.nosql-database.org
● Database Engine ranking and analytics
www.db-engines.com
● Visual guide to NOSQL systems
www.blog.nahurst.com/visual-guide-to-nosql-systems
Case Studies / Demos
● Retail fraud detection
– Neo4j
– Contrasting with OrientDB
– TinkerPop/Gremlin/Blueprints
● 360 degree single view of voter information
– MongoDB
● Schema on read
– Hadoop
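The retail fraud detection case above relies on graph traversal. The sketch below is not the demo itself; it is a minimal stand-in, assuming accounts are linked to shared identifiers such as cards and phone numbers, and it uses a breadth-first search to find every account reachable from a starting account. The edge data and the `acct:` prefix are invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical edges: an account "uses" a card or phone number.
edges = [
    ("acct:A", "card:1"),
    ("acct:B", "card:1"),
    ("acct:B", "phone:9"),
    ("acct:C", "phone:9"),
    ("acct:D", "card:2"),
]


def build_graph(edge_list):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for u, v in edge_list:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def connected_accounts(graph, start):
    """Return all accounts reachable from start through shared identifiers."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(n for n in seen if n.startswith("acct:"))


graph = build_graph(edges)
ring = connected_accounts(graph, "acct:A")
```

Accounts A, B and C end up in one group because they are chained together by a shared card and a shared phone number, the kind of link that is awkward to express as relational joins of unknown depth.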
Gremlin Blueprints Architecture
Neo4j OrientDB TitanGraph ArangoDB
Qualified Voter – Use Case
● Tracks registration information for all voters in
Michigan
● Uses a tabular geography model
● Highly normalized schema
– Data partitioned into subsets
● Enable local application instances and row level security
● Expensive queries when doing reporting
● Expensive queries for performing “single view”
of voter
● Several tables with tens of millions of records
Voter Schema
Find the first 100 voters in Ingham county with
status and school district
-- Twelve tables joined with implicit (comma) joins; ROWNUM is Oracle's row-limiting pseudocolumn.
SELECT V.VOTER_IDENTIFICATION_NUMBER, V.FIRST_NAME, V.LAST_NAME, G.CODE AS GENDER,
       IDS.NAME AS ID_STATUS, UST.NAME AS UOCAVA_STATUS,
       VA.ADDRESS_LINE_ONE, VA.CITY, VA.ZIP_CODE,
       DIS.NAME AS SCHOOL_DISTRICT
FROM VOTER V, VOTER_ADDRESS VA, GENDER G,
     IDENTIFICATION_STATUS IDS, UOCAVA_STATUS UST, VOTER_STATUS_TYPE VST,
     STREET_RANGE SI, DISTINCT_POLITICAL_AREA DPA, DISTINCT_POLITICAL_AREA_DIS DPAD,
     DISTRICT DIS, DISTRICT_TYPE DT, COUNTY CO
-- Voter attributes and lookup tables
WHERE V.ID = VA.VOTER_ID AND V.GENDER_ID = G.ID AND V.IDENTIFICATION_STATUS_ID = IDS.ID
  AND V.UOCAVA_STATUS_ID = UST.ID AND V.VOTER_STATUS_TYPE_ID = VST.ID AND VST.NAME = 'Active'
-- Address to political area to county
  AND VA.STREET_RANGE_ID = SI.ID AND SI.DISTINCT_POLITICAL_AREA_ID = DPA.ID
  AND VA.IS_ACTIVE = 'Y'
  AND DPA.COUNTY_ID = CO.ID AND CO.NAME = 'Ingham'
-- Political area to school district
  AND DPA.ID = DPAD.DISTINCT_POLITICAL_AREA_ID AND DPAD.DISTRICT_ID = DIS.ID
  AND DIS.DISTRICT_TYPE_ID = DT.ID AND DT.NAME = 'School'
  AND ROWNUM <= 100;
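For contrast with the twelve-table join, the sketch below shows how a denormalized "single view" of a voter might look as one document per voter, with the same question answered by a filter over those documents. The field names and sample records are assumptions made for illustration; they are not the schema used in the MongoDB demo, and plain Python lists stand in for a document collection.

```python
from itertools import islice

# Hypothetical denormalized voter documents: lookups are embedded, not joined.
voters = [
    {"voter_identification_number": "V001", "first_name": "Ada", "last_name": "Smith",
     "gender": "F", "id_status": "Verified", "uocava_status": "None",
     "voter_status": "Active",
     "address": {"line_one": "1 Main St", "city": "Lansing", "zip_code": "48933",
                 "is_active": True, "county": "Ingham"},
     "districts": {"School": "Lansing School District"}},
    {"voter_identification_number": "V002", "first_name": "Bob", "last_name": "Jones",
     "gender": "M", "id_status": "Verified", "uocava_status": "None",
     "voter_status": "Active",
     "address": {"line_one": "9 Oak Ave", "city": "Detroit", "zip_code": "48201",
                 "is_active": True, "county": "Wayne"},
     "districts": {"School": "Detroit Public Schools"}},
    {"voter_identification_number": "V003", "first_name": "Cy", "last_name": "Lee",
     "gender": "M", "id_status": "Pending", "uocava_status": "None",
     "voter_status": "Inactive",
     "address": {"line_one": "5 Elm St", "city": "Mason", "zip_code": "48854",
                 "is_active": True, "county": "Ingham"},
     "districts": {"School": "Mason Public Schools"}},
]


def first_voters_in_county(docs, county, limit=100):
    """Return up to `limit` active voters in a county that have a school district."""
    matches = (
        d for d in docs
        if d["voter_status"] == "Active"
        and d["address"]["is_active"]
        and d["address"]["county"] == county
        and "School" in d["districts"]
    )
    return list(islice(matches, limit))


ingham = first_voters_in_county(voters, "Ingham")
```

Each result already carries the address, statuses and school district, so no further lookups are needed. The cost moves to write time, where the embedded copies have to be kept up to date.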
Expensive in terms of IO
● Multiple objects read
● Two stage IO:
– Read index
– Read entire table row
● Selected and WHERE clause columns
assembled and then filtered
● A larger volume query would need substantial
resources: memory, CPU, fast disk
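The two stage IO pattern can be mimicked in a few lines. This is a toy model, assuming an index that maps a value to row ids and a table that returns whole rows; the `CountingTable` class and sample rows are invented for the sketch and say nothing about how a specific RDBMS lays out its storage.

```python
class CountingTable:
    """A table that returns whole rows and counts how many it has read."""

    def __init__(self, rows):
        self.rows = rows      # row id -> full row
        self.reads = 0

    def fetch(self, row_id):
        self.reads += 1
        return self.rows[row_id]


def lookup(index, table, key, wanted_columns):
    """Stage 1: read the index. Stage 2: read each full row, then project."""
    row_ids = index.get(key, [])
    results = []
    for row_id in row_ids:
        row = table.fetch(row_id)                           # entire row is read
        results.append({c: row[c] for c in wanted_columns})  # columns picked afterwards
    return results


table = CountingTable({
    1: {"name": "Ada", "county": "Ingham", "city": "Lansing", "zip": "48933"},
    2: {"name": "Bob", "county": "Wayne", "city": "Detroit", "zip": "48201"},
    3: {"name": "Cy", "county": "Ingham", "city": "Mason", "zip": "48854"},
})
county_index = {"Ingham": [1, 3], "Wayne": [2]}

names = lookup(county_index, table, "Ingham", ["name"])
```

Only one column was requested, yet every matching row was read in full. Repeating this across each table in a multi-way join is where the IO cost accumulates.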
Parting conclusions
● NOSQL is a mixed bag of fruit
● This space is growing
● There are hundreds of products
● Best value is realized from identifying the
correct use case
– Functional requirements
– Non-functional requirements
Finally you can use NOSQL for...
Thank You!!
Questions?
