Data stax webinar cassandra and titandb insights into datastax graph strategy v3 wc

Cassandra and TitanDB
Insights into DataStax's Graph Strategy
Robin Schumacher – VP Products
Dr. Matthias Broecheler – Director of Engineering

Agenda
• Overview of DataStax
• Introduction to Graph
• Comparing Graph to an RDBMS
• A Look at DataStax’s Graph Strategy
• Next Steps
©2015 DataStax

Overview
Founded in April 2010
Offices: Santa Clara, Austin, New York, London, Paris, Tokyo, Sydney
450+
410+
Employees Customers
30 Percent

Evolution of Data Management
Mainframe (1970s) → Client-Server (1990s) → Today (Cloud, Mobile, Social)
Infrastructure centric:
• Monolithic hardware
• Centralized workloads
• Vendor lock-in
• General purpose databases (one size fits all)
• Isolated / semi-connected
Application / data centric:
• Commodity hardware
• Distributed workloads
• Massive scalability
• Radically connected

Cassandra – NoSQL for Modern Enterprise Workloads
Always on
Fully distributed
Best in scale and performance
80%+ of Cassandra contributions come from DataStax
Free tools and drivers
Free training

Enabling The Internet Enterprise with DataStax Enterprise

What is a Graph Database?
• High level: used to manage highly connected or complex data
• User level: used to support traversal and analytic queries against a data model that uses vertices, edges, and properties to represent and store data
• Technical level: uses specialized index structures, data-partitioning techniques, and query optimizers to traverse large graphs efficiently
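The vertex/edge/property model can be sketched as a small in-memory structure. This is a minimal illustration in Python built around a hypothetical `PropertyGraph` class. It does not reflect how Titan or DSE Graph actually store data. The core idea is that each vertex keeps its own incoming and outgoing edge lists, so a traversal step reads only the neighbors of the current vertex. The sample data uses the `worksFor` edges and their `title` properties from the example graph.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph (hypothetical, for illustration only).

    Vertices and edges both carry key/value properties. Each vertex keeps its
    own outgoing and incoming edge lists (an adjacency index), so following an
    edge touches only that vertex's neighbors instead of scanning every edge.
    """

    def __init__(self):
        self.vertices = {}                  # vertex id -> property dict
        self.out_edges = defaultdict(list)  # vertex id -> [(label, target id, edge props)]
        self.in_edges = defaultdict(list)   # vertex id -> [(label, source id, edge props)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        # Record the edge at both endpoints so it can be traversed in either direction.
        self.out_edges[src].append((label, dst, props))
        self.in_edges[dst].append((label, src, props))

    def out(self, vid, label):
        """Targets of outgoing edges with the given label."""
        return [dst for lbl, dst, _ in self.out_edges[vid] if lbl == label]

    def in_(self, vid, label):
        """Sources of incoming edges with the given label."""
        return [src for lbl, src, _ in self.in_edges[vid] if lbl == label]


g = PropertyGraph()
g.add_vertex("DataStax", kind="company")
for name, title in [("Robin Schumacher", "VP Product"),
                    ("Jonathan Ellis", "CTO"),
                    ("Billy Bosworth", "CEO")]:
    g.add_vertex(name, kind="person")
    g.add_edge(name, "worksFor", "DataStax", title=title)  # title is an edge property

# Who works for DataStax, and with what title?
titles = {src: props["title"]
          for lbl, src, props in g.in_edges["DataStax"] if lbl == "worksFor"}
print(titles)
```

Because `title` is stored on the `worksFor` edge and not on the person vertex, the same person could hold different titles at different companies without any change to the vertex.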

[Figure: example property graph. Vertices: DataStax, DataBricks, Spark, DSE, Cassandra, Jonathan Ellis, Robin Schumacher, Billy Bosworth. Edges: worksFor (carrying a title property: VP Product, CTO, CEO), develops, uses, reportsTo.]

[Figure: the same graph, annotated to identify a vertex, an edge, and a property.]

A Graph Database Helps Answer Queries Like…
…should an initiated transaction be considered fraudulent or malicious based
on past user actions or normal patterns of system behavior?
…what products or actions should we recommend to a user based on their
preferences and behavioral patterns to maximize sales or user engagement?
…what campaigns should be run for different segments of a company’s
customer base?
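The first question can be framed as a graph query: does the account behind a transaction connect, through shared devices or cards, to an account that is already known to be fraudulent? The sketch below is a hypothetical illustration. The `USES` map, the `KNOWN_FRAUD` set, and the 4-hop limit are all invented for the example, and a real fraud model would weigh many more signals.

```python
from collections import deque

# Hypothetical data: which devices and payment cards each account has used.
USES = {
    "acct:alice": ["device:d1", "card:c1"],
    "acct:bob": ["device:d1"],
    "acct:mallory": ["device:d2", "card:c9"],
    "acct:trent": ["card:c9"],
    "acct:carol": ["device:d3"],
}
KNOWN_FRAUD = {"acct:mallory"}  # hypothetical confirmed-fraud accounts


def build_adjacency(uses):
    """Undirected graph linking each account to the devices/cards it used."""
    adj = {}
    for acct, things in uses.items():
        for thing in things:
            adj.setdefault(acct, set()).add(thing)
            adj.setdefault(thing, set()).add(acct)
    return adj


def hops_to_fraud(adj, start, known_fraud, max_hops=4):
    """Breadth-first search from `start`; return the hop count to the nearest
    other known-fraud account, or None if none is within `max_hops`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node != start and node in known_fraud:
            return dist
        if dist == max_hops:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


adj = build_adjacency(USES)
for acct in ("acct:trent", "acct:alice", "acct:carol"):
    hops = hops_to_fraud(adj, acct, KNOWN_FRAUD)
    print(acct, "flag for review" if hops is not None else "no link found", hops)
```

In this data, `acct:trent` shares card `c9` with `acct:mallory`, so it is two hops from a known-fraud account. The other two accounts have no such link.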

Key Difference Between Graph DB and RDBMS
RDBMS
• Querying related data elements through joins is inefficient on large data sets or with many relationships
• Expressing JOIN-intensive queries in SQL is time-consuming and error-prone
Graph DB
• Better performance for relationship queries, thanks to specialized index structures
• Intuitive query language that enables faster application development
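A tiny example makes the contrast concrete. The Python sketch below uses hypothetical `knows` data and answers the same question twice: which people are reachable by a path of exactly three edges. The first version uses SQL self-joins in an in-memory SQLite database. The second version follows per-vertex neighbor lists. The sketch is illustrative only and benchmarks nothing. The structural difference is that the SQL query fixes the depth by the number of joins written into it, while the adjacency walk takes the depth as a parameter.

```python
import sqlite3
from collections import defaultdict

# Hypothetical "knows" relationships between people.
EDGES = [("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("cat", "eve"), ("dan", "eve")]

# Relational version: one (src, dst) row per relationship.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO knows VALUES (?, ?)", EDGES)


def three_hops_sql(start):
    """People reachable by a path of exactly three edges, via two self-joins.

    The depth is baked into the query: each extra hop means another JOIN
    (or a recursive CTE). No index is defined here, so SQLite must locate
    matching rows itself, either by scanning or with a temporary automatic index.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT k3.dst
        FROM knows AS k1
        JOIN knows AS k2 ON k2.src = k1.dst
        JOIN knows AS k3 ON k3.src = k2.dst
        WHERE k1.src = ?
        """,
        (start,),
    ).fetchall()
    return sorted(r[0] for r in rows)


# Graph-style version: each vertex stores its own neighbor list.
adjacency = defaultdict(list)
for src, dst in EDGES:
    adjacency[src].append(dst)


def k_hops(start, k):
    """People reachable by a path of exactly k edges; each hop reads neighbor lists directly."""
    frontier = {start}
    for _ in range(k):
        frontier = {nbr for v in frontier for nbr in adjacency[v]}
    return sorted(frontier)


print(three_hops_sql("ann"), k_hops("ann", 3))
```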

RDBMS vs. Graph DB: Query Complexity
SELECT TOP (5) [t14].[ProductName]
FROM (SELECT COUNT(*) AS [value],
[t13].[ProductName]
FROM [customers] AS [t0]
CROSS APPLY (SELECT [t9].[ProductName]
FROM [orders] AS [t1]
CROSS JOIN [order details] AS [t2]
INNER JOIN [products] AS [t3]
ON [t3].[ProductID] = [t2].[ProductID]
INNER JOIN [orders] AS [t5]
ON [t5].[OrderID] = [t4].[OrderID]
LEFT JOIN [customers] AS [t6]
ON [t6].[CustomerID] = [t5].[CustomerID]
CROSS JOIN ([orders] AS [t7]
ON [t9].[ProductID] = [t8].[ProductID])
WHERE NOT EXISTS(SELECT NULL AS [EMPTY]
WHERE [t9].[ProductID] = [t12].[ProductID]
AND [t10].[CustomerID] = [t0].[CustomerID]
AND [t11].[OrderID] = [t10].[OrderID])
AND [t6].[CustomerID] <> [t0].[CustomerID]
AND [t2].[OrderID] = [t1].[OrderID]
AND [t4].[ProductID] = [t3].[ProductID]
AND [t8].[OrderID] = [t7].[OrderID]) AS [t13]
WHERE [t0].[CustomerID] = N'ALFKI'
GROUP BY [t13].[ProductName]) AS [t14]
ORDER BY [t14].[value] DESC
VS.
g.V('customerId','ALFKI').as('customer')
.out('ordered').out('contains').out('is').as('products')
.in('is').in('contains').in('ordered').except('customer')
.out('ordered').out('contains').out('is').except('products')
.groupCount().cap().orderMap(T.decr)[0..<5].productName
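The Python sketch below maps each Gremlin step to a rough equivalent. It runs on a tiny hypothetical data set with the same shape as the tables in the SQL: a customer `ordered` an order, the order `contains` order details, and each order detail `is` a product. Vertex IDs double as product names here and stand in for the final `.productName` lookup. Like Gremlin, the sketch keeps one entry per path, so the group count records how many paths reach each product.

```python
from collections import Counter, defaultdict

# Edge indexes keyed by (vertex, label), one per direction.
OUT = defaultdict(list)
IN = defaultdict(list)


def add_edge(src, label, dst):
    OUT[(src, label)].append(dst)
    IN[(dst, label)].append(src)


# Hypothetical orders: customer -> order -> products (one order detail per product).
ORDERS = {
    "ALFKI": {"o1": ["Chai", "Chang"]},
    "BONAP": {"o2": ["Chai", "Tofu", "Ikura"]},
    "COMMI": {"o3": ["Chang", "Tofu"]},
}
detail_count = 0
for customer, orders in ORDERS.items():
    for order, items in orders.items():
        add_edge(customer, "ordered", order)
        for product in items:
            detail_count += 1
            detail = f"d{detail_count}"
            add_edge(order, "contains", detail)
            add_edge(detail, "is", product)


def out(vertices, label):
    # Keep one entry per path; duplicates are not merged.
    return [w for v in vertices for w in OUT[(v, label)]]


def in_(vertices, label):
    return [w for v in vertices for w in IN[(v, label)]]


def recommend(customer, top=5):
    # .out('ordered').out('contains').out('is').as('products')
    products = out(out(out([customer], "ordered"), "contains"), "is")
    # .in('is').in('contains').in('ordered').except('customer')
    others = [c for c in in_(in_(in_(products, "is"), "contains"), "ordered")
              if c != customer]
    # .out('ordered').out('contains').out('is').except('products')
    owned = set(products)
    candidates = [p for p in out(out(out(others, "ordered"), "contains"), "is")
                  if p not in owned]
    # .groupCount().cap().orderMap(T.decr)[0..<5]
    return [p for p, _ in Counter(candidates).most_common(top)]


print(recommend("ALFKI"))
```

For this data, `recommend("ALFKI")` returns `['Tofu', 'Ikura']`. The other customers who bought Chai or Chang bought Tofu twice and Ikura once.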

RDBMS vs. Graph DB: Data Modeling
[Figure: the relational data model behind the SQL query (customers, orders, order details, products) beside the equivalent graph model.]
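The two data models line up directly. Each row becomes a vertex, and each foreign key becomes an edge, using the same `ordered`, `contains`, and `is` labels as the Gremlin traversal. The following minimal sketch uses hypothetical rows and a hypothetical `rows_to_graph` helper. Column names follow the SQL query (`CustomerID`, `OrderID`, `ProductID`, `ProductName`). The `Quantity` and `CompanyName` columns are assumed for illustration.

```python
def rows_to_graph(customers, orders, order_details, products):
    """Map relational rows to (vertices, edges).

    Each row becomes a vertex keyed by (kind, primary key). Each foreign key
    becomes a labeled edge and is dropped from the vertex's properties.
    """
    vertices = {}
    edges = []
    for c in customers:
        vertices[("customer", c["CustomerID"])] = dict(c)
    for p in products:
        vertices[("product", p["ProductID"])] = dict(p)
    for o in orders:
        vertices[("order", o["OrderID"])] = {k: v for k, v in o.items() if k != "CustomerID"}
        edges.append((("customer", o["CustomerID"]), "ordered", ("order", o["OrderID"])))
    for i, d in enumerate(order_details):
        detail = ("detail", i)  # no single-column key in this sample, so use the row index
        vertices[detail] = {k: v for k, v in d.items() if k not in ("OrderID", "ProductID")}
        edges.append((("order", d["OrderID"]), "contains", detail))
        edges.append((detail, "is", ("product", d["ProductID"])))
    return vertices, edges


# Hypothetical sample rows.
customers = [{"CustomerID": "ALFKI", "CompanyName": "Example Customer"}]
orders = [{"OrderID": 1, "CustomerID": "ALFKI"}]
order_details = [{"OrderID": 1, "ProductID": 1, "Quantity": 3},
                 {"OrderID": 1, "ProductID": 2, "Quantity": 1}]
products = [{"ProductID": 1, "ProductName": "Product A"},
            {"ProductID": 2, "ProductName": "Product B"}]

vertices, edges = rows_to_graph(customers, orders, order_details, products)
for edge in edges:
    print(edge)
```

The join conditions in the SQL (`OrderID = OrderID`, `ProductID = ProductID`) turn into stored edges here. A traversal follows those edges and does not recompute the matches at query time.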

Key Difference Between Graph DB and NoSQL
NoSQL
• The data model can't represent relationships between rows or documents, so application developers must maintain them inside the application, which is cumbersome, inefficient, and error-prone
Graph DB
• Natively supports relationships in the data model and provides a query language to retrieve them efficiently
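In a key-value or document store, a relationship is typically just an ID embedded in a document. The application therefore has to write both directions of the relationship and keep them consistent. The sketch below assumes a hypothetical in-memory `store` dictionary and hypothetical helper names (`link_order`, `orders_for`, `dangling_refs`). It shows how a write that updates only one side leaves a dangling reference that nothing in the data model detects.

```python
# Hypothetical key-value store: key -> document (dict). Relationships are only
# IDs embedded in documents, so the application keeps both directions in sync.
store = {}


def put(key, doc):
    store[key] = doc


def link_order(customer_key, order_key):
    """Record customer <-> order in both documents: two separate writes."""
    store[customer_key].setdefault("order_ids", []).append(order_key)
    store[order_key]["customer_id"] = customer_key


def orders_for(customer_key):
    # Application-side "join": one extra lookup per referenced ID.
    return [store[k] for k in store[customer_key].get("order_ids", [])]


def dangling_refs():
    """(customer, order) pairs where the order is missing or points elsewhere."""
    problems = []
    for key, doc in store.items():
        for order_key in doc.get("order_ids", []):
            order = store.get(order_key)
            if order is None or order.get("customer_id") != key:
                problems.append((key, order_key))
    return problems


put("cust:1", {"name": "Customer One"})
put("order:1", {"total": 25})
link_order("cust:1", "order:1")
ok_before = dangling_refs()

# Simulate a partial update: the customer side is written, the order side is not.
store["cust:1"]["order_ids"].append("order:2")
put("order:2", {"total": 10})

print(ok_before, dangling_refs())
```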

NoSQL vs. Graph DB: Query Expressivity
g.V('customerId','ALFKI').as('customer')
.out('ordered').out('contains').out('is').as('products')
.in('is').in('contains').in('ordered').except('customer')
.out('ordered').out('contains').out('is').except('products')
.groupCount().cap().orderMap(T.decr)[0..<5].productName
VS.
? (requires application code)

A Look at DataStax’s Graph Strategy

Product Strategy for 2015
• Part of DataStax’s product strategy in 2015 will be to support multiple
data models in DataStax Enterprise (DSE)
• Multi-model support will be delivered across several DSE releases in 2015

Why Multi-Model in DataStax Enterprise?
• Mixed workloads needed (transactions, analytics, search)? Solved in DSE
• Mixed models needed (wide row, graph, JSON)? Solved in DSE

Why Graph?
• Best answer for applications having highly connected data
• Key enabler of systems of engagement and systems of insight applications
• Use cases include:
  • Personalization
  • Social engagement systems (e.g., matchmaking services, contact catalogs)
  • Fraud detection
  • Financial analysis
  • Security analysis
  • Communication
  • Supply chain management

Titan – the Foundation for DSE Graph
• Titan is a scalable, distributed graph database that is optimized for storing,
traversing and querying complex graph data in real time
• Titan is open source and licensed under the Apache 2.0 license
• Current technical benefits include:
  • Runs on Cassandra, HBase, or BerkeleyDB as its storage backend
• Scale-out and multi-data center capable
• Able to support thousands of concurrent users and billions of graph data points
• Analytics on graph data supported via Hadoop integration
• Search enabled via support for Solr, Lucene, and Elasticsearch
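Titan can run on Cassandra because it stores each vertex's adjacency list as one row. The vertex's edges and properties are sorted columns in that row, so fetching all edges with one label is a single contiguous slice. The Python sketch below is a simplified model of that idea. The `WideRowStore` class, the `(label, neighbor)` column key, and the `"~property"` prefix are assumptions for illustration. Titan's real column encoding is binary and considerably more elaborate.

```python
import bisect


class WideRowStore:
    """Toy wide-row table: row key -> columns kept sorted by column key."""

    def __init__(self):
        self.rows = {}  # row key -> sorted list of (column key, value)

    def put(self, row, column, value):
        cols = self.rows.setdefault(row, [])
        keys = [c for c, _ in cols]
        i = bisect.bisect_left(keys, column)
        if i < len(cols) and cols[i][0] == column:
            cols[i] = (column, value)        # overwrite an existing column
        else:
            cols.insert(i, (column, value))  # insert, keeping the row sorted

    def slice(self, row, start, end):
        """Columns whose key k satisfies start <= k < end (one contiguous range)."""
        cols = self.rows.get(row, [])
        keys = [c for c, _ in cols]
        return cols[bisect.bisect_left(keys, start):bisect.bisect_left(keys, end)]


def add_property(store, vertex, key, value):
    # Vertex properties live in the same row as the vertex's edges.
    store.put(vertex, ("~property", key), value)


def add_edge(store, src, label, dst, **props):
    # One column per edge in the source vertex's row, keyed by (label, neighbor).
    store.put(src, (label, dst), props)


def neighbors(store, vertex, label):
    # (label,) sorts before every (label, x), and label + "\x00" is the next
    # string after label, so this range covers exactly that label's columns.
    cols = store.slice(vertex, (label,), (label + "\x00",))
    return [dst for (_, dst), _ in cols]


store = WideRowStore()
add_property(store, "DSE", "name", "DataStax Enterprise")
add_edge(store, "DSE", "uses", "Spark")
add_edge(store, "DSE", "uses", "Cassandra")
print(neighbors(store, "DSE", "uses"))
```

Because columns are kept sorted, a label's edges can be read without touching the vertex's other columns. In Cassandra, the equivalent operation is a column slice within a single partition.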

What is DataStax Enterprise Graph?
DSE Graph is a scalable graph database solution for modern Web and mobile
applications that need to manage highly connected data
DSE Graph will be deeply integrated into
the DSE platform:
• Tight Cassandra integration
• Graph analytics powered by Spark
• DSE Search support
• OpsCenter monitoring

2015 Plans for Titan / DSE Graph
• DataStax will contribute to TinkerPop and is dedicated to making it the #1
open source graph framework
• Release Titan 1.0, compatible with TinkerPop 3 (TP3); TP3, a prerequisite, is expected 1 to 2 months earlier
• First release of DSE Graph to occur in DSE 5.0; EAP builds will be available for interested customers
• Customers are advised to continue developing with TinkerPop to ensure seamless compatibility with DSE Graph
• DataStax to provide utilities/instructions for moving existing Titan
databases to DSE Graph

Next Steps
• Check DataStax blog for updates on DSE Graph
• If you are a current DSE customer, contact us about participating in upcoming Early Adopter Program (EAP) releases of DSE Graph
• If you haven’t tried DSE yet, download it from http://www.datastax.com/download and follow our getting started guide in your own environment (or use the DataStax Sandbox)
