METADATA AND THE POWER
OF PATTERN-FINDING
MAY 24, 2016 FOR DATAVERSITY
LEON GUZENDA
Chief Technology Marketing Officer
AGENDA
• Who We Are
• Open Source Big & Fast Data Analytics
• Our Core Technology & New Product
• Pattern Finding Examples
• Q & A
OBJECTIVITY, INC.
OBJECTIVITY, INC. OVERVIEW
• Private company, headquartered in Silicon Valley since 1988
• Verticals:
  • Government: Intelligence, defense, crime detection & prevention
  • Financial Services
  • Industrial Internet of Things (IIoT)
  • Energy
  • Healthcare
• Horizontals:
  • Graph analytics
  • Complex, distributed, scalable database applications
SAMPLE CUSTOMERS AND PARTNERS
• Capital Intensive Customers
• Government Customers
• Telco & Network Customers
• Technology Partners
• SI Partners
OPEN SOURCE BIG & FAST DATA ANALYTICS
OPEN SOURCE ANALYTICS...
[Figure: open source analytics stack. Surviving labels: "[Fall 2016]"; "R"; "Proprietary Rules, Ontologies, Queries..."; "Reports, Archives..."; "Workflow Design GUI"; "Proprietary".]
...OPEN SOURCE ANALYTICS
PROS:
• Large community
• Lots of algorithms
• Model works at scale
• Low startup costs
• Cost effective
CONS:
• Most algorithms are based on statistical correlation, clustering or filtering.
• Graph algorithms mainly tackle theoretical problems.
• Hadoop mostly targets files, not metadata.
• Metadata tools focus on technical parameters, not semantic content.
APACHE SPARK GRAPHX API
• Vertex, Edge and Triplet operations
• Graph modification operations
• RDD join operations
• Adjacent triplet operations
• Iterative graph-parallel operations
• PageRank, connected components, triangle counting, etc.
Spark GraphFrames add Motifs (a simple subgraph definition).
BUT: Efficient pathfinding and complex navigation are inhibited by the table/triplet approach.
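The point about the table/triplet approach can be illustrated with a small sketch. This is not GraphX code: it is plain Python over a hypothetical edge list, and the function names are invented for illustration. The first function expands a frontier the way a table-oriented engine would, rescanning the whole edge table for every hop; the second builds an adjacency index once and then follows links from the start vertex only.

```python
from collections import defaultdict, deque

# Hypothetical edge table: (source, destination) rows, as a triplet store might hold them.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]


def reachable_by_joins(edges, start, hops):
    """Table-style expansion: every hop rescans (joins against) the full edge table."""
    frontier = {start}
    for _ in range(hops):
        frontier = {dst for (src, dst) in edges if src in frontier}
    return frontier


def shortest_path_by_navigation(edges, start, goal):
    """Navigation-style search: follow adjacency lists outward from the start vertex."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route between start and goal
```

On a graph this small the difference is invisible; the sketch is only meant to show where the per-hop cost comes from when every traversal step is expressed as a join.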
OUR CORE TECHNOLOGY
OUR FOCUS
• Complex Objects at scale:
  • Relationships are first-class citizens
  • Ultra-fast navigation and pathfinding
  • Not restricted by available RAM
• Scalability, performance, reliability and flexibility:
  • Distributed database and distributed processing
  • Light, small database kernel, from embedded to cluster to cloud
DISTRIBUTED DATA - SINGLE LOGICAL VIEW
• 1,000s of trillions of unique objects
• 1,000s of petabytes of storage
• Fast ID resolution, regardless of the number of objects
Put the data and processing where it's needed
DISTRIBUTED PROCESSING
[Figure: ThingSpan, Cache and Client Processes.]
THINGSPAN
THINGSPAN ENVIRONMENT
[Figure: ThingSpan environment diagram; no text survives.]
THINGSPAN FEATURES
• Uses the Apache Spark open source processing engine
• In partnership with Cloudera, Databricks, Hortonworks and MapR
• Powerful object and relationship modeling
• Can store data in HDFS and/or POSIX
• Ultra-fast graph navigation, pathfinding and pattern finding
• REST Server and API for loading data and performing graph analytics
• Spark DataFrame support to leverage MLlib, GraphX, SQL, etc.
DISTRIBUTED PROCESSING & DATABASE
Distributed from top to bottom
[Figure: distributed stack; surviving label: "Hadoop Distributed File System".]
OPEN SOURCE ANALYTICS STACK
[Figure: open source analytics stack. Surviving labels: "[Fall 2016]"; "R"; "Proprietary Rules, Ontologies, Queries..."; "Reports, Archives..."; "Workflow Design GUI"; "Proprietary".]
THINGSPAN ENHANCED ANALYTICS STACK
[Later this year]
THINGSPAN COMPONENTS
PATTERN FINDING
PATTERN FINDING TECHNIQUES
• Conventional Business Intelligence Analytics: Uses filters and statistical correlation to find relationships between parameters.
• Graph Pattern Finding Analytics: Uses a combination of outlier, navigational and pathfinding queries.
  • Find outliers with SQL or MLlib.
  • A navigational query can specify Vertex and Edge types to be included/excluded and can invoke methods during the traversal, e.g. to compute transit time to a node.
  • A pathfinding query can find the shortest path or all paths between two or more Vertices.
  • The order of query types depends upon the problem.
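A navigational query of the kind described above can be sketched in a few lines. This is not ThingSpan's query API: the graph, the type names and the `navigate` function are hypothetical, chosen only to show the two ideas from the slide, namely filtering the traversal by Vertex and Edge type and running a computation (here, accumulated transit time) as each vertex is reached.

```python
from collections import deque

# Hypothetical typed graph: each edge is (source, edge_type, destination, minutes).
EDGES = [
    ("Depot", "ROAD", "HubA", 30),
    ("HubA", "RAIL", "HubB", 45),
    ("HubA", "ROAD", "HubB", 90),
    ("HubB", "ROAD", "Store", 20),
]
VERTEX_TYPES = {"Depot": "Site", "HubA": "Hub", "HubB": "Hub", "Store": "Site"}


def navigate(start, excluded_edge_types=(), excluded_vertex_types=()):
    """Queue-based traversal that skips excluded types and keeps the best
    transit time found so far for every vertex it reaches."""
    transit = {start: 0}
    queue = deque([start])
    while queue:
        here = queue.popleft()
        for src, etype, dst, minutes in EDGES:
            if src != here or etype in excluded_edge_types:
                continue
            if VERTEX_TYPES[dst] in excluded_vertex_types:
                continue
            arrival = transit[here] + minutes
            # Re-queue a vertex only when a faster arrival has been found.
            if dst not in transit or arrival < transit[dst]:
                transit[dst] = arrival
                queue.append(dst)
    return transit
```

Calling `navigate("Depot", excluded_edge_types=("RAIL",))` restricts the walk to road edges, which changes the transit times reported for the vertices downstream of HubA.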
PATH-FINDING QUERY
[Figure: CITY vertices joined by LINK edges; each LINK carries Mode, Duration and Cost.]
• Problem: Find the least expensive route between San Francisco and New York for a 60-ton, very wide load that must arrive by Saturday, with as few mode transitions (road/rail/water, etc.) as possible.
• Implied: We can avoid Rail connections.
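The routing problem above can be sketched as a constrained path search. Everything in the block is an assumption made for illustration: the link table, the durations and costs, the deadline expressed in hours, and the choice to treat mode transitions as a tie-breaker after cost. It enumerates simple paths by brute force, which is fine for a toy graph but is not how a production pathfinding engine would work.

```python
# Hypothetical links: (from_city, to_city, mode, duration_hours, cost_usd).
LINKS = [
    ("San Francisco", "Salt Lake City", "road", 14, 4000),
    ("San Francisco", "Salt Lake City", "rail", 20, 2500),
    ("Salt Lake City", "Chicago", "road", 24, 6000),
    ("Salt Lake City", "Chicago", "rail", 30, 3500),
    ("Chicago", "New York", "road", 14, 4500),
    ("Chicago", "New York", "water", 60, 3000),
]


def best_route(start, goal, deadline_hours, excluded_modes=()):
    """Enumerate simple paths depth-first and return the cheapest one that meets
    the deadline, breaking cost ties by the number of mode transitions.
    Returns (cost, transitions, hours, legs) or None if no route qualifies."""
    best = None

    def walk(city, visited, legs, hours, cost, transitions):
        nonlocal best
        if hours > deadline_hours:
            return  # already too late, abandon this branch
        if city == goal:
            candidate = (cost, transitions, hours, legs)
            if best is None or candidate[:2] < best[:2]:
                best = candidate
            return
        for src, dst, mode, dur, price in LINKS:
            if src != city or dst in visited or mode in excluded_modes:
                continue
            changed = 1 if legs and legs[-1][1] != mode else 0
            walk(dst, visited | {dst}, legs + [(dst, mode)],
                 hours + dur, cost + price, transitions + changed)

    walk(start, {start}, [], 0, 0, 0)
    return best
```

Excluding rail mirrors the "Implied" bullet: `best_route("San Francisco", "New York", deadline_hours=60, excluded_modes=("rail",))` considers only road and water links.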
PATTERN FINDING EXAMPLES
• Financial: Money Laundering Detection
• Intelligence Analysis: Threat Detection
• AdTech: Recommendation Engine Support
• Industrial Internet of Things (IIoT): Network Congestion Analysis
FINANCIAL: MONEY LAUNDERING DETECTION
1. Load Person, Account and Transaction data into ThingSpan.
[Figure: people P1, P2 and P3 linked to accounts Acc 1, Acc 2, Acc 20 to Acc 24, Acc 31 to Acc 33 and Acc 35, with transactions ($) between accounts.]
FINANCIAL: APPLY SPARK GRAPHX
2. Identify people with more than 5 accounts (centrality).
[Figure: the same graph.]
FINANCIAL: APPLY A NAVIGATIONAL QUERY
3. Look at all of that person's transactions to see if they terminate in just 1 or 2 offshore accounts.
4. INVESTIGATE
[Figure: the same graph.]
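Steps 2 and 3 of the money laundering example can be sketched as follows. This uses plain Python rather than Spark GraphX or ThingSpan, and a simple account count stands in for the centrality measure named on the slide. The person and account identifiers echo the slide labels, but the ownership map, the transfers and the offshore account `Off1` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample data; ownership and transfers are invented.
OWNS = {
    "P1": ["Acc1", "Acc2"],
    "P2": ["Acc20", "Acc21", "Acc22", "Acc23", "Acc24", "Acc35"],
    "P3": ["Acc31", "Acc32", "Acc33"],
}
TRANSFERS = [  # (from_account, to_account)
    ("Acc20", "Acc40"), ("Acc21", "Acc40"), ("Acc22", "Acc41"),
    ("Acc23", "Acc41"), ("Acc24", "Acc40"), ("Acc35", "Acc41"),
    ("Acc40", "Off1"), ("Acc41", "Off1"),
    ("Acc1", "Acc31"),
]
OFFSHORE = {"Off1"}


def people_with_many_accounts(owns, threshold=5):
    """Step 2: people who hold more than `threshold` accounts."""
    return sorted(p for p, accounts in owns.items() if len(accounts) > threshold)


def terminal_accounts(start_accounts, transfers):
    """Step 3: follow transfers forward; return accounts with no outgoing transfers."""
    outgoing = defaultdict(list)
    for src, dst in transfers:
        outgoing[src].append(dst)
    stack, seen, terminals = list(start_accounts), set(start_accounts), set()
    while stack:
        acc = stack.pop()
        if not outgoing[acc]:
            terminals.add(acc)
        for nxt in outgoing[acc]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return terminals


def suspects(owns, transfers, offshore, max_terminals=2):
    """Flag people whose money ends up in only one or two offshore accounts."""
    flagged = []
    for person in people_with_many_accounts(owns):
        ends = terminal_accounts(owns[person], transfers)
        if ends and ends <= offshore and len(ends) <= max_terminals:
            flagged.append(person)
    return flagged
```

The output of `suspects` is only a shortlist; step 4 on the slide (investigate) remains a human task.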
HUMINT: THREAT DETECTION
1. Load People, Calls, Places and Sightings into the Graph.
[Figure: people P1 to P18, calls CDR1 to CDR17, places PlaceX, PlaceY and PlaceZ, and sightings Seen1 to Seen4.]
HUMINT: APPLY SPARK GRAPHX
2. Use Spark GraphX to find "islands" of callers/callees.
[Figure: the same graph, grouped into islands.]
HUMINT: APPLY A NAVIGATIONAL QUERY
3. Use a navigational query to see if any of those People have been seen near Places that need to be protected.
[Figure: the same graph.]
HUMINT: PLAN ACTION
4. P14 and P15 have been seen near potential target PlaceX, so they plus P11, P7 and P8 should be put under surveillance.
[Figure: the same graph.]
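The threat detection steps can be sketched as a connected-components pass followed by a lookup against sightings. The block is plain Python, not GraphX or ThingSpan, and the call and sighting records are invented; they are arranged so that the outcome matches the conclusion stated in step 4. `PlaceQ` and the function names are hypothetical.

```python
# Hypothetical sample data; who called whom and who was seen where is invented.
CALLS = [("P7", "P8"), ("P8", "P11"), ("P11", "P14"), ("P14", "P15"),
         ("P1", "P2"), ("P2", "P3")]
SIGHTINGS = [("P14", "PlaceX"), ("P15", "PlaceX"), ("P2", "PlaceQ")]
PROTECTED = {"PlaceX"}


def islands(calls):
    """Step 2: connected components of the caller/callee graph via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in calls:
        parent[find(a)] = find(b)
    groups = {}
    for person in list(parent):
        groups.setdefault(find(person), set()).add(person)
    return list(groups.values())


def surveillance_list(calls, sightings, protected):
    """Steps 3 and 4: everyone in an island that contains a person seen near a
    protected place."""
    seen_near = {p for p, place in sightings if place in protected}
    watch = set()
    for group in islands(calls):
        if group & seen_near:
            watch |= group
    return sorted(watch)
```

With this sample data the island containing P14 and P15 also contains P7, P8 and P11, so all five are returned, while the P1, P2, P3 island is left alone because P2 was seen only near an unprotected place.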
ADTECH: PRE-PLANNED ADS
1. Load Products, Orders, People and Social_Links into ThingSpan.
[Figure: people Joe, Fred, Mary, Jane and Bill; products Pr1 to Pr6; sales Sale1 to Sale5; three Follows links.]
2. We want to place ads for Product Pr2.
ADTECH: WHO FOLLOWS BUYERS OF THE PRODUCT?
3. Use ThingSpan to find bloggers who bought Pr2 and who also have followers.
Result: Fred bought Pr2. Mary follows Fred's blogs. Jane & Bill follow Mary's.
ADTECH: DISPLAY THE AD
4. Next time you spot Mary, Jane or Bill, display a personalized Ad for Pr2.
[Figure: the same graph with an ad callout ("Buy 1!").]
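The follower lookup in step 3 can be sketched as a walk over Follows links starting from the buyers of a product. This is plain Python with a hypothetical `ad_audience` function, not a ThingSpan query. The Follows links and Fred's purchase of Pr2 mirror the result stated on the slide; the other sales are filler invented for illustration.

```python
from collections import deque

# Sales other than Fred's purchase of Pr2 are hypothetical filler.
SALES = [("Joe", "Pr1"), ("Fred", "Pr2"), ("Mary", "Pr4"),
         ("Jane", "Pr5"), ("Bill", "Pr6")]
FOLLOWS = [("Mary", "Fred"), ("Jane", "Mary"), ("Bill", "Mary")]  # (follower, followed)


def ad_audience(product, sales, follows):
    """Return people who follow a buyer of `product`, directly or transitively."""
    buyers = {person for person, item in sales if item == product}
    audience, queue = set(), deque(buyers)
    while queue:
        followed = queue.popleft()
        for follower, target in follows:
            if target == followed and follower not in audience and follower not in buyers:
                audience.add(follower)
                queue.append(follower)
    return audience
```

Buyers are excluded from the audience on the assumption that someone who already owns the product does not need the ad; drop that check if repeat purchases matter.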
IIOT: TELCO NETWORK CONGESTION
1. Load Location, Equipment, Link (+Load) into the graph.
[Figure: locations L1 to L4; cities San Jose, Salt Lake City, Chicago and New York; equipment E1 to E3, E20 to E22, E30 to E33 and E40; Link 1 to Link 9. Link loads shown: 20%, 20%, 95%, 65%, 20%, 50%, 30%, 25%; one link is Off.]
IIOT: APPLY SPARK SQL
2. Use Spark SQL to find links that are over 90% loaded.
IIOT: APPLY A THINGSPAN NAVIGATIONAL QUERY
3. Use a graph query to find the leaf nodes (branch ends)... Then investigate...
IIOT: DIAGNOSE
4. Aha! E2 and E3 in San Jose are streaming 8K UHDTV movies from MovieFlix in New York, overloading Link 6.
IIOT: FIX
5. Solved by switching on Link 5.
[Figure: the same network after the fix. Link loads shown: 20%, 20%, 50%, 65%, 20%, 50%, 30%, 25% and 45%.]
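Steps 2 and 3 of the congestion example can be sketched as a threshold filter followed by a walk down to the branch ends. A plain Python filter stands in for the Spark SQL query, and a recursive walk stands in for the ThingSpan navigational query. That Link 6 is overloaded and Link 5 is off comes from the slides, and the 95% figure is assumed to belong to Link 6; every other load value and the parent-to-child topology are invented for illustration.

```python
# Hypothetical loads in percent; None marks a link that is switched off.
LINK_LOAD = {"Link 1": 20, "Link 2": 20, "Link 3": 65, "Link 4": 20,
             "Link 5": None, "Link 6": 95, "Link 7": 50, "Link 8": 30, "Link 9": 25}
# Hypothetical parent -> children edges of the equipment tree behind each link.
CHILDREN = {
    "Link 6": ["L1"],
    "L1": ["E1"],
    "E1": ["E2", "E3"],
    "Link 7": ["L2"],
    "L2": ["E20"],
}


def overloaded_links(link_load, threshold=90):
    """Step 2: links whose load exceeds the threshold; switched-off links are ignored."""
    return [name for name, load in link_load.items()
            if load is not None and load > threshold]


def leaf_nodes(root, children):
    """Step 3: depth-first walk returning the nodes with no children (branch ends)."""
    kids = children.get(root, [])
    if not kids:
        return [root]
    leaves = []
    for kid in kids:
        leaves.extend(leaf_nodes(kid, children))
    return leaves
```

Chaining the two, `[leaf_nodes(link, CHILDREN) for link in overloaded_links(LINK_LOAD)]` yields the equipment to inspect for each congested link.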
SUMMARY
• Open Source Big & Fast Data analytics tools are great at what they're designed for.
• ThingSpan adds a Metadata Store and scalable graph analytics
• Ultra-fast navigation and pathfinding queries.
• It can interoperate with streaming systems and Big Data platforms
• ThingSpan is extensible to other open source systems
QUESTIONS?
Info@objectivity.com
408-992-7100
