Graph Gurus 29
Using Graph Algorithms for Advanced Analytics
Part 3 - Community Detection
© 2020 TigerGraph. All Rights Reserved
Today's Presenter
Victor Lee
Head of Product Strategy & Developer Relations
● BS in Electrical Engineering and Computer
Science from UC Berkeley, MS in Electrical
Engineering from Stanford University
● PhD in Computer Science from Kent State
University focused on graph data mining
● 20+ years in tech industry
Some Housekeeping Items
● Although your phone is muted we do want to answer your questions -
submit your questions at any time using the Q&A tab in the menu
● The webinar is being recorded and will be uploaded to our website shortly
(https://www.tigergraph.com/webinars/) and the URL will be emailed
to you
● If you have issues with Zoom please contact the panelists via chat
Move Faster with TigerGraph Cloud
Built for agile teams who would rather build innovative applications than
procure hardware or configure and manage databases
● Start for free
● Move to production with distributed data and HA replication
Today’s Outline
1 Recap of Parts 1 & 2:
Path and Centrality Graph Algorithms
2 What's a Community?
Who's in and who's out?
3 Community Detection Algorithms
Clustering vs. Partitioning
Strict vs. Lenient Rules
4 Demo
Running and modifying GSQL
Community Detection Algorithms
Review: Analytics with Graph Algorithms
● Graph algorithms answer fundamental questions about
connected data
● Each algorithm in a library is a tool in an analytics toolkit
● Building blocks for more complex business questions
Specialized functions combine to make something better
Example Questions/Analyses for Graph Algorithms
Which entity is most centrally
located?
● For delivery logistics or greatest visibility
● Closeness Centrality, Betweenness
Centrality algorithms
How much influence does this
entity exert over the others?
● For market penetration & buyer influence
● PageRank algorithm
Which entity has similar relationships
to this entity?
● For grouping customers, products, etc.
● Cosine Similarity, SimRank, RoleSim
algorithms
What are the natural community
groupings in the graph?
● For partitioning risk groups, workgroups,
product offerings, etc.
● Community Detection, MinCut algorithms
Summary for Shortest Path Algorithms
1 Graph Algorithms - tools and building
blocks for analyzing graph data
2 Learning To Use Algorithms - know what
problem they solve, pros and cons
3 Shortest Path Algorithms - different
algorithms for weighted and
unweighted graphs
4 GSQL Algorithm Library - runs
in-database, high-performance,
easy to read and modify
Summary for Centrality Algorithms
1 Centrality Algorithms - abstract
concepts of location and travel
2 Closeness and Betweenness use shortest
paths. Betweenness is more complex.
3 PageRank - uses directed referral edges
to find the most influential nodes.
Personalized PageRank is localized.
4 Customizing GSQL Library algorithms is
easy and familiar, like procedural SQL
Some Types of Graph Algorithms
● Search
● Path Finding & Analytics
● Centrality / Ranking
● Clustering / Community Detection
● Similarity
● Classification
How do I find the most influential provider in each region
for a particular medical condition?
Whole-Graph Compute problem
1. Analyze claims data to identify referral relationships
among providers (Time Series Analysis)
2. Create subsets of claims around each condition with
a group of healthcare codes (e.g. CPT codes) for
each region (e.g. local healthcare market)
3. Utilize PageRank to score hubs within each market
Condition: Diabetes
Healthcare Market: S. San Jose, CA
Hub Identified: Dr. Thomas
Best Practice: Know Graph Algorithms
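Step 3 above can be sketched in a few lines of Python. This is only an illustration of the PageRank idea, not the GSQL library implementation: the function name, the damping factor of 0.85, the iteration count, and every provider name other than Dr. Thomas are assumptions made for the example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is spread evenly.
        dangling = sum(score[v] for v in nodes if not out_links[v])
        new_score = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its score in equal shares along its outgoing edges.
        for src in nodes:
            for dst in out_links[src]:
                new_score[dst] += damping * score[src] / len(out_links[src])
        score = new_score
    return score


# Hypothetical referral edges: (referring provider, provider referred to).
referrals = [
    ("Dr. Adams", "Dr. Thomas"),
    ("Dr. Baker", "Dr. Thomas"),
    ("Dr. Clark", "Dr. Thomas"),
    ("Dr. Thomas", "Dr. Adams"),
    ("Dr. Clark", "Dr. Baker"),
]
scores = pagerank(referrals)
hub = max(scores, key=scores.get)
```

In this toy market the provider who receives the most referral weight ends up with the highest score and is reported as the hub.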
Who is influenced by these leaders (e.g. other doctors,
chiropractors, physical therapists, facilities)?
Utilize Community Detection
1. Identify communities of providers
around each hub for each region and
for a specific condition
2. Track changes over time to detect
significant shifts in communities
Dr. Thomas
Condition: Diabetes
Healthcare Market: S. San Jose, CA
Hub Identified: Dr. Thomas
Community Detected: Diabetes – S. San Jose – Dr. Thomas
What's a Community?
What's your definition?
● People who live in the same
neighborhood?
● Entities which interact with one
another, in a mutual relationship
● May have things in common (similarity),
but that is not the key factor.
Community Detection
● Deciding who is in a community… and who isn't.
● If items are already labeled with their community membership,
then your work is done!
● If there are no rules, then we need to make
or discover our own rules,
based on level of interaction.
Communities can be Clusters or Partitions
● Partitioning puts each item
in exactly one group, even if
the justification is sketchy.
● Clustering sets up natural
boundaries. Some items
can be outside.
A Spectrum of Community Detection Rules
From lenient to strict:
● Connected Component: direct connection to
1+ other members of the community
● K-Core: direct connection to
K+ other members of the community
(figure example: K = 2)
● Clique: direct connection to
every other member of the community
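The K-Core and Clique rules can be sketched in Python as follows. This is a minimal illustration under assumptions, not the GSQL library code: the helper names and the five-vertex example graph are made up for the example. The k-core is found by repeatedly dropping vertices that have fewer than k neighbors left.

```python
def undirected(edges):
    """Build a symmetric adjacency map from a list of undirected edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def k_core(adjacency, k):
    """Vertices that survive when vertices with fewer than k neighbors are peeled away."""
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for v in list(remaining):
            if len(remaining[v]) < k:
                # Removing v lowers the degree of each of its neighbors.
                for u in remaining[v]:
                    remaining[u].discard(v)
                del remaining[v]
                changed = True
    return set(remaining)


def is_clique(adjacency, members):
    """True if every member is directly connected to every other member."""
    return all(u in adjacency[v] for v in members for u in members if u != v)


# Hypothetical graph: a triangle a-b-c with a tail c-d-e.
graph = undirected([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")])
core_1 = k_core(graph, 1)
core_2 = k_core(graph, 2)
core_3 = k_core(graph, 3)
```

With k = 1 only isolated vertices would be dropped, so the whole example graph survives. With k = 2 the tail is peeled away and only the triangle remains, which also happens to satisfy the strictest rule, the clique.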
Directed vs. Undirected Connections
Connected Components Strongly Connected Components
A component is connected if there is a path from every vertex to every
other vertex.
● If edges are directed and every path follows the edge direction, we say
the component is strongly connected.
● If edges are undirected, or direction is ignored, we say it's (weakly)
connected.
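The difference between the two definitions can be sketched in Python. This is an illustrative version under assumptions (function names and the example edges are hypothetical) and it favors clarity over speed: a strongly connected component is taken as the set of vertices that can both reach, and be reached from, a starting vertex.

```python
from collections import defaultdict, deque


def reachable(start, neighbors):
    """Breadth-first search: every vertex reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in neighbors[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


def components(edges, strong=False):
    """Weakly connected components, or strongly connected ones if strong=True."""
    forward, backward, both = defaultdict(set), defaultdict(set), defaultdict(set)
    vertices = set()
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
        both[src].add(dst)
        both[dst].add(src)
        vertices.update((src, dst))

    result, assigned = [], set()
    for v in sorted(vertices):
        if v in assigned:
            continue
        if strong:
            # Must be reachable in both directions along directed edges.
            group = reachable(v, forward) & reachable(v, backward)
        else:
            # Direction is ignored.
            group = reachable(v, both)
        result.append(group)
        assigned |= group
    return result


# Hypothetical directed graph: a cycle a -> b -> c -> a plus a one-way edge c -> d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
weak = components(edges)
strong = components(edges, strong=True)
```

Ignoring direction, all four vertices form one component. Respecting direction, d falls out into its own component because nothing leads back from d into the cycle.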
Relative Density - Louvain Modularity
Modularity (Q) measures how "good" a partitioning of a graph is:
Q = (fraction of the edges that fall within the given groups)
minus (the expected fraction if edges were distributed at random)
● Researchers at the University of Louvain developed an especially
efficient method for finding a partitioning with a high Q score.
Which has a higher modularity?
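To compare candidate partitionings, the definition above can be computed directly. The Python sketch below uses the standard formula for an undirected, unweighted graph: for each group, the fraction of edges inside it minus the square of its share of all edge endpoints. It only scores a given partitioning; it is not the Louvain search itself. The function name and the example graph are assumptions for illustration.

```python
def modularity(edges, community_of):
    """Modularity Q of a partitioning of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}  # edges with both endpoints in the community
    degree = {}    # total number of edge endpoints in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Observed fraction of internal edges minus the fraction expected at random.
    return sum(internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles joined by a single bridge edge c-d.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {v: 0 for v in "abcdef"}
q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting the graph at the bridge scores higher than lumping everything into one group, which always scores zero because it contains exactly the edges expected at random.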
More Applications for Community Detection
● Marketing/Customer Analytics: find who
is chatting with whom, to understand
who is being reached and to improve
targeting.
● Biosciences: identify natural
communities of interaction at the
molecular, tissue, organism, and species
levels.
● Financial analytics: discover clusters of
transactions or contracts, to understand
and predict market dynamics, uncover
illicit activity.
Specialized plant biochemistry
drives gene clustering in fungi.
bioRxiv 184242; doi:
https://doi.org/10.1101/184242
http://adrianland.uk/social-media/my-linkedin-social-graph/
Regionalism and Overlap in Investment Treaty
Law – Towards Consolidation or Contradiction? J
of Intl Econ Law 17(2), pp. 271-298 (2014)
DEMO
GSQL Graph Algorithms in TigerGraph Cloud
Datasets
1. Drug Prescribers (104 vertices, 417 directed edges)
○ Demonstrate healthcare use case, visualize results
2. Kangaroos (17 vertices, 91 undirected edges)
○ Demonstrate k-core visually
○ http://konect.uni-koblenz.de/networks/moreno_kangaroo
3. Flickr Images (105,938 vertices, 2,316,948 undirected edges)
○ Demonstrate scalability of algorithms
○ http://konect.uni-koblenz.de/networks/flickrEdges
Data Preparation
● Graph algorithms out-of-the-box assume you have 1 type of
vertex and 1 type of edge directly connecting them:
● In our Healthcare Prescriber data set, we didn't yet have direct
Prescriber-Prescriber relationships.
● We ran a pre-processing query to induce direct Referral relationships:
(Figure labels: vanilla vertex and edge types; prescriber, claims, patient; induced referral where date1 < date2.)
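The pre-processing step can be sketched in Python. The exact rule used in the demo is not spelled out on the slide, so this example assumes one reading of the figure: a directed Referral edge from prescriber A to prescriber B is induced when the same patient has a claim with A on date1 and a claim with B on date2, with date1 < date2. The function name and all sample claims are hypothetical.

```python
from collections import defaultdict
from datetime import date


def induce_referrals(claims):
    """Derive (earlier prescriber, later prescriber) pairs from shared patients."""
    by_patient = defaultdict(list)
    for patient, prescriber, claim_date in claims:
        by_patient[patient].append((claim_date, prescriber))

    referrals = set()
    for visits in by_patient.values():
        for date1, first in visits:
            for date2, second in visits:
                # Assumed rule: same patient, different prescribers, date1 < date2.
                if first != second and date1 < date2:
                    referrals.add((first, second))
    return referrals


# Hypothetical claims: (patient, prescriber, claim date).
claims = [
    ("p1", "Dr. A", date(2020, 1, 5)),
    ("p1", "Dr. B", date(2020, 2, 1)),
    ("p2", "Dr. B", date(2020, 1, 10)),
    ("p2", "Dr. C", date(2020, 1, 20)),
    ("p3", "Dr. C", date(2020, 3, 1)),
]
referral_edges = induce_referrals(claims)
```

The result is a graph with one vertex type and one edge type, which is the shape the out-of-the-box algorithms expect.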
GSQL Graph Algorithm Library
● Written in GSQL - high-level, parallelized
● Open-source, user-extensible
● Well-documented
docs.tigergraph.com/graph-algorithm-library
TigerGraph GSQL Graph Algorithm Library
✓ Call each algorithm as a GSQL query
or as a RESTful endpoint
✓ Run the algorithms in-database (don't
export the data)
✓ Option to update the graph with the
algorithm results
✓ Able to modify/customize the
algorithms. Turing-complete
language.
✓ Massively parallel processing to
handle big graphs
Summary
1 Community Detection Algorithms
Use connectedness to decide
boundaries
2 Communities are Clusters, not Partitions
Don't have to include everyone.
Can overlap?
3 Strict vs. Lenient Community Rules
Black-and-white rules are not always helpful.
Louvain uses relative density.
4 Pre- or Post- step with other algorithms
Many algorithms assume you start
from just one connected community
Q&A
Please submit your questions via the Q&A tab in Zoom
More Questions?
Join our Developer Forum
https://groups.google.com/a/opengsql.org/forum/#!forum/gsql-users
Sign up for our Developer Office Hours (every Thursday at 11 AM PST)
https://info.tigergraph.com/officehours
Additional Resources
Start Free at TigerGraph Cloud Today!
https://www.tigergraph.com/cloud/
Test Drive Online Demo
https://www.tigergraph.com/demo
Download the Developer Edition
https://www.tigergraph.com/download/
Guru Scripts
https://github.com/tigergraph/ecosys/tree/master/guru_scripts
Upcoming Graph Guru Events
Coming to Atlanta, Charlotte, London, and more.
View all events and request your own here:
https://www.tigergraph.com/graphguruscomestoyou/
Supply Chain Optimisation using Native Parallel
Graphs
February 26 at 8am PST
https://info.tigergraph.com/supply-chain-optimisation
Thank You
