Graph Theory at work 
doug.needham@ilwllc.com
• @dougneedham 
• Data Guy - Started as a DBA in the Marine Corps, 
evolved to Architect, now aspiring Data Scientist. 
• Oracle, SQL Server, Cassandra, Hadoop, MySQL. 
• I have a strong relational/traditional background. 
• Perpetual Student 
• Learning new things challenges our assumptions. It forces
us to take a new perspective on “old” problems, and
eventually may even show us that there is a better way to
solve a problem.
• Stand back, we are going to talk about math! 
• Basically we are talking about a bunch of dots joined together by 
lines 
• Vertex – Dot on a graph 
• Edge – Line connecting two vertices
• Triangle – 3 vertices, 3 edges
• Square – 4 vertices, 4 edges
• Open Triangle – 3 vertices, 2 edges
• A lot of things are networks if you look at them the right way. 
• Mark Newman has done a number of really cool presentations
about network analysis, available on YouTube.
• https://www.youtube.com/watch?v=lETt7IcDWLI
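The vocabulary above (vertex, edge, triangle, open triangle) can be made concrete with a few lines of Python. This is only a minimal sketch: the vertex names and the four example edges are made up for illustration, and the graph is assumed to be undirected.

```python
from itertools import combinations

# Undirected graph as a set of edges; vertex names are invented for illustration.
edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}

def build_adjacency(edges):
    """Map each vertex to the set of vertices it shares an edge with."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def count_triangles(adj):
    """Count groups of 3 vertices joined by all 3 possible edges."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

def count_open_triangles(adj):
    """Count 3 vertices joined by exactly 2 edges: a center with two unlinked neighbors."""
    return sum(
        1
        for center in adj
        for x, y in combinations(sorted(adj[center]), 2)
        if y not in adj[x]
    )

adj = build_adjacency(edges)
print(count_triangles(adj))       # A-B-C is the only triangle
print(count_open_triangles(adj))  # A-C-D and B-C-D are open
```

Storing each vertex's neighbors in a set keeps the "is there an edge between x and y?" check cheap, which is what both counts rely on.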
Gephi, Graphx, and Giraph
• The 7 Bridges of Königsberg
• Every tome on graph theory or network
analysis devotes a small portion of its pages
to the 7 Bridges of Königsberg.
• If I don’t cover this with you, the gods of 
mathematics will strike me down, and never 
allow me to do analysis again in the future.
• Folks enjoyed their Sunday afternoon strolls across the
bridges, and people would wonder whether some route
crossed every bridge exactly once.
• Eventually Leonhard Euler was brought into the debate.
• Euler used vertices to represent the land masses and edges
(or arcs, at the time) to represent bridges. He realized that
because every vertex had an odd number of edges, no such
route exists.
• And here is the cool thing about mathematicians: if we tell
you something is impossible, we have to tell you why in a
way you can understand. In doing so, he also founded the
branch of mathematics we now call Graph Theory.
• http://en.wikipedia.org/wiki/Leonhard_Euler
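Euler's degree argument can be checked directly. The sketch below assumes the commonly cited layout of the seven bridges; the labels A to D for the four land masses are my own, with A standing for the central island. It relies on the standard result that a connected graph has a walk using every edge exactly once only if zero or two vertices have odd degree.

```python
from collections import Counter

# Seven bridges between four land masses (labels A-D are assumed for illustration).
bridges = [
    ("A", "B"), ("A", "B"),  # two bridges from the island to one bank
    ("A", "C"), ("A", "C"),  # two bridges from the island to the other bank
    ("A", "D"),              # island to the second island
    ("B", "D"), ("C", "D"),  # each bank to the second island
]

def degrees(edge_list):
    """Count how many bridge ends touch each land mass."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return degree

def has_euler_walk(edge_list):
    """True if 0 or 2 vertices have odd degree (graph assumed connected)."""
    odd = [v for v, d in degrees(edge_list).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(bridges)))   # every land mass has an odd degree
print(has_euler_walk(bridges))  # so no walk crosses each bridge exactly once
```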
• http://gephi.github.io/ 
• From the website: “Gephi is an interactive visualization and 
exploration platform for all kinds of networks and complex 
systems, dynamic and hierarchical graphs.” 
• To get this yourself, go into Facebook and search for
Netvizz. (You have to authorize it. You can de-authorize it
later.)
• Click the application. 
• Click “personal network” 
• Click Start 
• Download your gdf file 
• Quick Demo:
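For readers who want to look inside a GDF file without Gephi, here is a minimal parsing sketch. It assumes the usual GDF layout of a "nodedef>" header row followed by node rows, then an "edgedef>" header row followed by edge rows. The sample data is invented, and a real Netvizz export may carry more columns than shown here.

```python
import csv
import io

def parse_gdf(text):
    """Split GDF text into node rows and edge rows, each a dict keyed by column name."""
    nodes, edges = [], []
    columns, target = [], None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        first = row[0]
        if first.startswith("nodedef>") or first.startswith("edgedef>"):
            # A header row switches the section we are filling.
            target = nodes if first.startswith("nodedef>") else edges
            row[0] = first.split(">", 1)[1]
            # Header cells look like "name VARCHAR"; keep only the column name.
            columns = [cell.strip().split(" ")[0] for cell in row]
        elif target is not None:
            target.append(dict(zip(columns, row)))
    return nodes, edges

# Invented sample in the assumed layout.
sample = """nodedef>name VARCHAR,label VARCHAR
1,Alice
2,Bob
3,Carol
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
2,3
"""

nodes, edges = parse_gdf(sample)
print(len(nodes), "vertices,", len(edges), "edges")
```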
• Shortest path – How are two vertices connected? 
• What is a path? 
• Centrality 
• Transitivity 
• Homophily 
• Directed Graphs – or Digraphs 
• Contagion – How do things “spread” through a network? 
• Let’s rearrange things: how does the layout affect
understanding?
• This is not just data visualization; it can also be used for
prediction. https://www.youtube.com/watch?v=rwA-y-XwjuU
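Three of the ideas listed above (shortest path, centrality, transitivity) are easy to sketch by hand on a small network. The friendship data below is invented, the graph is treated as undirected, and the functions are plain-Python illustrations under those assumptions rather than what Gephi runs internally. Centrality here means degree centrality, the simplest of several variants.

```python
from collections import deque
from itertools import combinations

# Invented friendship network: each person maps to their set of friends.
friends = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat", "Dan"},
    "Cat": {"Ann", "Bob"},
    "Dan": {"Bob", "Eve"},
    "Eve": {"Dan"},
}

def shortest_path(adj, start, goal):
    """Breadth-first search: return one shortest list of vertices, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(adj[current]):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None

def degree_centrality(adj):
    """Fraction of the other vertices each vertex is directly connected to."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}

def transitivity(adj):
    """Share of two-step paths (x - center - y) where x and y are also linked."""
    closed = total = 0
    for center, neighbors in adj.items():
        for x, y in combinations(sorted(neighbors), 2):
            total += 1
            if y in adj[x]:
                closed += 1
    return closed / total if total else 0.0

print(shortest_path(friends, "Ann", "Eve"))
print(degree_centrality(friends)["Bob"])
print(transitivity(friends))
```

In this toy network Bob has the highest degree centrality, and half of all "friend of a friend" pairs are themselves friends.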
• GraphX requires Spark, which is not a bad deal.
• Jump to Demo 
• http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html
• I haven’t done as much with Giraph as I
wanted to. Perhaps a later presentation will
include a more detailed example comparing
GraphX with Giraph.
• I started doing some analysis some time ago 
using Graph models to understand metadata. 
• I came up with two types of Graphs: 
• Data Structure Graph (DSG) Level 1 – This is roughly
like an Entity Relationship Diagram (ERD): tables are
vertices, foreign keys are edges.
• Data Structure Graph (DSG) Level 2 – Each vertex in
this graph is an application. Each edge is a data transfer.
Roughly equivalent to what we used to call Data
Flow diagrams.
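A DSG Level 1 can be sketched with nothing more than a list of foreign keys. The schema below is hypothetical, and ranking tables by how many foreign keys touch them is just one simple way to read the graph, not a method prescribed by these slides.

```python
from collections import Counter

# Hypothetical schema: each pair is (referencing table, referenced table).
foreign_keys = [
    ("orders", "customers"),
    ("order_items", "orders"),
    ("order_items", "products"),
    ("payments", "orders"),
    ("products", "suppliers"),
]

def table_degrees(fks):
    """Count the foreign-key edges touching each table (vertex)."""
    degree = Counter()
    for child, parent in fks:
        degree[child] += 1
        degree[parent] += 1
    return degree

# Tables sorted from most connected to least connected.
for table, degree in table_degrees(foreign_keys).most_common():
    print(f"{table}: {degree}")
```

In a real database the foreign-key list would typically come from the catalog or information schema rather than being typed by hand.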
• A DSG Level 1 can show you which tables are
likely to have the most interesting query
performance.
• A DSG Level 2 can show you where the most
work is going on in your enterprise.
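A DSG Level 2 is a directed graph, so incoming and outgoing transfers can be counted separately for each application. The application names and transfers below are hypothetical, and using the raw number of transfers as a stand-in for "work" is an assumption made purely for illustration.

```python
from collections import defaultdict

# Hypothetical data transfers: (source application, target application).
transfers = [
    ("crm", "warehouse"),
    ("billing", "warehouse"),
    ("web_shop", "billing"),
    ("web_shop", "crm"),
    ("warehouse", "reporting"),
]

def transfer_counts(edges):
    """Count incoming and outgoing transfers for each application."""
    counts = defaultdict(lambda: {"in": 0, "out": 0})
    for source, target in edges:
        counts[source]["out"] += 1
        counts[target]["in"] += 1
    return dict(counts)

counts = transfer_counts(transfers)
# The application touched by the most transfers, in either direction.
busiest = max(counts, key=lambda app: counts[app]["in"] + counts[app]["out"])
print(busiest, counts[busiest])
```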
• Network/Graph Analysis is cool. 
• It can show you some interesting things about your data. 
• Some things to consider:
• Some thought needs to be put into how the raw data is
organized for a graph analysis.
• Directed graph, undirected, bigraph? Some up-front setup
work needs to be done.
• Tools help with the detailed calculations, and show the
paths, walks, etc.
• However, due thought still needs to be put into a network
analysis project.
• http://blog.revolutionanalytics.com/2012/05/facebook-class-social-network-analysis-with-r-and-hadoop.html
