Benchmarking Tool for Graph Algorithms
IIIT-H Cloud Computing - Major Project
By:
Abhinaba Sarkar 201405616
Malavika Reddy 201201193
Yash Khandelwal 201302164
Nikita Kad 201330030
Description
● In computer science and mathematics, graphs are abstract data structures that model
structural relationships among objects. They are now widely used for data modeling in
application domains where identifying relationship patterns, rules, and anomalies is useful.
● These domains include the web graph, social networks, etc. The ever-increasing size of
graph-structured data in these applications creates a critical need for scalable systems that can
process large amounts of it efficiently.
● The project aims to build a benchmarking tool that tests the performance of graph
algorithms such as BFS and PageRank on MapReduce, Giraph, GraphLab and Neo4j, and
determines which approach works better on which kinds of graphs.
Motivation
● Analyze the runtime of different types of graph algorithms on different
types of distributed systems.
● Performing computation on a graph data structure requires processing at
each node.
● Each node contains node-specific data as well as links (edges) to other
nodes, so the computation must traverse the graph, which takes a long time
on large graphs.
Approach
The BFS/SSSP algorithm is broken into 2 tasks:
● Map Task: In each Map task, we discover all the neighbors of the nodes currently in the queue
(we use the color GRAY to mark nodes in the queue) and add them to our graph.
● Reduce Task: In each Reduce task, we set the correct level of the nodes and update the graph.
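The Map and Reduce tasks above can be sketched in plain Python, with no Hadoop involved. This is a minimal illustration under assumptions, not the project's actual code: the WHITE and BLACK colors, the record layout `(neighbours, level, color)`, and the function names are all hypothetical; only the GRAY marker comes from the slides.

```python
from collections import defaultdict

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, in queue, finished (WHITE/BLACK are assumed names)
INF = float("inf")

def bfs_map(node_id, node):
    """Expand a GRAY node: emit each neighbour as GRAY, one level deeper."""
    neighbours, level, color = node
    if color == GRAY:
        for n in neighbours:
            # The neighbour's adjacency list is unknown here; the reducer restores it.
            yield n, ([], level + 1, GRAY)
        color = BLACK
    yield node_id, (neighbours, level, color)

def bfs_reduce(node_id, records):
    """Merge all records for one node: adjacency list, lowest level, darkest color."""
    neighbours, level, color = [], INF, WHITE
    for nbrs, lvl, col in records:
        if nbrs:
            neighbours = nbrs
        level = min(level, lvl)
        color = max(color, col)
    return neighbours, level, color

def bfs(graph, source):
    """Run map/reduce rounds until no GRAY node is left; return levels and round count."""
    nodes = {v: (list(nbrs), 0 if v == source else INF,
                 GRAY if v == source else WHITE)
             for v, nbrs in graph.items()}
    rounds = 0
    while any(color == GRAY for _, _, color in nodes.values()):
        shuffled = defaultdict(list)  # stands in for the shuffle phase
        for node_id, node in nodes.items():
            for key, value in bfs_map(node_id, node):
                shuffled[key].append(value)
        nodes = {k: bfs_reduce(k, vals) for k, vals in shuffled.items()}
        rounds += 1
    return {v: lvl for v, (_, lvl, _) in nodes.items()}, rounds

levels, rounds = bfs({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}, "a")
```

Each pass of the `while` loop corresponds to one full MapReduce job, which is why the number of rounds grows with the depth of the graph.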
The PageRank algorithm is also broken into 2 steps:
● Map Task: Each page emits its neighbours and its current PageRank.
● Reduce Task: For each key (page), the new PageRank is calculated from the PageRank values
emitted in the map task.
○ PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1 ... Tn are the pages that
link to A, C(P) is the cardinality (out-degree) of page P, and d is the damping (“random URL”) factor.
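As a rough sketch of the two PageRank steps and the formula above, the following Python simulates the map, shuffle and reduce phases in memory. The function names, the tagged-tuple records, the starting rank of 1.0 and the default d = 0.85 are assumptions made for illustration; pages without out-links simply lose their rank here, which a real implementation would need to handle.

```python
from collections import defaultdict

def pagerank_map(page, rank, links):
    """Emit the page's link list, plus PR(page)/C(page) to every page it links to."""
    yield page, ("links", links)
    for target in links:
        yield target, ("share", rank / len(links))

def pagerank_reduce(page, values, d=0.85):
    """Apply PR(A) = (1-d) + d * sum(PR(Ti)/C(Ti)) to the shares received by one page."""
    links, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            links = payload
        else:
            total += payload
    return (1 - d) + d * total, links

def pagerank(graph, iterations=20, d=0.85):
    """Run a fixed number of map/reduce rounds over an adjacency-list graph."""
    ranks = {page: 1.0 for page in graph}
    for _ in range(iterations):
        shuffled = defaultdict(list)  # stands in for the shuffle phase
        for page, links in graph.items():
            for key, value in pagerank_map(page, ranks[page], links):
                shuffled[key].append(value)
        ranks = {p: pagerank_reduce(p, vals, d)[0] for p, vals in shuffled.items()}
    return ranks

cycle_ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
```

On a simple cycle every page receives exactly one full share per round, so all ranks stay at 1.0; a page with no incoming links settles at 1 - d.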
Dijkstra:
● Map task: In each map task, neighbors are discovered and put into
the queue, marked with the color GRAY.
● Reduce task: In each reduce task, nodes are selected according to
their shortest distance from the current node.
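A hedged sketch of how these two tasks could look for weighted edges is shown below. It is an iterative relaxation in the MapReduce style of the slides, not the classic priority-queue Dijkstra, and it assumes positive edge weights. The record layout, the WHITE and BLACK colors and all function names are hypothetical.

```python
from collections import defaultdict

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, in queue, finished (WHITE/BLACK are assumed names)
INF = float("inf")

def sssp_map(node_id, node):
    """Expand a GRAY node: offer each neighbour a candidate distance."""
    edges, dist, color = node
    if color == GRAY:
        for nbr, weight in edges.items():
            yield nbr, (None, dist + weight, GRAY)  # None marks a candidate record
        color = BLACK
    yield node_id, (edges, dist, color)

def sssp_reduce(node_id, records):
    """Keep the shortest distance; re-queue the node if a candidate improved it."""
    edges, stored, stored_color, best = {}, INF, WHITE, INF
    for e, dist, col in records:
        if e is not None:  # the node's own record
            edges, stored, stored_color = e, dist, col
        else:
            best = min(best, dist)
    if best < stored:
        return edges, best, GRAY  # shorter path found, expand again next round
    return edges, stored, stored_color

def sssp(graph, source):
    """Run map/reduce rounds until no GRAY node is left."""
    nodes = {v: (dict(edges), 0 if v == source else INF,
                 GRAY if v == source else WHITE)
             for v, edges in graph.items()}
    rounds = 0
    while any(color == GRAY for _, _, color in nodes.values()):
        shuffled = defaultdict(list)
        for node_id, node in nodes.items():
            for key, value in sssp_map(node_id, node):
                shuffled[key].append(value)
        nodes = {k: sssp_reduce(k, vals) for k, vals in shuffled.items()}
        rounds += 1
    return {v: d for v, (_, d, _) in nodes.items()}, rounds

dists, sssp_rounds = sssp(
    {"s": {"a": 4, "b": 1}, "b": {"a": 2}, "a": {"t": 1}, "t": {}}, "s")
```

Because a node can be re-queued whenever a shorter path reaches it, this variant may need more rounds than BFS on the same graph.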
Approach contd.
Giraph and Hadoop
All the computations are done on a cluster of 2 nodes
GraphLab
All the computations are performed on a single machine
Applications
In today’s world, dynamic social graphs (such as
LinkedIn, Twitter and Facebook) are not feasible to
process on a single node. Therefore we need to
benchmark the runtime of different graph
algorithms on distributed systems.
Example graph: LinkedIn’s social graph
Complexity
● BFS: The complexity of the standard BFS algorithm is O(V+E), but because of
the read/write overhead in distributed computing, the order reaches
O(E*Depth).
● The same holds for Dijkstra’s algorithm, but the number of iterations will be
higher than for BFS.
● PageRank: The complexity of PageRank in a distributed system is
(No. of Nodes + No. of Relations) * Iterations
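To make these orders concrete, here is a small arithmetic sketch. The function names and the two example graphs are assumptions for illustration; the counts ignore constant factors and are not measurements.

```python
def sequential_bfs_cost(num_nodes, num_edges):
    """Work done by a standard in-memory BFS: V + E."""
    return num_nodes + num_edges

def iterative_bfs_cost(num_edges, depth):
    """Work done when every round rescans all edges: E * Depth."""
    return num_edges * depth

def pagerank_cost(num_nodes, num_edges, iterations):
    """(No. of Nodes + No. of Relations) * Iterations."""
    return (num_nodes + num_edges) * iterations

# A chain of 1000 nodes has 999 edges and depth 999.
chain = (sequential_bfs_cost(1000, 999), iterative_bfs_cost(999, 999))
# A star of 1000 nodes has 999 edges and depth 1.
star = (sequential_bfs_cost(1000, 999), iterative_bfs_cost(999, 1))
```

For the chain the iterative estimate is 998001 against 1999 for the standard algorithm, while for the shallow star it is only 999, which illustrates how strongly the depth of the graph drives the distributed cost.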
Benchmarking - Giraph
BFS
Nodes        Time
1000         4 min 7.836 sec
1 million    10 min 11.443 sec
Dijkstra
Nodes        Time
1000         3 min 5.655 sec
1 million    11 min 0.05 sec
Pagerank
Nodes        Time
1000         5 min 12.111 sec
1 million    16 min 8.652 sec
Benchmarking - GraphLab
Page-Rank
Nodes        Time
1000         6.029 sec
10,000       20.154 sec
1 million    1 min 11.124 sec
Dijkstra
Nodes        Time
1000         4.852 sec
10,000       13.029 sec
1 million    1 min 10.576 sec
Benchmarking - Hadoop
BFS
Nodes        Time
1000         4 min 7.836 sec
1 million    10 min 11.443 sec
Dijkstra
Nodes        Time
1000         3 min 5.655 sec
1 million    11 min 0.05 sec
Pagerank
Nodes        Time
1000         5 min 12.111 sec
1 million    16 min 8.652 sec
BFS and Dijkstra’s runtime depend on the depth of the input graph.
Problems we faced
● Poor locality of memory access.
● Very little work per vertex.
● Changing degree of parallelism.
● Running over many machines makes the problem worse.
Conclusion and Future Work
● Although GraphLab is fast, it is constrained by memory: it requires enough memory to
hold the edges and associated values of any single vertex in the graph.
● The experimental results show that the time taken by the PageRank algorithm is directly
proportional to the number of relations in the graph when the number of nodes and iterations
is constant. This explains the huge difference in time.
● The runtime of BFS is directly proportional to the depth of the graph: the greater the depth,
the more iterations are needed and hence the more time is taken.
Future Work:
Reading the input graph from a file adds a large overhead, because files are read and written in
each iteration. If the graph and its properties are stored in a database instead, this read/write
overhead should largely disappear and query time should drop. So, we plan to add database
support to the tool.
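As a hedged illustration of this idea, the sketch below keeps the edge list in a database and looks up neighbours with an indexed query instead of re-reading a file. SQLite (via Python's standard `sqlite3` module) is only a stand-in chosen for a self-contained example; the slides do not say which database is planned, and the table layout and function names are hypothetical.

```python
import sqlite3

def build_graph_db(edges):
    """Store (src, dst) pairs in an in-memory SQLite table indexed by source node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
    conn.execute("CREATE INDEX idx_src ON edges (src)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    conn.commit()
    return conn

def neighbours(conn, node):
    """Return the out-neighbours of a node using the index on src."""
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? ORDER BY dst", (node,))
    return [dst for (dst,) in rows]

conn = build_graph_db([("a", "b"), ("a", "c"), ("b", "c")])
a_neighbours = neighbours(conn, "a")
```

With the graph held this way, each iteration can query only the nodes it needs rather than rewriting the whole graph file.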

