By :: Jaideep Katkar
Under the Guidance of :: Dr. Tran Thanh
GraphLab Overview
A New Framework for Parallel Machine Learning
– High-level abstractions for machine learning problems
– Designed for shared-memory multiprocessors
– Assumes no fault tolerance is needed
– Concurrent-access processing models with sequential-consistency guarantees
How Does GraphLab Work?
– Represent the user's data as a directed data graph
– Arbitrary blocks of data can be attached to each vertex and each directed edge
– Shared data table
– User functions:
• Update: modifies the data in a vertex's scope (the vertex and its adjacent edges and vertices); has read-only access to the shared data table
• Fold: sequentially aggregates vertex data into a key entry in the shared data table
• Merge: combines partial Fold results so the aggregation can run in parallel
• Apply: finalizes the key entry in the shared data table
GAS Decomposition
GraphLab Toolkit
• Topic Modeling contains applications such as LDA, which can be used to cluster documents and extract topical representations.
• Graph Analytics contains applications such as PageRank and triangle counting, which can be applied to general graphs to estimate community structure.
• Clustering contains standard data clustering tools such as k-means.
• Collaborative Filtering contains a collection of applications used to make predictions about users' interests and to factorize large matrices.
• Graphical Models contains tools for making joint predictions about collections of related random variables.
• Computer Vision contains a collection of tools for reasoning about images.
Running GraphLab on EC2 Cluster
Requirements ::
• You should have an Amazon EC2 account eligible to run in the us-east-1a zone.
• You need your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the Amazon AWS console (under your account name in the top right corner -> Security Credentials -> Access Keys).
• You should have a key pair attached to the zone you are running in (in our example, us-east-1a).
• Install boto, the AWS Python client. To install it, run: ‘sudo pip install boto’.
• Download and install GraphLab as described on the next slides.
Satisfying Dependencies on Ubuntu
All the dependencies can be satisfied from the Ubuntu repository.
The command below installs gcc and the JDK, which are needed to compile GraphLab programs:
Downloading GraphLab version 2.2
You can download GraphLab directly from the GraphLab GitHub repository.
GitHub also offers a zip download of the repository if you do not have
git. The git command for cloning the repository is:
Compiling and Running GraphLab
Running the configure script in the graphlabapi directory creates two subdirectories, release/ and
debug/. cd into either of these directories and run make to build the
release or the debug version, respectively. Note that this compiles all of
GraphLab, including all toolkits.
Running Stochastic Gradient Descent (SGD) in the
Collaborative Filtering Toolkit
The collaborative filtering toolkit contains tools for computing a linear model
of the data and predicting missing values based on this linear model. This is
useful when computing recommendations for users.
http://docs.graphlab.org/collaborative_filtering.html
Running SGD on Netflix Data to Predict
User Ratings
Training Input File Format for Netflix Data
[user] [item] [rating]
1000 2 5.0
3 7 12.0
6 2 2.1
Creating Directory to load Netflix data
Command-Line Arguments to Run SGD
--gamma=XX Gradient descent step size.
--lambda=XX Gradient descent regularization.
--step_dec=XX Multiplicative step decrease. Should be between 0.1
and 1. Default is 0.9.
--D=X Feature vector width. Common values are 20 to 150.
--max_iter=XX Maximum number of iterations.
--maxval=XX Maximum allowed rating.
--minval=XX Minimum allowed rating.
--predictions=XX File name to write predictions to. Note that you will
need a user/item pair input file named something.predict to enable
predictions (see section: ratings).
--tol=XX Stop computation when the absolute error of prediction is less
than the tolerance. Default is 1e-3.
CS267_Graph_Lab
Output file
SGD (stochastic gradient descent) is a simple gradient descent algorithm. In SGD, a prediction is
computed as r_ui = p_u · q_i, where r_ui is the scalar rating of user u for
item i, p_u is the user feature vector of size D, q_i is the item
feature vector of size D, and the product is the dot (inner) product of the two vectors.
Creating a GraphLab project
• To create a GraphLab project, simply create a
subdirectory named after your project in the graphlab/apps/ folder.
• For instance,
graphlab/apps/my_first_GraphLabProject.
• In that directory, create a text file called CMakeLists.txt with the
following contents:
project(My_GraphLabProject)
add_graphlab_executable(my_first_GraphLabProject <ProgramName>.cpp)
Hello World in GraphLab
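#include <graphlab.hpp>

int main(int argc, char** argv)
{
  // Initialize the MPI environment
  graphlab::mpi_tools::init(argc, argv);
  // Distributed control object: GraphLab's communication layer
  graphlab::distributed_control dc;
  // dc.cout() wraps std::cout so that only the process with ID 0 prints
  dc.cout() << "Hello World!\n";
  graphlab::mpi_tools::finalize();
  return 0;
}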
• dc is the distributed communication layer, which is needed by a number of
the core GraphLab objects whether or not you are running distributed.
• To build the program, run the configure script, then run "make" in the
debug/ or release/ build folder. When executed, the program prints "Hello
World!".
Thank You
