Measuring scholarly impact: Methods and practice
Link prediction with the linkpred tool
Raf Guns
University of Antwerp
raf.guns@uantwerpen.be
If you want to follow along…
Download and install Anaconda Python from http://continuum.io/downloads
Download the example data from http://bit.ly/1HpZvIa
“A pair of scientists who have five mutual previous
collaborators, for instance, are about twice as likely to
collaborate as a pair with only two, and about 200 times as
likely as a pair with none.” (Newman, 2001; emphasis mine)
Agenda
What is link prediction? (and why?)
Example data
The linkpred tool
Link prediction in practice
Conclusion
What is link prediction?
Networks
Networks in informetrics
• Citation
  • Papers
  • Journals
  • Authors
  • Patents
  • …
• Collaboration
  • Authors
  • Institutions
  • Countries
  • …
• Co-citation
• Bibliographic coupling
• Web links
• And so on
Definitions
A network G = (V, E) consists of:
• A set of nodes or vertices V
• A set of links or edges E
Each link connects two nodes from V
Neighbourhood N(v) of node v: all nodes connected to v
Node degree |N(v)| of v: number of connected nodes =
number of items in set N(v)
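The definitions above can be sketched in a few lines of Python. This is only an illustration, not part of linkpred: the helper names `build_network` and `degree` and the toy links are made up for the example.

```python
# Minimal sketch (hypothetical helpers, not linkpred code):
# an undirected network stored as a dict of neighbour sets.
def build_network(links):
    """Return {node: set of neighbours} for a list of undirected links."""
    neighbours = {}
    for u, v in links:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return neighbours


def degree(neighbours, v):
    """Node degree |N(v)|: the number of items in the neighbourhood N(v)."""
    return len(neighbours[v])


# Toy collaboration network (invented data).
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
N = build_network(links)
print(sorted(N["c"]))  # neighbourhood N(c)
print(degree(N, "c"))  # node degree |N(c)|
```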
Change in networks
Most networks are not static, e.g. in a collaboration network:
• New authors appear
• Old authors disappear
• New collaborations are initiated
• Previous collaborators stop collaborating
Change in networks
Some changes are more plausible than others
Change in networks
Different mechanisms have been identified
• Assortativity: similar nodes are more likely to connect
• Preferential attachment: well-connected nodes attract more new connections
  • Cf. cumulative advantage, Matthew effect
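One common way to turn preferential attachment into a score is the product of the two node degrees, which appears to be what the DegreeProduct predictor listed later refers to. The sketch below is a plain-Python illustration under that assumption. The function name and the toy network are hypothetical.

```python
from itertools import combinations


def degree_product_scores(neighbours):
    """Score every unlinked pair by the product of the two node degrees.

    Assumption: this mirrors the usual preferential-attachment score;
    it is not taken from the linkpred source code.
    """
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v not in neighbours[u]:  # only pairs that are not yet linked
            scores[(u, v)] = len(neighbours[u]) * len(neighbours[v])
    return scores


# Toy network as a dict of neighbour sets (invented data).
neighbours = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
scores = degree_product_scores(neighbours)
print(scores)
```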
The link prediction question
Liben-Nowell and Kleinberg (2003, 2007):
“Given a snapshot of a social network, can we infer which
new interactions among its members are likely to occur
in the near future?”
Link prediction steps
1. Data gathering
2. Preprocessing
3. Prediction
4. Evaluation
Steps
Why link prediction?
• You want to know which links will appear in the future
• Recommendation
• Finding missing links
• Finding ‘anomalous’ links (correct or incorrect)
• Evaluating network formation and evolution models
Our example data
Data
Guns and Rousseau (2013)
• Collaboration between cities in Africa and South Asia
• Topic: malaria
• In three consecutive time periods
Available as three Pajek network files: http://bit.ly/1HpZvIa
1997-2001
2002-2006
2007-2011
The linkpred tool
About
https://github.com/rafguns/linkpred
Cross-platform (written in Python)
Open source: BSD license
Command-line tool!
Alternative: LPmade (https://github.com/rlichtenwalter/LPmade)
How and where to get linkpred
1. Install Anaconda Python: http://continuum.io/downloads
2. Open command-line window
3. Run command:
> pip install https://github.com/rafguns/linkpred/archive/stable.zip
4. Wait until installation is finished
Basic usage
> linkpred
Should display brief usage instructions
> linkpred --help
Displays more complete help output
Basic usage
> linkpred training-network-file --predictors predictor --output output-type
Read the network in training-network-file, predict using predictor and give output of output-type
> linkpred training-network-file test-network-file --predictors predictor --output output-type
Read the network in training-network-file, compare with test-network-file, predict using predictor and give output of output-type
Link prediction in practice
Preprocessing
Nodes may also appear and disappear
• Restrict to intersection of node sets of training and test network
  • Only where test network is available
Restrict by degree (default: only discard isolated nodes)
Directed networks: not supported
• Convert to undirected first
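The preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions rather than linkpred's actual implementation: the functions `restrict` and `preprocess`, the `min_degree` parameter, and the order of the two steps are all hypothetical.

```python
def restrict(adj, keep):
    """Return the sub-network induced by the nodes in `keep`."""
    return {u: adj[u] & keep for u in adj if u in keep}


def preprocess(training, test, min_degree=1):
    """Sketch of the two preprocessing steps (assumed order, single pass).

    1. Restrict both networks to the intersection of their node sets.
    2. Discard training nodes whose degree is below `min_degree`
       (with the default of 1 this only removes isolated nodes).
    """
    common = set(training) & set(test)
    training = restrict(training, common)
    test = restrict(test, common)
    keep = {u for u, nbrs in training.items() if len(nbrs) >= min_degree}
    return restrict(training, keep), test


# Toy networks as dicts of neighbour sets (invented data).
training = {"a": {"b"}, "b": {"a"}, "c": set(), "x": {"y"}, "y": {"x"}}
test = {"a": {"c"}, "b": set(), "c": {"a"}}
train_out, test_out = preprocess(training, test)
print(train_out)
print(test_out)
```

Nodes x and y disappear because they are absent from the test network; node c disappears from the training network because it is isolated there.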
Prediction: choosing predictors
Local
• AdamicAdar
• AssociationStrength
• CommonNeighbours
• Cosine
• DegreeProduct
• Jaccard
• MaxOverlap
• MinOverlap
• NMeasure
• Pearson
• ResourceAllocation
Global
• GraphDistance
• Katz
• RootedPageRank
• SimRank
Other
• Community
• Copy
• Random
Local predictors
Tendency towards triadic closure
Number of common neighbours is a simple but powerful predictor.
Local predictors
• Common neighbours
• Normalizations of common neighbours
  • Jaccard coefficient, cosine measure…
• Adamic/Adar (Adamic & Adar, 2003)
$$W(u, v) = \sum_{z \in N(u) \cap N(v)} \frac{1}{\log |N(z)|}$$
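To make the local predictors concrete, here is a small sketch of common neighbours, the Jaccard coefficient and Adamic/Adar on a network stored as a dict of neighbour sets. The function names and toy data are illustrative only and are not taken from linkpred.

```python
import math


def common_neighbours(N, u, v):
    """Number of nodes linked to both u and v."""
    return len(N[u] & N[v])


def jaccard(N, u, v):
    """Common neighbours normalized by the size of the joint neighbourhood."""
    union = N[u] | N[v]
    return len(N[u] & N[v]) / len(union) if union else 0.0


def adamic_adar(N, u, v):
    """Sum of 1 / log |N(z)| over the common neighbours z of u and v.

    A common neighbour of two distinct nodes has degree >= 2,
    so the logarithm is always positive.
    """
    return sum(1 / math.log(len(N[z])) for z in N[u] & N[v])


# Toy network (invented data).
N = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(common_neighbours(N, "a", "d"))
print(jaccard(N, "a", "d"))
print(adamic_adar(N, "a", "d"))
```

Adamic/Adar gives less credit to a common neighbour that is connected to many nodes, since sharing such a hub says little about the pair.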
Weighted networks
In weighted networks, links have weights (e.g. number of joint papers, number of citations…)
Link weights are often ignored!
Most predictors in linkpred can use link weights
• General idea: higher link weight (e.g., more joint papers), stronger connection
Global predictors
Graph distance: lowest number of links needed to travel from a to b
• Problem: small world phenomenon
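Graph distance in an unweighted network can be computed with a breadth-first search. The sketch below is a generic illustration (the function name and toy data are hypothetical), not the routine linkpred uses.

```python
from collections import deque


def graph_distance(N, source, target):
    """Lowest number of links from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in N[node]:
            if nb == target:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


# Toy network: a path a-b-c-d plus an isolated node e (invented data).
N = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set()}
print(graph_distance(N, "a", "d"))
print(graph_distance(N, "a", "e"))
```

In a small-world network most pairs are only a few links apart, so many candidate pairs end up with the same distance and the predictor produces many ties.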
Global predictors
Katz (1953):
$$W(v_i, v_j) = \sum_{k=1}^{\infty} \beta^k a_{ij}^{(k)}$$
• $a_{ij}$: 1 if i and j are linked, 0 otherwise
• $a_{ij}^{(k)}$: number of walks with length k from i to j
• $\beta$: parameter, “probability of effectiveness of a single link”
• Longer walks: lower effectiveness
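The Katz sum can be approximated by truncating it at a maximum walk length. The sketch below counts walks step by step from one source node. The function name and the `max_length` cut-off are assumptions made for the illustration; linkpred may compute the measure differently.

```python
def katz_scores(N, source, beta=0.1, max_length=5):
    """Truncated Katz score from `source` to every node.

    Assumption: the infinite sum is cut off at `max_length`. The full sum
    only converges when beta is smaller than 1 / (largest eigenvalue of
    the adjacency matrix).
    """
    scores = {v: 0.0 for v in N}
    walks = {v: 0 for v in N}
    walks[source] = 1  # exactly one walk of length 0: staying at the source
    for k in range(1, max_length + 1):
        nxt = {v: 0 for v in N}
        for u, count in walks.items():
            if count:
                for v in N[u]:
                    nxt[v] += count  # extend every walk ending in u by one link
        walks = nxt
        for v, count in walks.items():
            scores[v] += beta ** k * count
    return scores


# Toy network: a path a-b-c (invented data).
N = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = katz_scores(N, "a", beta=0.5, max_length=2)
print(scores)
```

With beta = 0.5 and walks of length at most 2, node b is reached by one walk of length 1 (score 0.5) and node c by one walk of length 2 (score 0.25).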
Global predictors
Rooted PageRank
Global predictors
Rooted PageRank
Global predictors
SimRank (Jeh & Widom, 2002)
“Objects that link to similar objects are similar themselves.”
$$W(u, v) = \frac{c}{|N(u)| \cdot |N(v)|} \sum_{p \in N(u)} \sum_{q \in N(v)} W(p, q)$$
Starting point: a node is maximally similar to itself:
W(v, v) = 1
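Because the SimRank definition is recursive, it is usually computed by iterating from the starting point W(v, v) = 1. The sketch below is a naive illustration of that iteration. The function name and the fixed number of iterations are assumptions, and this is not claimed to be how linkpred implements it.

```python
def simrank(N, c=0.8, iterations=10):
    """Naive iterative SimRank on a dict of neighbour sets (sketch)."""
    nodes = list(N)
    # Starting point: a node is maximally similar to itself, 0 elsewhere.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[u][v] = 1.0
                elif N[u] and N[v]:
                    total = sum(sim[p][q] for p in N[u] for q in N[v])
                    new[u][v] = c * total / (len(N[u]) * len(N[v]))
                else:
                    new[u][v] = 0.0  # nodes without neighbours are not similar
        sim = new
    return sim


# Toy network: hub h linked to leaves x and y (invented data).
N = {"h": {"x", "y"}, "x": {"h"}, "y": {"h"}}
sim = simrank(N, c=0.8)
print(sim["x"]["y"])
print(sim["h"]["x"])
```

The two leaves only link to the same hub, so their similarity equals c; the hub and a leaf never become similar in this toy network.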
Demo
Predict
Save predictions to file → import in e.g. Excel
Evaluation
Step 4: ‘How well does it work?’
How? → compare to ‘known good’ test network
Four groups:
                Link             Non-link
Predicted       True positive    False positive
Not predicted   False negative   True negative
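The four groups can be counted directly from a set of predicted pairs and the links of the test network. The sketch below is illustrative: the function name is hypothetical, and it assumes every pair of nodes is a candidate, whereas in practice pairs already linked in the training network would normally be excluded.

```python
from itertools import combinations


def confusion_counts(nodes, predicted, test_links):
    """Return (TP, FP, FN, TN) for undirected node pairs.

    Assumption: all node pairs count as candidates.
    """
    predicted = {frozenset(p) for p in predicted}
    test_links = {frozenset(l) for l in test_links}
    candidates = {frozenset(p) for p in combinations(nodes, 2)}
    tp = len(predicted & test_links)   # predicted and present in test network
    fp = len(predicted - test_links)   # predicted but absent
    fn = len(test_links - predicted)   # present but not predicted
    tn = len(candidates - predicted - test_links)
    return tp, fp, fn, tn


# Toy example (invented data).
nodes = ["a", "b", "c", "d"]
predicted = [("a", "b"), ("a", "c")]
test_links = [("b", "a"), ("c", "d")]
counts = confusion_counts(nodes, predicted, test_links)
print(counts)
```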
Evaluation
Simply save results to text file:
--output cache-evaluations
Create chart:
• Recall-precision
• ROC
Evaluation: recall-precision
Precision: fraction of predictions that are correct, TP / (TP + FP)
Recall: fraction of test-network links that are correctly predicted, TP / (TP + FN)
Evaluation: ROC
False positive rate: fraction of non-links that are incorrectly predicted as links, FP / (FP + TN)
True positive rate: fraction of links that are correctly predicted, TP / (TP + FN) (= recall)
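Both charts can be derived from a ranked list of predictions by computing the measures after each additional prediction. The sketch below shows the idea. The function name and the `n_candidates` argument (the total number of node pairs considered) are assumptions for the illustration.

```python
def evaluate_ranking(ranked, test_links, n_candidates):
    """Return (precision, recall, false positive rate) after each prediction.

    `ranked` lists predicted pairs from most to least likely.
    Assumes at least one test link and at least one non-link.
    """
    test_links = {frozenset(l) for l in test_links}
    n_negatives = n_candidates - len(test_links)
    points = []
    tp = fp = 0
    for pair in ranked:
        if frozenset(pair) in test_links:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / len(test_links)  # = true positive rate
        fpr = fp / n_negatives
        points.append((precision, recall, fpr))
    return points


# Toy example: 4 nodes give 6 candidate pairs (invented data).
ranked = [("a", "b"), ("a", "c"), ("c", "d")]
test_links = [("a", "b"), ("c", "d")]
points = evaluate_ranking(ranked, test_links, n_candidates=6)
for point in points:
    print(point)
```

Plotting precision against recall gives the recall-precision chart; plotting recall against the false positive rate gives the ROC chart.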
Profiles
A simple way to save and reuse the configuration of a
complex prediction run (options, predictors, parameters…)
Usage example:
> linkpred network-file --profile profile.yml
Format: YAML, see https://en.wikipedia.org/wiki/YAML
Example profile
predictors:
- name: AdamicAdar
  displayname: Adamic/Adar
- name: GraphDistance
  displayname: Graph distance
  parameters:
    weight: weight
- name: SimRank
  displayname: SimRank (c=0.4)
  parameters:
    c: 0.4
- name: SimRank
  displayname: SimRank (c=0.8)
  parameters:
    c: 0.8
output:
- cache-predictions
- recall-precision
Conclusion
About link prediction
Link prediction is possible because link formation is not a
purely random process
Limitations:
• Unaware of social and other circumstantial factors
• Which predictor is ‘best’ for a concrete situation?
• Trade-off between prediction accuracy and non-triviality
About linkpred
Relatively simple but powerful
Limitations:
• Not suitable for very large and/or dense networks
• Does not incorporate more complex setups like predictor combinations, machine learning etc.
All results can be exported for analysis in other software (cache-*)
Open source: contributions welcome!

Editor's Notes

  • #35: People who know something about web technology may know PageRank as an important algorithm behind the Google search engine. We first choose a node as the starting point (= root). From there we randomly choose one of the links that leave it. From the new position we again follow one of the links, and so on. With a certain probability we return to our starting point. If we do this long enough, we can determine the probability that we are at a given node at a given moment.
  • #36: Each node is scaled according to that probability. We see that (according to rooted PageRank) the probability of link formation between the root and other nodes decreases as we move further away from the root.
  • #42: False positive rate = fallout = fp / (fp + tn)