Burcu Kolbay
Pedro Delicado
Arnau Prat Pérez
SYNTHETIC DATA GENERATION USING EXPONENTIAL RANDOM GRAPH MODELING
Contents
•  The need for synthetic data
•  Exponential Random Graph Modeling (in theory)
•  Going through the example
•  Network simulation
The need for synthetic data
•  Internet & Social Media
•  Data Privacy Issues
•  The need for a testing process
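Exponential random graph modelling (p*)
•  Log-linear models of the form:
$$\Pr(X = x) = \frac{\exp\{\theta' z(x)\}}{K(\theta)}$$
•  The problem is the normalizing constant $K(\theta)$, which sums over every possible network.
•  Solution: log-linear → logit.
•  Consider the conditional log-odds for a network $x$ and a pair $(i,j)$ of nodes:
   -  $X_{ij}^{c}$: status of all pairs in $x$ other than $(i,j)$
   -  $X_{ij}^{+}$: same network as $x$ but with $x_{ij} = 1$
   -  $X_{ij}^{-}$: same network as $x$ but with $x_{ij} = 0$
$$\frac{P(X_{ij}=1 \mid X_{ij}^{c})}{P(X_{ij}=0 \mid X_{ij}^{c})} = \frac{\exp\{\theta' s(X_{ij}^{+})\}}{\exp\{\theta' s(X_{ij}^{-})\}} = \exp\left(\theta' \left[s(X_{ij}^{+}) - s(X_{ij}^{-})\right]\right)$$
$$\log\left(\frac{P(X_{ij}=1 \mid X_{ij}^{c})}{P(X_{ij}=0 \mid X_{ij}^{c})}\right) = \theta' \left[s(X_{ij}^{+}) - s(X_{ij}^{-})\right]$$
•  Here $s(\cdot)$ is the statistics vector written $z(\cdot)$ in the model above; $K(\theta)$ appears in both numerator and denominator, so it cancels in the ratio.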
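The cancellation of the normalizing constant can be checked numerically on a network small enough to enumerate. The sketch below is an illustration only: the 4-node network, the choice of statistics (edge and triangle counts), and the values in `theta` are assumptions made for the example and do not come from the slides.

```python
import itertools
import math

NODES = range(4)
DYADS = list(itertools.combinations(NODES, 2))  # 6 possible undirected ties


def stats(edges):
    """Statistic vector s(x) = (number of edges, number of triangles)."""
    triangles = sum(
        1
        for a, b, c in itertools.combinations(NODES, 3)
        if {(a, b), (a, c), (b, c)} <= edges
    )
    return (len(edges), triangles)


def weight(edges, theta):
    """Unnormalized probability exp{theta' s(x)}."""
    return math.exp(sum(t * s for t, s in zip(theta, stats(edges))))


def normalizing_constant(theta):
    """K(theta): sum of the weights of all 2^6 networks on 4 nodes."""
    total = 0.0
    for r in range(len(DYADS) + 1):
        for subset in itertools.combinations(DYADS, r):
            total += weight(frozenset(subset), theta)
    return total


def conditional_log_odds(edges, dyad, theta):
    """theta' [s(x+) - s(x-)] for one dyad, all other dyads held fixed."""
    x_plus = frozenset(edges | {dyad})
    x_minus = frozenset(edges - {dyad})
    delta = [p - m for p, m in zip(stats(x_plus), stats(x_minus))]
    return sum(t * d for t, d in zip(theta, delta))


theta = (-1.0, 0.5)                      # illustrative values only
x = frozenset({(0, 1), (1, 2), (0, 3)})  # an arbitrary small network
K = normalizing_constant(theta)

# The long way: full probabilities of x+ and x-, each divided by K.
p_plus = weight(x | {(0, 2)}, theta) / K
p_minus = weight(x - {(0, 2)}, theta) / K
long_way = math.log(p_plus / p_minus)

# The short way: only the change in statistics, no K needed.
short_way = conditional_log_odds(x, (0, 2), theta)
```

Both routes give the same log-odds, but only the first needs the sum over all networks. That sum has 64 terms here and grows as 2 to the power of the number of dyads, which is why the logit form is the practical one.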
Going through the example
•  «Tcnetworks» data: inter-organizational relationships among 25 agencies within the Indiana State Tobacco Control Program (2010).
•  3 types of inter-organizational ties:
   -  Frequency of contact
   -  Level of collaboration
   -  Whether each pair of agencies communicated with one another
•  The network data include:
   -  a number of node characteristics (e.g., Tob_yrs, which records how long an agency has been working in tobacco control),
   -  edge characteristics,
   -  a sociomatrix (TCdist) which contains the geographic distance between each pair of agencies.
•  Our vertex attributes are:
   -  Agency_cat
   -  Agency_lvl
   -  Lead_agency
   -  Tob_yrs
Going through the example
•  The network includes 3 types of organizations (local, state, and national), is made up of 1 connected component that is fairly densely connected, and shows some variability in centrality across its members.
Going through the example
•  Start with a base model:
•  Then we include node attributes:
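The model output from this slide is not reproduced in the transcript. As a rough illustration, assume the base model contains only an edge-count term. In that case the maximum-likelihood coefficient is the log-odds of the network density. The toy network and function names below are invented for the example and are not the Tcnetworks data.

```python
import math


def density(n_nodes, edges):
    """Share of possible undirected ties that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible


def edges_only_coefficient(n_nodes, edges):
    """Maximum-likelihood edges coefficient: the log-odds of a tie."""
    d = density(n_nodes, edges)
    return math.log(d / (1 - d))


# Toy undirected network on 5 nodes (invented data).
toy_edges = {(0, 1), (0, 2), (1, 2), (3, 4)}
theta_edges = edges_only_coefficient(5, toy_edges)

# Applying the inverse logit to the coefficient recovers the density.
implied_density = 1 / (1 + math.exp(-theta_edges))
```

A negative coefficient therefore means that fewer than half of the possible ties are present. Node-attribute terms then shift this baseline log-odds up or down for particular agencies.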
Going through the example
•  Including dyadic predictors:
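While a model contains only edge, node-attribute, and dyadic-covariate terms, each tie is independent of the others and its probability follows from a logistic formula. The sketch below assumes a node covariate that enters as the sum over both endpoints and a distance covariate that enters per pair. The coefficient values, the three-agency data, and the function name are all invented for illustration.

```python
import math


def tie_probability(i, j, theta, tob_yrs, dist):
    """P(tie between i and j) under a dyad-independent model.

    The log-odds are
        theta_edges + theta_yrs * (tob_yrs[i] + tob_yrs[j]) + theta_dist * dist[i][j],
    where each term is the change in one statistic when tie i-j is switched on.
    """
    log_odds = (
        theta["edges"]
        + theta["tob_yrs"] * (tob_yrs[i] + tob_yrs[j])
        + theta["dist"] * dist[i][j]
    )
    return 1 / (1 + math.exp(-log_odds))


# Invented data for three agencies.
tob_yrs = [10, 3, 7]                 # years working in tobacco control
dist = [[0, 40, 250],                # pairwise geographic distances
        [40, 0, 210],
        [250, 210, 0]]
theta = {"edges": -1.0, "tob_yrs": 0.05, "dist": -0.004}  # hypothetical fit

p_near = tie_probability(0, 1, theta, tob_yrs, dist)  # agencies 40 apart
p_far = tie_probability(0, 2, theta, tob_yrs, dist)   # agencies 250 apart
```

With a negative distance coefficient, the nearby pair gets the higher tie probability even though the distant pair has more combined experience.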
Going through the example
•  Including relational terms:
Going through the example
•  Including local structure predictors:
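The slides do not show which relational and local-structure terms were used. As an assumption, the sketch below computes two common candidates: a homophily count (ties whose endpoints share an attribute value) and the geometrically weighted edgewise shared partner statistic. The toy network, the attribute values, and the decay value are invented for the example.

```python
import math


def nodematch(edges, attr):
    """Number of ties whose two endpoints share the same attribute value."""
    return sum(1 for i, j in edges if attr[i] == attr[j])


def shared_partners(edges, i, j):
    """Number of nodes tied to both i and j."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return len(neighbours.get(i, set()) & neighbours.get(j, set()))


def gwesp(edges, decay):
    """Geometrically weighted edgewise shared partner statistic.

    Each tie with k shared partners contributes
    exp(decay) * (1 - (1 - exp(-decay)) ** k),
    so additional shared partners add less and less.
    """
    total = 0.0
    for i, j in edges:
        k = shared_partners(edges, i, j)
        total += 1 - (1 - math.exp(-decay)) ** k
    return math.exp(decay) * total


# Toy network: a triangle 0-1-2 plus a pendant tie 2-3 (invented data).
toy_edges = {(0, 1), (0, 2), (1, 2), (2, 3)}
agency_lvl = {0: "local", 1: "local", 2: "state", 3: "state"}

homophily = nodematch(toy_edges, agency_lvl)
clustering = gwesp(toy_edges, decay=0.7)
```

Unlike the node and dyadic terms, the shared-partner statistic makes the probability of one tie depend on which other ties exist, so the model can no longer be fitted as a plain logistic regression.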
Going through the example
•  We can check the goodness of fit of our model (e.g., with minimum geodesic distance, edgewise shared partners, triad census, degree, etc.).
•  We can check model diagnostics.
•  An instance of the output for model diagnostics:
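The idea behind a goodness-of-fit check can be sketched as follows: simulate many networks from the fitted model, then see whether the observed network's statistics fall inside the simulated range. The example below uses the degree distribution. As a loudly stated simplification, it draws the simulated networks from an edges-only model (independent ties at the observed density) rather than from a full fitted model; the function names and the toy star network are invented.

```python
import random
from collections import Counter


def degree_counts(n_nodes, edges):
    """Map each degree value to the number of nodes having it."""
    degree = [0] * n_nodes
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return Counter(degree)


def bernoulli_graph(n_nodes, p, rng):
    """Draw a network where every tie appears independently with probability p."""
    return {
        (i, j)
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if rng.random() < p
    }


def degree_gof(n_nodes, observed, n_sim=200, seed=1):
    """For each degree: (observed count, min simulated, max simulated)."""
    rng = random.Random(seed)
    p = len(observed) / (n_nodes * (n_nodes - 1) / 2)
    sims = [
        degree_counts(n_nodes, bernoulli_graph(n_nodes, p, rng))
        for _ in range(n_sim)
    ]
    obs = degree_counts(n_nodes, observed)
    report = {}
    for d in range(n_nodes):
        counts = [s[d] for s in sims]
        report[d] = (obs[d], min(counts), max(counts))
    return report


# Toy observed network: a star with node 0 at the centre (invented data).
star = {(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)}
report = degree_gof(6, star)
```

An observed count that sits outside, or at the very edge of, the simulated range for some degree suggests the model does not reproduce that feature of the network.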
Network simulation
•  Based on the model, we can simulate new networks:
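One standard way to draw networks from an exponential random graph model is a Metropolis-Hastings sampler that proposes toggling one tie at a time and accepts according to the conditional log-odds. The sketch below is a minimal version under stated assumptions: an undirected network, uniformly chosen dyads, and statistics limited to edge and triangle counts. The function names, step count, and coefficient values are invented; this is not the sampler of any particular package.

```python
import math
import random


def edges_and_triangles(edges, i, j):
    """Change statistics s(x+) - s(x-) for (edge count, triangle count)."""
    nbrs_i = {b if a == i else a for a, b in edges if i in (a, b)} - {j}
    nbrs_j = {b if a == j else a for a, b in edges if j in (a, b)} - {i}
    # Adding tie i-j closes one triangle per common neighbour.
    return (1, len(nbrs_i & nbrs_j))


def simulate_network(n_nodes, theta, change_stats, n_steps=20000, seed=0):
    """Draw one network by Metropolis-Hastings tie toggles.

    change_stats(edges, i, j) must return s(x+) - s(x-) for dyad (i, j)
    as a tuple aligned with theta.
    """
    rng = random.Random(seed)
    edges = set()  # start from the empty network
    for _ in range(n_steps):
        i, j = sorted(rng.sample(range(n_nodes), 2))
        delta = change_stats(edges, i, j)
        log_odds = sum(t * d for t, d in zip(theta, delta))
        # Adding a tie has log acceptance ratio +log_odds, removing one -log_odds.
        log_ratio = -log_odds if (i, j) in edges else log_odds
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            edges.symmetric_difference_update({(i, j)})  # toggle the tie
    return edges


# Hypothetical coefficients: sparse network with a mild push towards triangles.
synthetic = simulate_network(10, (-1.5, 0.2), edges_and_triangles)
```

Each returned edge set is one synthetic network. Running the sampler with different seeds gives a collection of networks that share the structural tendencies encoded in `theta` without copying any observed tie.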
Furthermore
•  We will use social network data (LinkedIn) that includes a number of attributes.
•  Different types of attributes enrich the information we can extract from the network.
•  Based on this knowledge, we will be one step closer to generating synthetic data based on the dependencies among the actors.
References
•  Luke, Douglas A. A User's Guide to Network Analysis in R. 1st ed. Springer, 2015, pp. 165-187.
•  Newman, Mark. Networks: An Introduction. OUP Oxford, 2010.
•  Goodreau, Steven M. "Advances in exponential random graph (p*) models applied to a large social network." Social Networks 29.2 (2007): 231-248.
Contact
burcukolbay@gmail.com
burcu.kolbay@est.fib.upc.edu
