Tweet Along @dgleich


The Spectre of the Spectrum
An empirical study of the spectra of large networks

David F. Gleich
Sandia National Laboratories

SIAM CSE 2011
28 February 2011

Thanks to Ali Pinar, Jaideep Ray, Tammy Kolda, C. Seshadhri, Rich Lehoucq @ Sandia
and Jure Leskovec and Michael Mahoney @ Stanford for helpful discussions.

Supported by Sandia’s John von Neumann postdoctoral fellowship and the DOE Office of Science’s Graphs project.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
2/24/2011                                                      SIAM CSE 2011                                                               1/36



There’s information inside the spectra




Words in dictionary definitions: 111k vertices, 2.7M edges
Internet router network: 192k vertices, 1.2M edges
These figures show the spectra of the normalized Laplacian. Banerjee and Jost (2009) also noted such shapes in the spectra.



Why are we interested in the spectra?
Modeling

Properties
  Moments of the adjacency

Anomalies

Regularities

Network Comparison
  Fay et al. 2010 – Weighted Spectral Density
  The network is as19971108 from Jure Leskovec’s SNAP collection (a few thousand nodes), and we insert random connections from 50 nodes.



Matrices from graphs
Adjacency matrix
Laplacian matrix
Normalized Laplacian matrix
Random walk matrix
Modularity matrix

Not covered
  Signless Laplacian matrix
  Incidence matrix (It is incidentally discussed)
  Seidel matrix
  Heat Kernel

All graphs are undirected, and mostly we use connected components only.
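
The formulas on this slide did not survive extraction. As a minimal sketch, assuming the most common textbook definitions (the slide's own normalization or orientation may differ), the five covered matrices can be built from a dense adjacency array as follows. The function name `graph_matrices` is hypothetical.

```python
import numpy as np

def graph_matrices(A):
    """Standard matrices for an undirected graph with no isolated vertices.

    A is a dense symmetric 0/1 adjacency array. The definitions used here are
    the usual textbook ones and are an assumption, not taken from the slide.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)                          # degree vector
    D = np.diag(d)
    Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
    return {
        "adjacency": A,
        "laplacian": D - A,
        "normalized_laplacian": np.eye(n) - Dinv_sqrt @ A @ Dinv_sqrt,
        "random_walk": A / d[:, None],         # D^{-1} A, each row sums to 1
        "modularity": A - np.outer(d, d) / d.sum(),
    }

# Usage: a 4-cycle.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
M = graph_matrices(A)
nl_eigs = np.linalg.eigvalsh(M["normalized_laplacian"])
print(np.round(nl_eigs, 6))
```

With these definitions the normalized Laplacian and the random walk matrix are similar up to a shift, so their spectra carry the same information.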



Erdős–Rényi Semi-circles
Based on Wigner’s semi-circle law.

The eigenvalues of the adjacency matrix for n=1000, averaged over 10 trials.

Semi-circle with outlier if average degree is large enough.

[Figure: histogram of eigenvalues; x-axis: Eigenvalue, y-axis: Count]

Observed by Farkas et al. and in the book “Network Analysis” edited by Brandes and Erlebach (Chapter 14)
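
The experiment on this slide can be sketched as below. This is a minimal version under stated assumptions: the edge probability `p = 0.05` is hypothetical (the slide gives only n=1000 and 10 trials), and the semicircle radius `2*sqrt(n*p*(1-p))` is the standard Wigner prediction for the bulk, not a value read from the slide.

```python
import numpy as np

def er_adjacency_spectrum(n, p, rng):
    """Eigenvalues of the adjacency matrix of one G(n, p) sample."""
    upper = np.triu(rng.random((n, n)) < p, k=1)   # independent edges above the diagonal
    A = (upper | upper.T).astype(float)            # symmetrize, no self-loops
    return np.linalg.eigvalsh(A)                   # ascending order

rng = np.random.default_rng(0)
n, p, trials = 1000, 0.05, 10                      # p is a hypothetical choice
eigs = np.array([er_adjacency_spectrum(n, p, rng) for _ in range(trials)])

radius = 2.0 * np.sqrt(n * p * (1.0 - p))          # predicted edge of the semicircle
bulk = eigs[:, :-1].ravel()                        # everything except the top eigenvalue
outliers = eigs[:, -1]                             # the isolated top eigenvalue, near n*p

# Histogram summed over the trials (divide by `trials` for an average count).
counts, bin_edges = np.histogram(bulk, bins=50)
print("predicted radius:", radius)
print("largest |bulk eigenvalue|:", np.abs(bulk).max())
print("mean outlier:", outliers.mean())
```

Plotting `counts` against the bin centers should give the semicircle, with the outlier sitting well to the right of it.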



Previous results
Farkas et al. : Significant deviation from the
  semi-circle law for the adjacency matrix

Mihail and Papadimitriou : Leading eigenvalues
  of the adjacency matrix obey a power-law
  based on the degree-sequence

Chung et al. : Normalized Laplacian still
  obeys a semi-circle law if min-degree large

Banerjee and Jost : Study of types of patterns
  that emerge in evolving graph models –
  explain many features of the spectra



In comparison to other empirical studies
We use “exact” computation of spectra,
 instead of approximation.

We study “all” of the standard matrices
 over a range of large networks.

Our “large” is bigger.

We look at a few different random graph
 models.






Data sources
 Source             Type                      Size, instances
 SNAP               Various                   100s-100,000s
 SNAP-p2p           Gnutella Network          5-60k, ~30 inst.
 SNAP-as-733        Autonomous Sys.           ~5,000, 733 inst.
 SNAP-caida         Router networks           ~20,000, ~125 inst.
 Pajek              Various                   100s-100,000s
 Models             Copying Model             1k-100k 9 inst. 324 gs
                    Pref. Attach              1k-100k 9 inst. 164 gs
                    Forest Fire               1k-100k 9 inst. 324 gs
 Mine               Various                   2k-500k
 Newman             Various
 Arenas             Various
 Porter             Facebook                  100 schools, 5k-60k
 IsoRank, Natalie   Protein-Protein           <10k , 4 graphs
                                                 Thanks to all who make data available



Big graphs
Graph           Nodes             Edges                            Type
Arxiv           86376             1035126                          Co-author
Dblp            93156             356290                           Co-author
Dictionary(*)   111982            2750576                          Word defns.
Internet(*)     124651            414428                           Routers
Itdk0304        190914            1215220                          Routers
p2p-gnu(*)      62561             295756                           Peer-to-peer
Patents(*)      230686            1109898                          Citations
Roads           126146            323900                           Roads
Wordnet(*)      75606             240036                           Word relation
web-nb.edu      325729            2994268                          Web



                         (*) denotes that this is a weakly connected component of a directed graph.



Models
Preferential Attachment
Start graph with a k-node clique. Add a new node and
  connect to k random nodes, chosen proportional to degree.

Copying model
Start graph with a k-node clique. Add a new node and pick a
  parent uniformly at random. Copy edges of parent and
  make an error with probability  

Forest Fire
Start graph with a k-node clique. Add a new node and pick a
  parent uniformly at random. Do a random “bfs”/“forest fire”
  and link to all nodes “burned”.
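
The first two growth rules above can be sketched as follows. This is an illustrative version, not the code behind the talk: the function names, the parameter `error_prob` (the slide's probability symbol was lost), and the reading of "make an error" as "redirect the copied edge to a uniformly random existing node" are all assumptions. The forest fire model is left out because its burning rule is not specified on the slide.

```python
import random

def preferential_attachment(n, k, rng):
    """Grow from a k-clique (k >= 2); each new node links to k distinct
    existing nodes drawn with probability proportional to degree."""
    edges = {(i, j) for i in range(k) for j in range(i + 1, k)}
    # Every node appears in `ends` once per incident edge, so a uniform
    # draw from `ends` is a degree-proportional draw.
    ends = [v for e in edges for v in e]
    for new in range(k, n):
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.add((t, new))
            ends.extend((t, new))
    return edges

def copying_model(n, k, error_prob, rng):
    """Grow from a k-clique (k >= 2); each new node copies the edges of a
    uniformly random parent. With probability error_prob a copied edge is
    redirected to a uniformly random existing node (assumed error rule)."""
    neighbors = {i: set(range(k)) - {i} for i in range(k)}
    for new in range(k, n):
        parent = rng.randrange(new)
        targets = set()
        for v in neighbors[parent]:
            if rng.random() < error_prob:
                v = rng.randrange(new)
            targets.add(v)
        neighbors[new] = set(targets)
        for t in targets:
            neighbors[t].add(new)
    return {(min(u, v), max(u, v)) for u in neighbors for v in neighbors[u]}

rng = random.Random(1)
pa_edges = preferential_attachment(200, 3, rng)
cp_edges = copying_model(200, 3, 0.2, rng)
print(len(pa_edges), len(cp_edges))
```

Both functions return a set of undirected edges `(u, v)` with `u < v`, which can be turned into an adjacency matrix for the spectral computations.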




                COMPUTING SPECTRA OF LARGE NETWORKS








                   Redsky, Hopper I, Hopper II, and a Cielo testbed. Details if time.



Eigenvalues with ScaLAPACK
Mostly the same approach as in LAPACK

1. Reduce to tridiagonal form
   (most time consuming part)
2. Distribute tridiagonals to
    all processors
3. Each processor finds
   all eigenvalues
4. Each processor computes a
   subset of eigenvectors

I’m actually using the MRRR algorithm,
where steps 3 and 4 are better and faster
                       MRRR due to Parlett and Dhillon; implemented in ScaLAPACK by Christof Vömel.
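
A serial analogue of the steps above can be sketched with SciPy's LAPACK wrappers. This is only an illustration on one machine, not the distributed ScaLAPACK code used in the talk. It assumes `scipy.linalg.hessenberg` for the reduction (a general-purpose routine that yields a tridiagonal matrix, up to roundoff, when the input is symmetric) and the `stemr` driver of `eigh_tridiagonal`, which is LAPACK's MRRR implementation. The function name is hypothetical.

```python
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal

def symmetric_eigenvalues(A):
    """Eigenvalues of a dense symmetric matrix: tridiagonalize, then MRRR."""
    # Step 1: orthogonal reduction. For symmetric A the Hessenberg form is
    # tridiagonal up to roundoff, so only two diagonals are kept.
    T = hessenberg(A)
    d = np.diag(T).copy()          # main diagonal
    e = np.diag(T, k=-1).copy()    # sub-diagonal
    # Steps 3 and 4, serial version: MRRR on the tridiagonal matrix.
    return eigh_tridiagonal(d, e, eigvals_only=True, lapack_driver="stemr")

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
S = (B + B.T) / 2.0                # a random symmetric test matrix
vals = symmetric_eigenvalues(S)
print(vals[:3])
```

For everyday use, `scipy.linalg.eigh(S, eigvals_only=True, driver="evr")` wraps the same reduce-then-MRRR pipeline in a single call.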



Alternatives
Use ARPACK to get extrema

Use ARPACK to get interior eigenvalues around a shift via the folded spectrum

Large nearly repeated sets of eigenvalues will make this tricky.
                        Farkas et al. used this approach. Figure from somewhere on the web… sorry!
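
Both alternatives can be sketched with SciPy's ARPACK wrapper `eigsh`. This is a minimal illustration under assumptions: the shift is called `sigma` here (the slide's symbol was lost), the folded operator is taken to be `(A - sigma*I)^2`, and the function names are hypothetical. Eigenvalues of `A` are recovered from the folded eigenvectors by Rayleigh quotients, which is reliable only when the eigenvalues near the shift are well separated.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

def extremal_eigenvalues(A, k):
    """The k smallest and k largest (algebraic) eigenvalues via ARPACK."""
    smallest = eigsh(A, k=k, which="SA", return_eigenvectors=False)
    largest = eigsh(A, k=k, which="LA", return_eigenvectors=False)
    return np.sort(smallest), np.sort(largest)

def folded_spectrum_eigenvalues(A, sigma, k, ncv=None):
    """The k eigenvalues of A nearest sigma, found as the smallest
    eigenvalues of the folded operator (A - sigma*I)^2."""
    n = A.shape[0]

    def apply_folded(x):
        y = A @ x - sigma * x
        return A @ y - sigma * y

    F = LinearOperator((n, n), matvec=apply_folded, dtype=np.float64)
    _, V = eigsh(F, k=k, which="SA", ncv=ncv)
    # Rayleigh quotients v^T A v map folded eigenvectors back to eigenvalues of A.
    return np.sort(np.einsum("ij,ij->j", V, A @ V))

# Usage on the adjacency matrix of a 60-vertex path graph.
n = 60
A = sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1], format="csr")
low, high = extremal_eigenvalues(A, 3)
interior = folded_spectrum_eigenvalues(A, sigma=0.5, k=2, ncv=50)
print(low, high, interior)
```

Squaring the operator also squares the gaps between eigenvalues near the shift, which is one reason clusters of nearly repeated eigenvalues converge slowly with this approach.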



Adding MPI tasks vs. using threads
Most math libraries have threaded versions
   (Intel MKL, AMD ACML)
Is it better to use threads or MPI tasks?

It depends.

Intel MKL
Threads   Ranks   Time-T   Time-E
1         36      1271.4   339.0
4         9       1058.1   456.6

Cray libsci
Threads   Ranks   Time
1         64      1412.5
4         16      1881.4
16        4       Omitted.

                                   Normalized Laplacian for 36k-by-36k co-author graph of CondMat



    Weak Parallel Scaling
[Figure: scaling plot; axis label: Time in hours]

Good strong scaling up to 325,000 vertices

Estimated time for a 500,000-vertex graph: 9 hours with 925 compute nodes (7400 procs)




            EXAMPLES







An $8,000 matrix computation




                                                                           325729 nodes and 2994268 edges
            500 nodes and 4000 processors on Redsky for 5 hours x 2 for normalized Laplacian/adjacency matrix
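
The numbers on this slide can be put in perspective with a little arithmetic. The sketch below assumes dense double-precision storage (8 bytes per entry) and assumes the $8,000 figure covers both 5-hour runs. Neither assumption is stated on the slide, so the implied rate is only a rough estimate.

```python
# Size of the dense matrix that a direct eigensolver has to hold.
graph_nodes = 325_729
bytes_per_entry = 8                               # assumption: double precision
dense_bytes = graph_nodes ** 2 * bytes_per_entry
dense_terabytes = dense_bytes / 1e12              # about 0.85 TB

# Machine time quoted on the slide.
processors = 4000
hours_per_run = 5
runs = 2                                          # normalized Laplacian and adjacency
processor_hours = processors * hours_per_run * runs

quoted_cost_dollars = 8000
implied_rate = quoted_cost_dollars / processor_hours   # dollars per processor-hour

print(f"dense storage: {dense_terabytes:.2f} TB")
print(f"processor-hours: {processor_hours}")
print(f"implied rate: ${implied_rate:.2f} per processor-hour")
```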




            Yes!


                     These are cases where we have multiple instances of the same graph.



Already known?




                                 Just the Facebook spectra.



Already known?




                  I soon realized I was searching for “spectre” instead of spectrum, oops.



Spikes?
Unit eigenvalue

Repeated rows
  Identical rows grow the null-space.

Banerjee and Jost
  Motif doubling and joining small graphs will tend to cause repeated
  eigenvalues and null vectors.




                  Banerjee and Jost explained how evolving graphs should produce repeated eigenvalues



Copying model




            Obvious follow-up here: does a random sample with the same degree distribution show the same thing?
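
One way to draw such a sample is to randomize the graph with double edge swaps, which keep every vertex degree fixed. The sketch below is an assumption about how the follow-up could be run, not something from the talk. The function name is hypothetical, and the sample is only approximately uniform over graphs with the given degrees.

```python
import random

def degree_preserving_rewire(edges, n_swaps, rng):
    """Randomize an undirected simple graph with double edge swaps.

    Each accepted swap replaces (a, b), (c, d) by (a, d), (c, b), so every
    vertex keeps its degree. Swaps that would create a self-loop or a
    duplicate edge are rejected.
    """
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:          # try both orientations of the second edge
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list

def degrees(edge_list, n):
    deg = [0] * n
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return deg

# Usage: a 20-vertex ring with a few chords.
n = 20
original = [(i, (i + 1) % n) for i in range(n)] + [(0, 10), (3, 13), (5, 15)]
rewired = degree_preserving_rewire(original, 200, random.Random(0))
print(degrees(rewired, n) == degrees(original, n))
```

Comparing the spectrum of the rewired graph with that of the copying-model graph would show which features depend only on the degree sequence.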



Forest Fire models







Preferential Attachment




            Semi-circle in log-space!





Where is this going?
We can compute spectra for large networks if needed.

Study relationship with known power-laws in spectra

Eigenvector localization

Directed Laplacians



Nullspaces of the adjacency matrix
 

So unit eigenvalues of the normalized Laplacian correspond to
  null-vectors of the adjacency matrix.
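
The formula on this slide was lost in extraction. The standard argument, assuming the normalized Laplacian is I - D^{-1/2} A D^{-1/2}, runs as follows: if (I - D^{-1/2} A D^{-1/2}) x = x, then D^{-1/2} A D^{-1/2} x = 0, so y = D^{-1/2} x satisfies A y = 0. The sketch below checks this on a small hypothetical graph in which two vertices have identical rows.

```python
import numpy as np

# Edges: 0-1, 0-2, 0-3, 3-4. Vertices 1 and 2 attach only to vertex 0,
# so rows 1 and 2 of the adjacency matrix are identical.
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

d = A.sum(axis=1)
Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(5) - Dinv_sqrt @ A @ Dinv_sqrt          # normalized Laplacian

y = np.array([0.0, 1.0, -1.0, 0.0, 0.0])           # difference of the repeated rows' indices
x = np.sqrt(d) * y                                 # x = D^{1/2} y

nullity_A = int(np.sum(np.abs(np.linalg.eigvalsh(A)) < 1e-10))
unit_multiplicity = int(np.sum(np.abs(np.linalg.eigvalsh(L) - 1.0) < 1e-10))

print("A y = 0:", np.allclose(A @ y, 0.0))
print("L x = x:", np.allclose(L @ x, x))
print("nullity of A:", nullity_A, "multiplicity of eigenvalue 1:", unit_multiplicity)
```

This also illustrates the "repeated rows" explanation for spikes: each extra copy of a row adds one null-vector of A and hence one more unit eigenvalue.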




Code will be available eventually. Image from good financial cents.