Large-scale analysis of bibliometric networks
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
International Conference on Data-driven Discovery:
When Data Science Meets Information Science
Beijing, China, June 20, 2016
Bibliographic databases: ‘Big data’
               Web of Science   Scopus
Journals       12,000           20,000
Publications   45 million       35 million
Citations      1 billion        0.9 billion
Bibliometric networks
Networks derived from a bibliographic database (Web of Science, Scopus):
• Citation network of publications / authors / journals
• Co-authorship network of authors / organizations
• Co-citation network of publications / authors / journals
• Co-occurrence network of keywords / terms
• Bibliographic coupling network of publications / authors / journals
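As a rough illustration of how two of these networks relate to the citation network, the sketch below derives bibliographic coupling and co-citation weights from direct citations. It uses the standard definitions (two publications are coupled when they share a reference, and co-cited when a third publication cites both). The toy data and the function names are hypothetical and are not taken from VOSviewer or CitNetExplorer.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each publication maps to the publications it cites.
citations = {
    "A": ["C", "D"],
    "B": ["C", "D", "E"],
    "C": ["E"],
    "D": [],
    "E": [],
}


def bibliographic_coupling(citations):
    """Weight of (p, q) = number of references shared by p and q."""
    weights = {}
    for p, q in combinations(sorted(citations), 2):
        shared = set(citations[p]) & set(citations[q])
        if shared:
            weights[(p, q)] = len(shared)
    return weights


def co_citation(citations):
    """Weight of (p, q) = number of publications that cite both p and q."""
    weights = defaultdict(int)
    for refs in citations.values():
        # Every pair of references in one reference list is co-cited once.
        for p, q in combinations(sorted(set(refs)), 2):
            weights[(p, q)] += 1
    return dict(weights)


print(bibliographic_coupling(citations))
print(co_citation(citations))
```

Author-level and journal-level networks would follow the same pattern after aggregating publications by author or journal.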
Outline
• Software tools
• Network analysis techniques
• Analysis of data science
Software tools
• VOSviewer (www.vosviewer.com)
– Tool for constructing and visualizing bibliometric networks
• CitNetExplorer (www.citnetexplorer.nl)
– Tool for visualizing and analyzing citation networks of
publications
• Both tools have been developed together with my colleague Ludo Waltman
VOSviewer
VOSviewer: Overview
• Software tool for visualizing (bibliometric) networks
• Built-in support for popular bibliographic databases
• Text mining functionality
• Layout and clustering techniques
• Advanced visualization features:
– Smart labeling algorithm
– Overlay visualizations
– Density visualizations (‘heat map’)
• Users:
– Researchers
– Professional users (e.g., universities, libraries, funders,
publishers)
[Figure: Map of university co-authorship network]
[Figure: Map of journal citation network]
CitNetExplorer
VOSviewer:
• Any type of bibliometric network
• Co-authorship, direct citations, co-citation, and bibliographic coupling
• Time dimension is ignored
• Networks of at most ~10,000 nodes are supported
CitNetExplorer:
• Only citation networks of publications
• Direct citation between publications
• Time dimension is explicitly considered
• Millions of publications are supported
Network analysis techniques
Layout:
• Assigning the nodes in a network to
locations in a (usually 2d) space
(a.k.a. mapping)
• Visualization of similarities (VOS)
Clustering:
• Partitioning the nodes in a network
into a number of groups (a.k.a.
community detection)
• Weighted modularity
• Smart local moving algorithm
Clustering can be seen as mapping
in a restricted space
Unified approach to mapping and clustering
Minimize
\[ Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2m A_{ij}}{k_i k_j} d_{ij}^2 - \sum_{i<j} d_{ij} \]
where
n: number of nodes in the network
m: total weight of all edges in the network
A_{ij}: weight of edge between nodes i and j
k_i: total weight of all edges of node i
Mapping
x_i: vector denoting the location of node i in a p-dimensional space
\[ d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2} \]
Clustering
x_i: integer denoting the community to which node i belongs
\gamma: resolution parameter
\[ d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases} \]
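To make the unified view concrete, the sketch below evaluates the quality function Q for a small network under both choices of distance: Euclidean distance between node locations (mapping) and the 0 or 1/gamma distance between community labels (clustering). The toy network (two triangles joined by one edge) and all function names are illustrative assumptions, and the code assumes every node has at least one edge, so that k_i > 0.

```python
import math
from itertools import combinations


def quality(A, d):
    """Q = sum_{i<j} (2m A_ij / (k_i k_j)) d(i, j)^2 - sum_{i<j} d(i, j)."""
    n = len(A)
    k = [sum(row) for row in A]  # total edge weight of each node
    m = sum(k) / 2               # total edge weight of the network
    q = 0.0
    for i, j in combinations(range(n), 2):
        dij = d(i, j)
        q += 2 * m * A[i][j] / (k[i] * k[j]) * dij ** 2 - dij
    return q


def mapping_distance(x):
    """Mapping: x[i] is a location; d_ij is the Euclidean distance."""
    return lambda i, j: math.dist(x[i], x[j])


def clustering_distance(x, gamma=1.0):
    """Clustering: x[i] is a community label; d_ij is 0 or 1/gamma."""
    return lambda i, j: 0.0 if x[i] == x[j] else 1.0 / gamma


# Hypothetical toy network: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
A[4][2] = 0  # keep the matrix symmetric: nodes 2 and 4 are not connected

by_triangle = clustering_distance([0, 0, 0, 1, 1, 1])
alternating = clustering_distance([0, 1, 0, 1, 0, 1])
print(quality(A, by_triangle), quality(A, alternating))
```

Because Q is minimized, the split along the two triangles should score lower than the alternating split, which cuts most of the edges.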
Smart local moving algorithm
[Figure: steps of the smart local moving algorithm. Panels: original network; local moving heuristic; local moving heuristic in subnetworks; reduced network. Quality values shown: Q = 0.3791 and Q = 0.4198]
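The local moving heuristic named in the figure can be sketched as follows. This is only the basic building block (repeatedly moving single nodes to the community that gives the largest gain), not the full smart local moving algorithm, which also applies the heuristic inside subnetworks and works on a reduced network. The sketch assumes the common form of modularity with a resolution parameter gamma, an undirected weighted network without self-loops, and at least one edge. The function names and the toy network are hypothetical.

```python
import random


def modularity(A, x, gamma=1.0):
    """(1 / 2m) * sum over same-community pairs of (A_ij - gamma k_i k_j / 2m)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    return sum(
        A[i][j] - gamma * k[i] * k[j] / two_m
        for i in range(n) for j in range(n) if x[i] == x[j]
    ) / two_m


def local_moving(A, gamma=1.0, seed=0):
    """Move nodes one at a time until no single move increases modularity."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    x = list(range(n))  # start with every node in its own community
    tot = k[:]          # total node weight per community label
    rng = random.Random(seed)
    improved = True
    while improved:
        improved = False
        order = list(range(n))
        rng.shuffle(order)  # visit nodes in random order
        for i in order:
            old = x[i]
            tot[old] -= k[i]  # temporarily take node i out of its community
            links = {}        # edge weight from node i to each neighbouring community
            for j in range(n):
                if j != i and A[i][j] > 0:
                    links[x[j]] = links.get(x[j], 0.0) + A[i][j]

            def gain(c):
                # Proportional to the modularity change of putting i into c.
                return links.get(c, 0.0) - gamma * k[i] * tot[c] / two_m

            best = max(links, key=gain, default=old)
            if gain(best) <= gain(old):
                best = old    # only move on a strict improvement
            tot[best] += k[i]
            x[i] = best
            if best != old:
                improved = True
    return x


# Hypothetical toy network: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
communities = local_moving(A)
print(communities, modularity(A, communities))
```

Each accepted move strictly increases modularity, so the loop terminates, but on larger networks it can stop in a local optimum, which is the limitation that the additional steps of the smart local moving algorithm are meant to address.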
Algorithmically constructed classification system of science
• 17.8 million publications from the period 2000–2015 indexed in Web of Science
• 282.4 million citation relations
• Classification system of 3 hierarchical levels:
– 27 broad disciplines
– 817 fields
– 4,113 subfields
Breakdown of scientific literature into 817 fields
[Figure: map of the 817 fields, labelled by main area: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
[Figure: Publications in scientometrics subfield]
[Figure: Time-line map of highly cited scientometrics publications]
Analysis of data science
What is data science?
• Empirical operationalization of data science based on publications with ‘data’ in title or abstract
Wikipedia: “Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data … which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics”
LCDS: “Data Science … deals with finding, analyzing and validating complex patterns in data. Data Science methods are indispensable for maintaining a competitive edge in all disciplines in science”
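The empirical operationalization (publications with ‘data’ in title or abstract) can be sketched in a few lines. The whole-word, case-insensitive match is an assumption, since the slides do not say how the term was matched. The 25% threshold is the one used later in the talk to define data science fields. The records, field names, and function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: 'data' is matched as a whole word, ignoring case.
DATA_WORD = re.compile(r"\bdata\b", re.IGNORECASE)


def is_data_publication(title, abstract):
    """True if the word 'data' occurs in the title or the abstract."""
    return bool(DATA_WORD.search(title) or DATA_WORD.search(abstract))


def data_share_by_field(publications):
    """Fraction of 'data' publications in each field."""
    counts = defaultdict(lambda: [0, 0])  # field -> [data pubs, all pubs]
    for pub in publications:
        entry = counts[pub["field"]]
        entry[1] += 1
        if is_data_publication(pub["title"], pub["abstract"]):
            entry[0] += 1
    return {field: d / total for field, (d, total) in counts.items()}


def data_science_fields(shares, threshold=0.25):
    """Fields in which at least `threshold` of the publications are 'data' publications."""
    return sorted(field for field, share in shares.items() if share >= threshold)


# Hypothetical records, for illustration only.
publications = [
    {"field": "remote sensing", "title": "Satellite data fusion", "abstract": ""},
    {"field": "remote sensing", "title": "Cloud detection", "abstract": "We use Landsat data."},
    {"field": "number theory", "title": "On prime gaps", "abstract": "A short proof."},
    {"field": "number theory", "title": "Sieve methods", "abstract": "A survey."},
]
shares = data_share_by_field(publications)
print(shares, data_science_fields(shares))
```

With a whole-word match, terms such as "database" or "metadata" do not count, which may or may not correspond to the choice made in the original analysis.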
Growth of data-driven research
[Figure: line chart of the percentage of publications per year, 1990–2015 (y-axis from 0% to 20%). Series: % 'data' publications; % 'theory' publications]
Data-driven nature of different scientific fields
[Figure: map of fields by main area (Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering), showing the % of publications with ‘data’ in title or abstract. Labelled fields: artificial intelligence, statistics, bioinformatics, neuroimaging, pattern recognition, astronomy, earth, water, climate, remote sensing, nutrition, obesity, addiction, accident analysis]
Data science fields (at least 25% ‘data’ publications)
[Figure: map of fields by main area: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
[Figure: Term map of data science fields]
China’s publication output in data science fields
[Figure: map of fields by main area (Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering). Labelled fields: artificial intelligence, pattern recognition, high energy, earth, atmospheres, weather, remote sensing]
Chinese institutes with most publications in data science fields (2011–2015)
• Chinese Academy of Sciences
• Peking University
• Tsinghua University
• China University of Geosciences
• Zhejiang University
• Nanjing University
• Shanghai Jiao Tong University
• University of Science and Technology of China
• Beijing Normal University
• University of Hong Kong
CAS publication output in data science fields
[Figure: map of fields. Labelled fields: earth, atmospheres, weather, remote sensing, vegetation, astronomy, high energy]
[Figure: Term map based on CAS publications in data science fields]
CAS (Beijing Branch) publication output in data science fields
[Figure: map of fields. Labelled fields: astronomy, earth, atmospheres, weather, remote sensing, vegetation, high energy]
CAS (Shanghai Branch) publication output in data science fields
[Figure: map of fields. Labelled fields: bioinformatics, genetics, astronomy, nuclear]
Do it yourself!
www.vosviewer.com www.citnetexplorer.nl
Thank you for your attention!