Comparing bibliographic data sources
Ludo Waltman, Martijn Visser, Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
Workshop on Open Citations
Bologna
September 3, 2018
Introduction
• Increasing number of alternatives (Google Scholar, Microsoft Academic,
Dimensions, Crossref, OpenCitations Corpus) to traditional bibliographic data
sources (Web of Science, Scopus)
• Some alternatives are more open than others
• How do the various data sources compare in terms of the completeness and
quality of their citation data?
Data sources
• Scopus
– May 2018
– Requires subscription
• Web of Science
– SCIE, SSCI, AHCI, CPCI
– June 2018
– Requires subscription
• Dimensions
– June 2018
– Openly available through web interface
• Crossref
– August 2017
– Openly available through API
Coverage of publications
                 All publications   Publications with DOI   Publications with unique DOI
Web of Science   40.06  100.0%      18.79   46.9%           18.77   46.9%
Scopus           44.88  100.0%      31.06   69.2%           30.64   68.3%
Dimensions       57.47  100.0%      55.09   95.9%           54.95   95.6%
Crossref         53.81  100.0%      53.81  100.0%           53.81  100.0%
• Publication counts in millions
• Time period 1996-2017
• Note that Crossref is incomplete in 2017
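The percentage columns are each source's DOI counts expressed as a share of that source's own total. A minimal Python sketch that recomputes them from the counts on the slide; the dictionary layout and the function name `doi_shares` are illustrative assumptions, not part of the original analysis:

```python
# Publication counts in millions, copied from the coverage table (1996-2017).
# The tuple layout (all, with DOI, with unique DOI) is an illustrative choice.
COUNTS = {
    "Web of Science": (40.06, 18.79, 18.77),
    "Scopus": (44.88, 31.06, 30.64),
    "Dimensions": (57.47, 55.09, 54.95),
    "Crossref": (53.81, 53.81, 53.81),
}


def doi_shares(counts):
    """Return {source: (% with DOI, % with unique DOI)}, rounded to one decimal."""
    shares = {}
    for source, (total, with_doi, unique_doi) in counts.items():
        shares[source] = (
            round(100 * with_doi / total, 1),
            round(100 * unique_doi / total, 1),
        )
    return shares


if __name__ == "__main__":
    for source, (doi, unique) in doi_shares(COUNTS).items():
        print(f"{source}: {doi}% with DOI, {unique}% with unique DOI")
```

The recomputed shares agree with the percentages shown in the table.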
Coverage of publications: Dimensions vs. Scopus
Comparison of citation data
Scopus-WoS overlap: 460.0M
Only in Scopus: 24.9M
Only in WoS: 15.5M
Scopus-Dimensions overlap: 414.3M
Only in Scopus: 43.5M
Only in Dimensions: 17.9M
Scopus-Crossref overlap: 144.1M
Only in Scopus: 305.1M
Only in Crossref: 5.4M
In these pairwise comparisons of data sources, only
citation links between citing and cited publications
indexed in both data sources are considered
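The restriction described above can be sketched in Python as follows. This is a hedged illustration, not the authors' procedure: it assumes publications have already been matched across sources under a shared identifier (for example a DOI), which the slides do not specify, and the function name and toy identifiers are hypothetical.

```python
def compare_citation_links(pubs_a, links_a, pubs_b, links_b):
    """Compare two sources on the publications that both of them index.

    pubs_*  : set of publication identifiers indexed by a source
              (assumed to be already matched across sources)
    links_* : set of (citing_id, cited_id) pairs recorded by that source
    Returns (overlap, only_in_a, only_in_b) as numbers of citation links.
    """
    shared = pubs_a & pubs_b

    def restrict(links):
        # Keep a link only if both its citing and its cited publication
        # are indexed in both data sources.
        return {
            (citing, cited)
            for citing, cited in links
            if citing in shared and cited in shared
        }

    a, b = restrict(links_a), restrict(links_b)
    return len(a & b), len(a - b), len(b - a)


if __name__ == "__main__":
    # Toy data: p4 is indexed only in source A, p5 only in source B.
    pubs_a = {"p1", "p2", "p3", "p4"}
    pubs_b = {"p1", "p2", "p3", "p5"}
    links_a = {("p1", "p2"), ("p1", "p3"), ("p4", "p1")}
    links_b = {("p1", "p2"), ("p2", "p3"), ("p5", "p1")}
    print(compare_citation_links(pubs_a, links_a, pubs_b, links_b))
```

In the toy data, the links involving p4 and p5 are dropped before comparison because those publications are not indexed in both sources. Read the same way, the Scopus-Crossref figures mean that about 32% of the Scopus links in that comparison (144.1M of 449.2M) are also present in Crossref.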
Causes of discrepancies between data sources
• Inaccuracies in references
• Inaccuracies in reference data
• Inaccuracies in citation matching
• Multiple versions of a publication
• Multiple records for a publication
• Citations being closed or not having been deposited
Example: Discrepancies between Scopus and
Dimensions
Example: Discrepancies between Scopus and
Dimensions
Example: Discrepancies between Scopus and Web of
Science
Group author and/or supplement
seem to cause problems in Web
of Science
Example: Discrepancies within Web of Science
September 20, 2017
November 1, 2017
November 8, 2017
Conclusions
• Substantial discrepancies between data sources
• Reasonably complete citation data in Dimensions
• Large gaps in citation data in Crossref, due to citations being closed or not
having been deposited
• Need for transparent high-quality citation matching algorithm
• Completeness and quality of other metadata?
Thank you for your attention!


Editor's Notes

  • #11 It is not certain why so many citation links are missing in WoS. Some references that are very similar to the ones above are linked in WoS. The problem probably has to do with the group author and the supplement.