dtkaczyk/sneaked-references

Detecting sneaked references

A set of scripts for detecting and analysing the cases of sneaked references in the Crossref metadata.

Sneaked references are references that are registered in the metadata of some scientometric platforms but are not listed in the publication they are attributed to.

The following scripts are included:

  • 1_get_metadata.py - download metadata records from the Crossref REST API
  • 2_get_pdfs.py - download corresponding PDFs from the landing pages
  • 3_method2_detect.py - detect sneaked references by searching for reference strings from the metadata records within the PDF text
  • 4_use_grobid.py - extract bibliographic references from PDFs using GROBID
  • 5_method1_use_last.py - detect sneaked references by comparing the metadata records with references extracted by GROBID
  • 6_compare_methods.py - compare and merge the results of the two methods
  • 7_Graph.R - generate images and some additional statistics
  • 8_statistics.py - calculate overall statistics
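
To illustrate the idea behind the string-search approach (method 2), here is a minimal sketch in Python. It is not the code from 3_method2_detect.py: the function names (`normalize`, `token_coverage`, `find_sneaked`), the token-overlap heuristic, and the 0.7 threshold are all assumptions made for illustration. It takes reference strings from a metadata record and the plain text of the PDF, and flags the references whose words are mostly absent from the PDF text.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and replace runs of non-alphanumerics with a space."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def token_coverage(reference: str, pdf_text_norm: str) -> float:
    """Fraction of the reference's tokens (longer than 2 chars) found in the PDF text."""
    tokens = [t for t in normalize(reference).split() if len(t) > 2]
    if not tokens:
        return 0.0
    pdf_tokens = set(pdf_text_norm.split())
    return sum(t in pdf_tokens for t in tokens) / len(tokens)


def find_sneaked(metadata_refs, pdf_text, threshold=0.7):
    """Return the metadata references that are NOT sufficiently covered by the PDF text.

    `threshold` is a hypothetical cut-off chosen for this sketch, not a value
    taken from the repository.
    """
    pdf_norm = normalize(pdf_text)
    return [r for r in metadata_refs if token_coverage(r, pdf_norm) < threshold]
```

Calling `find_sneaked(refs, text)` with the reference strings of one metadata record and the extracted text of its PDF returns the candidate sneaked references. A token-overlap score is used here because the same reference is usually formatted differently in the metadata and in the PDF, so an exact substring search would miss most genuine matches.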
