A set of scripts for detecting and analysing sneaked references in Crossref metadata.
Sneaked references are references registered in some scientometric platforms without being listed in the actual publications where they ought to be found.
The following scripts are included:
1_get_metadata.py - download metadata records from the Crossref REST API
2_get_pdfs.py - download the corresponding PDFs from the landing pages
3_method2_detect.py - detect sneaked references by searching for reference strings from the metadata records within the PDF text
4_use_grobid.py - extract bibliographic references from PDFs using GROBID
5_method1_use_last.py - detect sneaked references by comparing the metadata records with references extracted by GROBID
6_compare_methods.py - compare and merge the results of the two methods
7_Graph.R - generate images and some additional statistics
8_statistics.py - calculate overall statistics
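
To illustrate the idea behind the string-search method (3_method2_detect.py), here is a minimal sketch and not the script's actual code. It assumes that each metadata reference is a dict with `key` and `unstructured` fields, as in the `reference` array of a Crossref work record, and that the PDF text has already been extracted into a plain string. The names `normalize` and `find_sneaked` are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and reduce every run of
    non-alphanumeric characters to a single space."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_sneaked(references: list, pdf_text: str) -> list:
    """Return the metadata references whose normalized reference string
    does not occur anywhere in the normalized PDF text.

    Assumption: each reference is a dict with an 'unstructured' field.
    References without that field are skipped, not reported.
    """
    haystack = normalize(pdf_text)
    sneaked = []
    for ref in references:
        needle = normalize(ref.get("unstructured", ""))
        if needle and needle not in haystack:
            sneaked.append(ref)
    return sneaked
```

Exact substring matching after normalization is deliberately strict: words hyphenated across line breaks in the PDF, or reference strings formatted differently in the metadata, would be reported as missing. A real pipeline would probably need fuzzy matching or a manual check of the candidates.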