Published October 27, 2022 | Version v1
Poster | Open access

CONSTRUCTION OF A TOOL FOR THE DIGITAL ANALYSIS OF TEXTS

  • 1. State Scientific Institution "Ukrainian Institute of Scientific and Technical Expertise and Information"

Description

Methods for determining text similarity are at the forefront of research in fields such as computational linguistics, literary studies, communication sciences, philosophy, and health sciences. Similarity detection is also in demand in higher education institutions and academic communities. However, unlike commercial "anti-plagiarism" services, a similarity detection tool integrated into large national text archives offers much wider opportunities for use, both in access to content and in the diversity of research purposes.

During research within the joint Ukrainian-Latvian project* "Methods of text analysis and tools for determining similarities in large national text archives: on the example of the Latvian National Digital Library and the National Repository of Academic Texts of Ukraine", we developed our own tool for text matching and analysis. It is built on Elasticsearch, an open-source distributed search and analytics engine written in Java that supports many data types, including text, numeric, geospatial, structured, and unstructured data.
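The poster does not describe how Antic queries Elasticsearch, so the following is only one plausible setup, offered as a sketch under stated assumptions. The index name `nrat_texts`, the field names `title`, `body`, `doc_type` and `url`, the 12-word fragment size and the phrase `slop` value are all hypothetical. The code builds an index mapping and, for each fragment of an uploaded text, a Query DSL request body that uses a `match_phrase` query with highlighting. It does not contact a server; it only prints the request bodies as JSON.

```python
import json

# Hypothetical index and field names: the poster does not describe Antic's schema.
INDEX_NAME = "nrat_texts"

# Index mapping: titles and full texts are analyzed for full-text search,
# while the document type and source URL are stored as exact-match keywords.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "doc_type": {"type": "keyword"},  # e.g. "rd_report", "abstract", "dissertation"
            "url": {"type": "keyword"},
        }
    }
}


def split_into_fragments(text: str, size: int = 12) -> list[str]:
    """Split text into consecutive, non-overlapping windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_fragment_query(fragment: str, slop: int = 2) -> dict:
    """Build a search body that looks for one fragment as a near-exact phrase
    (allowing `slop` positional moves) and highlights the matching passage."""
    return {
        "query": {"match_phrase": {"body": {"query": fragment, "slop": slop}}},
        "_source": ["title", "doc_type", "url"],
        "highlight": {"fields": {"body": {}}},
    }


if __name__ == "__main__":
    print(json.dumps(INDEX_MAPPING))
    sample = ("methods for determining the similarity of texts are at the forefront "
              "of computational linguistics and literary studies")
    for fragment in split_into_fragments(sample):
        print(json.dumps(build_fragment_query(fragment)))
```

Under these assumptions, the mapping would be sent with `PUT /nrat_texts` and each query body with `POST /nrat_texts/_search`. Hits that carry a `highlight` section then mark candidate matching fragments. Fixed, non-overlapping windows keep the number of queries low but can miss a borrowed passage that straddles a window boundary. Overlapping windows trade more queries for better recall.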

We tested the new tool, given the working name "Antic", on the full database of academic texts held in the National Repository of Academic Texts of Ukraine (NRAT). After a text uploaded for checking is compared against the NRAT database, the user receives a report that lists all matching text fragments, with the number of matches and active hyperlinks to the sources in which they were found: R&D reports, as well as abstracts and dissertations submitted for a scientific degree.
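A report of this kind can be illustrated with a small in-memory sketch. It is not Antic's algorithm, which the poster does not specify. Instead it swaps in plain word n-gram (shingle) matching. The `Source` record, the document types, the `example.org` URLs and the default fragment length `n = 5` are hypothetical. For each source that shares at least one n-word fragment with the checked text, the sketch lists the shared fragments in reading order, counts them, and sorts sources by that count.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Source:
    """A repository document (hypothetical record structure)."""
    url: str
    doc_type: str  # e.g. "R&D report", "abstract", "dissertation"
    text: str


@dataclass
class SourceMatches:
    """All fragments of the checked text that were found in one source."""
    url: str
    doc_type: str
    fragments: list[str] = field(default_factory=list)

    @property
    def count(self) -> int:
        return len(self.fragments)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return every contiguous sequence of n tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def match_report(checked: str, corpus: list[Source], n: int = 5) -> list[SourceMatches]:
    """List each source sharing at least one n-word fragment with `checked`,
    together with the shared fragments, most matches first."""
    tokens = tokenize(checked)
    checked_shingles = shingles(tokens, n)
    report = []
    for source in corpus:
        common = checked_shingles & shingles(tokenize(source.text), n)
        if not common:
            continue
        entry = SourceMatches(source.url, source.doc_type)
        seen = set()
        # Walk the checked text so fragments are reported in reading order,
        # each distinct fragment only once.
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if gram in common and gram not in seen:
                seen.add(gram)
                entry.fragments.append(" ".join(gram))
        report.append(entry)
    report.sort(key=lambda e: e.count, reverse=True)
    return report


if __name__ == "__main__":
    corpus = [
        Source("https://example.org/nrat/1", "dissertation",
               "The similarity of texts is measured by comparing word sequences."),
        Source("https://example.org/nrat/2", "R&D report",
               "An unrelated report about geospatial data."),
    ]
    checked = "Here the similarity of texts is measured by comparing fragments."
    for entry in match_report(checked, corpus, n=4):
        print(entry.count, entry.doc_type, entry.url)
        for fragment in entry.fragments:
            print("   ", fragment)
```

Because shingles overlap, one long borrowed passage yields many consecutive matches. A production report would probably merge adjacent shingles into longer fragments before counting them.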

Files

Kamyshyn Suhyі 20221027-28.pdf (429.4 kB)
md5:cd1ab1662c71a98fe3f9fef901aabe39