Published October 27, 2022 | Version v1
Poster Open

DEVELOPMENT OF ANALYTICAL TOOLS FOR WORKING WITH LARGE TEXT ARCHIVES

  • 1. State Scientific Institution "Ukrainian Institute of Scientific and Technical Expertise and Information" (UkrISTEI)

Description

Open science rests on access to scientific knowledge and scientific infrastructure, open scientific communication, and open dialogue. One of its key elements is open infrastructure, in particular knowledge-based resources: scientific archives, platforms, and repositories. According to our estimates, 143 institutional repositories operate in Ukraine's higher education and research institutions. The Ministry of Education and Science of Ukraine, together with UkrISTEI, is implementing a large-scale project to develop the National Repository of Academic Texts (NRAT), https://nrat.ukrintei.ua.

Currently, the National Repository is populated with academic texts from the Fund for the State Registration of Research and Design Works and Dissertations, which has operated in Ukraine since 1992. As of the end of October 2022, it contains more than 255,000 academic texts, including 123,000 R&D reports and 132,000 dissertations with their abstracts. The database is updated continuously: both archival materials and newly published and registered works are added to it.

In the short term, our tasks are to fill the NRAT with all types of academic texts from institutional repositories; to develop and implement analytical tools; to give users information about the value of academic texts, the demand for them, and their practical implementation; to conduct a comprehensive analysis of Ukraine's scientific landscape; and to ensure information integration with other domestic and foreign archives.

To work effectively with information, users need convenient analytical tools that let them quickly process search queries, analyze text arrays, trace the chronology of changes in a research field (changes in basic definitions and applied methods), outline new and emerging research trends, and identify "niches" and gaps that have not yet been sufficiently studied. To address one of these tasks, together with the National Library of Latvia and with the support of the Ministry of Education and Science of Ukraine, we are implementing a project (Nos. 0121U113981, 0122U200102) designed to promote the introduction of new text analysis services in national digital archives.

We have developed the "Antic" tool, which searches for repeated text fragments in academic texts submitted for comparison, and we have tested it comprehensively. The tool has proven functional, and directions for its improvement are already clear. To make it easier to use and its results easier to read, we need to increase its speed; change the report format (the report should link each source found to the text submitted for checking, color-mark repeated fragments, and accompany each reference to a source from the comparison database with the appropriate metadata); add customization options that account for individual users' needs; make the interface friendlier; and add help, instructional materials, and usage tips.
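The description above does not say how Antic detects repetitions. Purely as an illustration of the general idea, the sketch below finds repeated fragments by matching overlapping word n-grams ("shingles") between a submitted text and a small comparison set. The function names, the choice of n = 4, and the sample texts are assumptions made for this example and are not taken from the tool.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens, n):
    """Map each run of n consecutive tokens to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        index[tuple(tokens[i:i + n])].append(i)
    return index


def find_repeats(query, sources, n=4):
    """Find token spans of `query` that also occur in a source text.

    Returns the query tokens and a list of (source_id, start, end) spans,
    where overlapping or adjacent n-gram hits from one source are merged.
    This is a hypothetical helper, not the Antic algorithm.
    """
    q_tokens = tokenize(query)
    spans = []
    for source_id, text in sources.items():
        src_index = shingles(tokenize(text), n)
        # Start positions in the query whose n-gram also appears in the source.
        hits = [i for i in range(len(q_tokens) - n + 1)
                if tuple(q_tokens[i:i + n]) in src_index]
        start = end = None
        for i in hits:
            if start is None:
                start, end = i, i + n
            elif i <= end:
                end = i + n          # extend the current span
            else:
                spans.append((source_id, start, end))
                start, end = i, i + n
        if start is not None:
            spans.append((source_id, start, end))
    return q_tokens, spans


# Illustrative usage with made-up texts.
query = "Open science is based on access to scientific knowledge and data"
sources = {
    "doc-1": "We note that open science is based on access to scientific knowledge.",
    "doc-2": "Unrelated text about repositories.",
}
tokens, spans = find_repeats(query, sources)
for source_id, start, end in spans:
    print(source_id, "->", " ".join(tokens[start:end]))
```

Each returned span keeps the identifier of the source it came from, which is the kind of binding between found sources and the checked text that the improved report format calls for; color marking and metadata could be layered on top of such spans.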

Files

Chmyr 20221027-28 UA-EN.pdf (537.9 kB)
md5:798177fd1120bb7d6f5a0a4a9f7512e1