Dominika Tkaczyk

Dominika Tkaczyk · 2025-08-22T11:57:57.389Z

We are hiring a remote Program Technical Lead at Crossref to help shape the future of open infrastructure for global scholarly communication. This role will lead technical work within one of our programs - guiding architecture, coordinating between different technical functions, and supporting our shift from a monolith to a network of interconnected modules. If you're passionate about open infrastructure and ready to drive meaningful change at scale, we'd love to hear from you! https://lnkd.in/eWBWk2gt

Dublin, County Dublin, Ireland
572 followers 500+ connections

Crossref

Systems Research Institute, Polish Academy of Sciences

Activity

And we find ourselves at the final session of #Crossref2025 with a fascinating panel on "Research Nexus in the real world: What is the impact and…

Liked by Dominika Tkaczyk
Why did Crossref need a dedicated Data Science team? Dominika Tkaczyk, Director of Technology, explains why and gives the team's mission as "The…

Liked by Dominika Tkaczyk
I’m thrilled to be joining Comarch this November as Chief AI Officer. We are living in extraordinary times, at the cusp of the greatest…

Liked by Dominika Tkaczyk

Experience

Crossref

Dublin, County Dublin, Ireland
-
-
-

Dublin, Ireland
-

Dublin, Ireland
-

ICM, University of Warsaw
-

ICM, Networking Group, Warszawa

Education

Systems Research Institute, Polish Academy of Sciences

-

2012 - 2016
-

2002 - 2007

Licenses & Certifications

Golden Award Scandium 2016

Codility

Issued Apr 2016

Credential ID certDUB8BY-7TPUM26W5TY8DXVU

Data Analysis and Statistical Inference

Coursera

Issued May 2015

Statistical Inference

Coursera

Issued Feb 2015

Mining Massive Datasets

Coursera

Issued Dec 2014

Functional Programming Principles in Scala

Coursera

Issued Nov 2014

R Programming

Coursera

Issued Sep 2014

Machine Learning

Coursera

Issued Dec 2013


Publications

CERMINE: automatic extraction of structured metadata from scientific literature

International Journal on Document Analysis and Recognition 2015
CERMINE is a comprehensive open-source system for extracting structured metadata from scientific articles in a born-digital form. The system is based on a modular workflow, whose loosely coupled architecture allows for individual component evaluation and adjustment, enables effortless improvements and replacements of independent parts of the algorithm and facilitates future architecture expanding. The implementations of most steps are based on supervised and unsupervised machine learning techniques, which simplifies the procedure of adapting the system to new document layouts and styles. The evaluation of the extraction workflow carried out with the use of a large dataset showed good performance for most metadata types, with the average F score of 77.5 %. CERMINE system is available under an open-source licence and can be accessed at http://cermine.ceon.pl. In this paper, we outline the overall workflow architecture and provide details about individual steps implementations. We also thoroughly compare CERMINE to similar solutions, describe evaluation methodology and finally report its results.

Extracting Contextual Information from Scientific Literature Using CERMINE System

Semantic Web Evaluation Challenges 2015
Structured Affiliations Extraction from Scientific Literature

D-Lib Magazine 2015
CERMINE — automatic extraction of metadata and references from scientific literature

11th IAPR International Workshop on Document Analysis Systems 2014
CERMINE is a comprehensive open source system for extracting metadata and parsed bibliographic references from scientific articles in born-digital form. The system is based on a modular workflow, whose architecture allows for single step training and evaluation, enables effortless modifications and replacements of individual components and simplifies further architecture expanding. The implementations of most steps are based on supervised and unsupervised machine-learning techniques, which simplifies the process of adjusting the system to new document layouts. The paper describes the overall workflow architecture, provides details about individual implementations and reports evaluation methodology and results. CERMINE service is available at http://cermine.ceon.pl.

GROTOAP2 — The Methodology of Creating a Large Ground Truth Dataset of Scientific Articles

D-Lib Magazine 2014
Scientific literature analysis improves knowledge propagation and plays a key role in understanding and assessment of scholarly communication in scientific world. In recent years many tools and services for analysing the content of scientific articles have been developed. One of the most important tasks in this research area is understanding the roles of different parts of the document. It is impossible to build effective solutions for problems related to document fragments classification and evaluate their performance without a reliable test set, that contains both input documents and the expected results of classification. In this paper we present GROTOAP2 — a large dataset of ground truth files containing labelled fragments of scientific articles in PDF format, useful for training and evaluation of document content analysis-related solutions. GROTOAP2 was successfully used for training CERMINE — our system for extracting metadata and content from scientific articles. The dataset is based on articles from PubMed Central Open Access Subset. GROTOAP2 is published under Open Access license. The semi-automatic method used to construct GROTOAP2 is scalable and can be adjusted for building large datasets from other data sources. The article presents the content of GROTOAP2, describes the entire creation process and reports the evaluation methodology and results.

Large Scale Citation Matching Using Apache Hadoop

Research and Advanced Technology for Digital Libraries, volume 8092 of Lecture Notes in Computer Science, Springer Berlin Heidelberg 2013
During the process of citation matching links from bibliography entries to referenced publications are created. Such links are indicators of topical similarity between linked texts, are used in assessing the impact of the referenced document and improve navigation in the user interfaces of digital libraries. In this paper we present a citation matching method and show how to scale it up to handle great amounts of data using appropriate indexing and a MapReduce paradigm in the Hadoop environment.

Methodology for evaluating citation parsing and matching

Intelligent Tools for Building a Scientific Information Platform, volume 467 of Studies in Computational Intelligence, Springer Berlin Heidelberg 2013
Bibliographic references between scholarly publications contain valuable information for researchers and developers involved with digital repositories. They are indicators of topical similarity between linked texts, impact of the referenced document, and improve navigation in user interfaces of digital libraries. Consequently, several approaches to extraction, parsing and resolving said references have been proposed to date. In this paper we develop a methodology for evaluating parsing and matching algorithms and choosing the most appropriate one for a document collection at hand. We apply the methodology for evaluating reference parsing and matching module of the YADDA2 software platform.

A Modular Metadata Extraction System for Born-Digital Articles

10th IAPR International Workshop on Document Analysis Systems 2012
We present a comprehensive system for extracting metadata from scholarly articles. In our approach the entire document is inspected, including headers and footers of all the pages as well as bibliographic references. The system is based on a modular workflow which allows for evaluation, unit testing and replacement of individual components. The workflow is optimized towards processing of born-digital documents, but may accept scanned document images as well. The machine-learning approaches we have chosen for solving individual tasks increase the ability to adapt to new document layouts and formats. The evaluation tests we have performed showed good results of the individual implementations and the entire metadata extraction process.

GROTOAP: ground truth for open access publications

JCDL '12 Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries 2012
The field of digital document content analysis includes many important tasks, for example page segmentation or zone classification. It is impossible to build effective solutions for such problems and evaluate their performance without a reliable test set, that contains both input documents and expected results of segmentation and classification. In this paper we present GROTOAP, a test set useful for training and performance evaluation of page segmentation and zone classification tasks. The test set contains input articles in a digital form and corresponding ground truth files. All input documents included in the test set have been selected from DOAJ database, which indexes articles published under CC-BY license. The whole test set is available under the same license.

Workflow of Metadata Extraction from Retro-Born Digital Documents

Towards a Digital Mathematics Library. Bertinoro, Italy, July 20-21st, 2011 2011
In this work-in-progress report we propose a workflow for metadata extraction from articles in a digital form. We decompose the problem into clearly defined sub-tasks and outline possible implementations of the sub-tasks. We report the progress of implementation and tests, and state future work.


Honors & Awards

ESWC 2015 SemPub Best Performing Approach Award

Semantic Publishing Challenge at 12th Extended Semantic Web Conference

May 2015

CERMINE, the tool for mining scientific publications, won the best performing approach award at Semantic Publishing Challenge hosted by the 12th Extended Semantic Web Conference (http://2015.eswc-conferences.org/)
DAS 2014 Best Student Paper Award

11th IAPR International Workshop on Document Analysis Systems

Apr 2014

The paper entitled "CERMINE - automatic extraction of metadata and references from scientific literature" won the Best Student Paper Award at Document Analysis Systems conference (http://das2014.sciencesconf.org/resource/page/id/27)

Languages

Polish

Native or bilingual proficiency
English

Professional working proficiency

More activity by Dominika

Another new open dataset just dropped! Last time it was affiliations, now it's GRANTS! Here are over 250,000 Crossref grant<>publication matches for…

Liked by Dominika Tkaczyk
If Carlsberg did jobs... Probably the best role in #ScholarlyPublishing. Public Knowledge Project is hiring a Managing Director, responsible for…

Liked by Dominika Tkaczyk
We've got metadata. Lots of it. And we need a Program Technical Lead to help keep it all connected, open, and sustainable. ✔ Work remotely ✔ Lead a…

Liked by Dominika Tkaczyk
We are hiring a remote Program Technical Lead at Crossref to help shape the future of open infrastructure for global scholarly communication. This…

Shared by Dominika Tkaczyk
Sooo, this happened (it actually did). Most reactions have been "Wait, I thought Crossref was already in the cloud". Well, nope, we just talked…

Liked by Dominika Tkaczyk
🚀 Crossref is hiring a DevOps Engineer! Join our fully remote, mission-driven team and help build and support critical infrastructure for the…

Liked by Dominika Tkaczyk
I was fortunate to be a panellist at an excellent event organised by Institute of International and European Affairs on Ireland and AI-readiness. My…

Liked by Dominika Tkaczyk
Crossref is hosting a pub watch party on the last afternoon of the Metascience 2025 conference, screening and discussing two of the pre-conference…

Liked by Dominika Tkaczyk
speechless...

Liked by Dominika Tkaczyk
