Research Knowledge Graphs at GESIS & NFDI4DS
„Faire Datenökonomie“, Berlin
Stefan Dietze, 15.03.2022
Sharing & Reuse of Research Data, Resources, Knowledge
▪ Research Data
▪ Publications
▪ Code/Scripts
▪ ML Models
▪ Methods
▪ Claims
▪ Metrics
Relations between scientific resources, data, knowledge
Research Data Cycle
Common questions for researchers
• Which top-tier publications cite which data/method?
(„dataset authority“)
• Which data was used to train/evaluate which method?
Which method was used to produce which data?
• Which claims are supported/cited/rejected by what
dataset or publication?
Challenges
• Data & metadata about resources and concepts are not
represented in a structured, machine-interpretable,
integrated manner (hidden in publications, web pages
etc.)
• Persistent identifiers (e.g. DOIs) used inconsistently
(e.g. on publications/datasets)
• Relations and semantics not explicit
• Reproducibility crisis in CS/DS/AI
Knowledge Graphs for FAIR Research Data
• Improving data interoperability and reuse through
established W3C standards for data sharing (on the Web), e.g.
RDF, JSON-LD, shared vocabularies
(e.g. schema.org, DCAT, DDI), APIs for data reuse and linking
• Making links between resources and concepts explicit &
machine-interpretable
(e.g. which publications cite what dataset? Which claim is
supported/rejected by publication X/dataset Y?)
• Consistent use of persistent IDs (e.g. URIs, DOIs) across all
data, e.g. concepts, resources etc. („DOIs for all“)
Resources
▪ Datasets
▪ Publications
▪ Code
▪ Software
Concepts
▪ Terms &
Definitions
▪ Claims
▪ Methods
▪ Topics
▪ Entities
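Illustrative sketch (not part of the original slides): the idea of explicit, machine-interpretable links can be shown with a toy in-memory triple store. All identifiers (the DOIs, ex:claim-1) and predicate names (cites, supports, rejects) are hypothetical placeholders, not terms from schema.org, DCAT or any GESIS graph. A real deployment would presumably use RDF tooling and a shared vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical example statements; none of these IDs refer to real resources.
TRIPLES = [
    Triple("doi:10.0000/pub-1", "cites", "doi:10.0000/dataset-A"),
    Triple("doi:10.0000/pub-2", "cites", "doi:10.0000/dataset-A"),
    Triple("doi:10.0000/pub-2", "supports", "ex:claim-1"),
    Triple("doi:10.0000/dataset-A", "rejects", "ex:claim-1"),
]


def subjects(predicate, obj, triples=TRIPLES):
    """Return all subjects linked to `obj` via `predicate`, sorted."""
    return sorted(t.subject for t in triples
                  if t.predicate == predicate and t.obj == obj)


def stance_on(claim, triples=TRIPLES):
    """Map each resource to its stance (supports/rejects) on a claim."""
    return {t.subject: t.predicate for t in triples
            if t.obj == claim and t.predicate in {"supports", "rejects"}}


# "Which publications cite what dataset?"
print(subjects("cites", "doi:10.0000/dataset-A"))
# "Which claim is supported/rejected by publication X/dataset Y?"
print(stance_on("ex:claim-1"))
```

Because every resource and concept carries a persistent ID, both questions reduce to simple lookups over the same set of statements.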
© Frank van Harmelen
KGs in practice
Research KGs in Practice: integrated search @ GESIS
https://search.gesis.org/
Dataset
Rel. Publications
From publications to machine-interpretable metadata KGs
Disambiguation of dataset & software/script citations
https://data.gesis.org/softwarekg
▪ Training a deep learning-based model for extracting
software & data references from large-scale data
(3.5 M publications)
▪ Data lifting into KG
▪ 300+ M triples / statements
▪ Search across
data/software/publications
(GESIS Search)
From publications to machine-interpretable metadata KGs
Understanding scientific software/data usage
https://data.gesis.org/softwarekg
(Schindler et al., CIKM2021)
▪ Understanding SW
usage, citation habits
and their evolution
across disciplines
▪ Rise of data science =
rise of software usage
▪ Top adopters of data
science/AI/software…
▪ …follow the worst
citation habits
From social media to machine-interpretable research data KGs
https://data.gesis.org/tweetskb
Building a public research knowledge graph from Twitter data
[Figure: example annotated tweet with linked entities (http://dbpedia.org/resource/Tim_Berners-Lee, http://dbpedia.org/resource/Solid) and sentiment annotations (wna:positive-emotion, wna:negative-emotion; onyx:Intensity values "0.75" and "0.0")]
TweetsKB – a large-scale research KG of societal opinions
▪ Harvesting & archiving of 10 Billion tweets
(permanent collection from Twitter 1% sample since
2013)
▪ Information extraction pipeline to build a KG of
entities, interactions & sentiments
(distributed batch processing via Hadoop
Map/Reduce)
o Entity linking with knowledge graph/DBpedia
(“president”/“potus”/“trump” =>
dbp:DonaldTrump)
o Sentiment analysis/annotation
o Geotagging
o Lifting into knowledge graph schema
▪ Public, large-scale research corpus of public
opinions and their evolution
=> interdisciplinary research
P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public
and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
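Illustrative sketch (not from the original slides): the pipeline steps above (entity linking, sentiment annotation, lifting into a graph schema) can be mimicked on a single tweet. The alias table, the tiny sentiment lexicon, the predicate names and the tweet ID are all assumptions made for this example. TweetsKB itself uses dedicated entity-linking and sentiment tools and runs as distributed batch jobs.

```python
import re

# Hypothetical alias table standing in for a real DBpedia entity linker.
ALIASES = {
    "president": "dbp:DonaldTrump",
    "potus": "dbp:DonaldTrump",
    "trump": "dbp:DonaldTrump",
}

# Toy sentiment lexicon (assumption), far smaller than any real resource.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "worst", "hate"}


def annotate(tweet_id, text):
    """Turn one tweet into (subject, predicate, object) statements."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Entity linking: map surface forms to one canonical entity ID.
    entities = sorted({ALIASES[t] for t in tokens if t in ALIASES})
    # Sentiment annotation: share of positive/negative tokens.
    n = max(len(tokens), 1)
    pos = round(sum(t in POSITIVE for t in tokens) / n, 2)
    neg = round(sum(t in NEGATIVE for t in tokens) / n, 2)
    # Lifting: emit statements using placeholder predicate names.
    triples = [(tweet_id, "mentions", e) for e in entities]
    triples.append((tweet_id, "positiveIntensity", pos))
    triples.append((tweet_id, "negativeIntensity", neg))
    return triples


for triple in annotate("ex:tweet1", "POTUS gave a great speech"):
    print(triple)
```

Each tweet is processed independently of the others, which is what makes this kind of annotation step a natural fit for batch processing over a large archive.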
RKG-based social science research using TweetsKB
https://dd4p.gesis.org
Investigating Vaccine Hesitancy in DACH countries
Germany suspends vaccinations with AstraZeneca
Twitter discourse on “Impfbereitschaft” (willingness to be vaccinated)
RKG-based discourse analysis using TweetsKB
Vaccine Hesitancy: key topics in “safety” category
„Schwangerschaft“ (pregnancy), „Kimmich“, „Alter“ (age),
„Nebenwirkungen“ (side effects), „Herzinfarkt“ (heart attack),
„Zulassung“ (approval)
https://dd4p.gesis.org
Knowledge Graphs for Research Data: Initiatives
Summary: Research KGs @ GESIS
Tools for constructing scholarly knowledge graphs
● NLP and deep learning-powered methods for extracting large-scale KGs
about methods, claims, data, software involved in the scientific process
Large-scale scholarly KGs, e.g.
● KGs about scholarly use of software & research data
(e.g. SoftwareKG: 1.8 M disambiguated software mentions extracted
from 3 M publications, https://data.gesis.org/softwarekg/)
● Web mined KGs of social science research data, e.g. public opinions,
claims and attitudes expressed on social media
(e.g. TweetsKB: > 10 Bn semantically annotated tweets, sentiments,
https://data.gesis.org/tweetskb)
Semantic Search powered by KGs and related tools
● RKG-powered search across scholarly publications, datasets, methods
and their relations (e.g. GESIS Search, https://search.gesis.org)
https://gesis.org/en/kts
Outlook: a joint Research KG in NFDI4DS
Resources
▪ Datasets
▪ Publications
▪ Code
▪ Software
Concepts
▪ Terms &
Definitions
▪ Claims
▪ Methods
▪ Topics
▪ Entities
• Community
• Expert curation/annotation workflow/tools
• Focus point & hub
• PIDs
https://www.nfdi4datascience.de/
• Large-scale knowledge graphs
• Automated deep learning-based methods for extracting KGs
• Populating ORKG
@stefandietze
http://stefandietze.net
