Towards the automatic identification
of the nature of citations
(1) Department of Computer Science and Engineering, University of Bologna, Italy
(2) STLab-ISTC, National Research Council, Italy
30 May 2013
Montpellier, France
ESWC 2013
Motivation
• Bibliographic citations can be seen as tools for:
– linking research: making pointers to related works, to sources of
experimental data, to methods used, etc.
– disseminating research: conference proceedings, journals, Web
platforms (e.g. blogs, wikis), Semantic Publishing platforms and
projects (e.g. OpenCitation, OpenBibliography, Lucero)
– exploring research: new ways of browsing articles through networks
of citations (e.g. CiteWiz, Citation Sensitive In-browser Summariser)
– evaluating research: measuring the importance of journals (e.g.
impact factor) or the scientific productivity of authors (e.g. h-index)
• Assumption: all these activities can be radically improved by
exploiting the actual function of citations, i.e. the author’s
reason for citing a given paper
Goal
• To design a method able to automatically infer the
author’s reason for citing a scientific article
• To implement a tool that is comparable to humans in the
task of identifying the nature of citations
Available online at http://wit.istc.cnr.it:8080/tools/citalo
Input: a sentence containing a reference to a bibliographic entity indicated by an “X”, e.g. “It extends the research outlined in earlier work X.”
• Ontology learning: derive a logical representation (i.e. an OWL ontology) of the sentence through FRED
• Citation type extraction: extract candidate types for the citation by looking for patterns in FRED output via SPARQL
• Word-sense disambiguation: gather the sense of the candidate types through IMS with respect to OntoWordNet
• Sentiment analysis: capture the sentiment polarity emerging from the text through AlchemyAPI
• Alignment to CiTO: assign CiTO types to the citation through SPARQL CONSTRUCT
Output: cito:extends
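The flow from input sentence to CiTO property can be illustrated with a deliberately simplified sketch. It is not how CiTalO works internally: keyword matching stands in for FRED, SPARQL and IMS, the sentiment step is left out, and the `CITO_BY_LEMMA` table, the function names and the fallback to the neutral property are all assumptions made for this example.

```python
import re

# Hypothetical lemma-to-CiTO table (illustration only). The tool described in
# the slides aligns OntoWordNet senses to CiTO via SPARQL CONSTRUCT instead.
CITO_BY_LEMMA = {
    "extend": "cito:extends",
    "use": "cito:usesMethodIn",
    "critique": "cito:critiques",
}

# Assumed fallback: the most neutral CiTO property.
NEUTRAL = "cito:citesForInformation"


def candidate_types(sentence: str) -> list[str]:
    """Stand-in for ontology learning and type extraction: find known lemmas."""
    words = re.findall(r"[a-z]+", sentence.lower())
    found = []
    for word in words:
        for lemma in CITO_BY_LEMMA:
            # Crude inflection handling in place of word-sense disambiguation.
            inflections = (lemma, lemma + "s", lemma + "d", lemma + "ed")
            if word in inflections and lemma not in found:
                found.append(lemma)
    return found


def classify(sentence: str) -> list[str]:
    """Map a citation sentence to CiTO properties, or to the neutral one."""
    lemmas = candidate_types(sentence)
    if not lemmas:
        return [NEUTRAL]
    return [CITO_BY_LEMMA[lemma] for lemma in lemmas]


print(classify("It extends the research outlined in earlier work X."))
```

The staged structure (extract candidates, then align them to CiTO) mirrors the steps listed above, which is the only part of the sketch taken from the slides.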
Result
• We asked humans to manually annotate 106 citation sentences, contained in scientific articles, according to CiTO properties
• Similarly to Teufel et al. [19], the most neutral CiTO property, citesForInformation, was the most prevalent function in our dataset, while the second most used property was usesMethodIn
• We ran CiTalO on the same sample under 8 different configurations and compared the results with the human annotations
• No configuration emerges as the absolute best from these data
• The worst configurations were those that took into account all the proximal synsets
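The slides do not state which measure was used to compare the tool's output with the human annotations. One common choice for this kind of comparison is precision and recall over the assigned properties; the sketch below assumes that choice, and its toy data is invented for illustration and is not taken from the study.

```python
def precision_recall(predicted, gold):
    """Micro-averaged precision and recall over per-sentence label sets."""
    true_pos = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Invented toy data: one set of CiTO properties per citation sentence.
gold = [
    {"cito:extends"},
    {"cito:citesForInformation"},
    {"cito:usesMethodIn"},
]
predicted = [
    {"cito:extends"},
    {"cito:citesForInformation", "cito:critiques"},
    set(),  # the tool assigned nothing to this sentence
]

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running the same function once per configuration would give one precision/recall pair per configuration to compare.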
Thanks
