Towards Purposeful Reuse of Semantic Datasets via
Goal-Driven Data Summarization
Panos Alexopoulos, Jose Manuel Gomez Perez
6th International Conference on Advances in Semantic Processing
Porto, Portugal, October 3rd, 2013
Introduction

The Linked Data Use Challenge

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Motivating Scenario

●Assume that an entity (an individual or an organization) wants to reuse
public semantic datasets from the Web to:

● Enrich its own data with them.
● Use the data to provide added-value services to its users/clients.
●Such entities can be:
● Technology providers (e.g. iSOCO)
● Information providers (e.g. publishers, media, etc.)
● Knowledge-driven and knowledge-intensive organizations

Data Enrichment

Why Data Reuse?

●The problem with semantic data is the large amount of time and effort
required to construct and maintain it.

●Reusing existing public semantic data can partially alleviate
this problem:
● The volume and diversity of such data are increasing rapidly.
● Their maintenance and evolution are the responsibility of their
publishers, reducing the effort and cost of this task on the
organization's side.

Example

●A news organization wants to create and maintain a knowledge base
about European football.

●This knowledge changes quickly, so the organization needs to
constantly monitor these changes and update the data.
●Much of this information is already available as public semantic data
(e.g. DBpedia).
●Thus it could be better for the organization to reuse this public data
instead of creating it from scratch.

Barriers to Data Reuse
●It is difficult for knowledge engineers to decide whether a given dataset is
actually suitable for their needs, because semantic datasets:

● Typically cover diverse domains.
● Do not follow a unified way of organizing knowledge.
● Differ in a number of features, including size, coverage, granularity and
descriptiveness.
●This makes the following tasks difficult:
● Assessing whether a dataset satisfies particular requirements.
● Comparing different datasets to select the one most suitable for a
given purpose.

Data Reuse

Our Approach

●We propose giving data consumers the ability to derive
semantic data summaries.

●Existing summarization approaches treat the summarization task in
an application- and user-independent way.
●By contrast, we are interested in facilitating the generation of
requirements-oriented and goal-driven summaries that may be
significantly more helpful to users.

Goal-Driven Semantic Data Summarization

Problem Description

●Key question: “Given an application scenario where semantic data is
required, how suitable is a given existing dataset for the purposes of
this scenario?”
●To answer this, users normally need to be able to:
1. Explicitly express the requirements that a dataset needs to
satisfy for a given task or goal.
2. Automatically measure/assess the extent to which a dataset
satisfies each of these requirements and compile a summary
report.

Approach

●To implement these two capabilities, we follow a checklist-based
approach.

●Checklists are essentially lists of action items, arranged in a
systematic manner, that allow users to record the completion of each
item.
●They are widely applied in industries such as healthcare and
aviation to ensure reliable and consistent execution of complex
operations.
●In our case, we use checklists to define and execute custom
dataset summarization tasks as lists of goal-specific
requirements and associated summarization processes.
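
To make the checklist idea concrete, here is a minimal sketch, assuming each action item can be reduced to a description plus an automated check that returns true or false. The `ChecklistItem` class and `run_checklist` function are hypothetical names introduced for illustration; they are not taken from the authors' framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ChecklistItem:
    """One action item: a human-readable description plus an automated check."""
    description: str
    check: Callable[[], bool]


def run_checklist(items: List[ChecklistItem]) -> Dict[str, bool]:
    """Execute every item in order and record whether each one was completed."""
    record: Dict[str, bool] = {}
    for item in items:
        record[item.description] = bool(item.check())
    return record


if __name__ == "__main__":
    dataset_size = 1200  # hypothetical triple count, for illustration only
    checklist = [
        ChecklistItem("Dataset is non-empty", lambda: dataset_size > 0),
        ChecklistItem("Dataset has at least 10,000 triples", lambda: dataset_size >= 10_000),
    ]
    for description, done in run_checklist(checklist).items():
        print(f"[{'x' if done else ' '}] {description}")
```

Recording every item's outcome, rather than stopping at the first failure, mirrors the way a checklist shows the completion state of all its items at once.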

Summarization Task Representation

●To represent custom summarization tasks according to the checklist
paradigm, we have adopted the Minim model.

●This model defines the following information:
● The Goals the dataset summarization task is designed to serve.
● The Requirements against which the summarization task
evaluates the dataset.
● The Data Analysis Operations that the summarization task
employs to assess whether its requirements are satisfied.
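
The three kinds of information listed above can be mirrored, for illustration, in plain Python data classes: a goal groups requirements, and each requirement points to the data analysis operation that assesses it. This is a sketch under our own assumptions and does not reproduce the Minim vocabulary; the class names, the dictionary-based toy dataset, and the rule that a requirement is satisfied when its measured value reaches a numeric threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A data analysis operation maps a dataset to a measured value.
Operation = Callable[[dict], float]


@dataclass
class Requirement:
    name: str
    operation: Operation  # operation that assesses this requirement
    threshold: float      # assumption: satisfied when the measured value >= threshold


@dataclass
class Goal:
    description: str
    requirements: List[Requirement] = field(default_factory=list)


def summarize(goal: Goal, dataset: dict) -> Dict[str, dict]:
    """Apply each requirement's operation to the dataset and record the outcome."""
    summary: Dict[str, dict] = {}
    for req in goal.requirements:
        value = req.operation(dataset)
        summary[req.name] = {"value": value, "satisfied": value >= req.threshold}
    return summary


def label_ratio(dataset: dict) -> float:
    """Fraction of entities with at least one label (hypothetical toy layout)."""
    entities = dataset["entities"]
    if not entities:
        return 0.0
    return sum(1 for e in entities if dataset["labels"].get(e)) / len(entities)


if __name__ == "__main__":
    toy = {"entities": ["ex:a", "ex:b"], "labels": {"ex:a": ["A"]}}  # hypothetical data
    goal = Goal("Suitable for semantic annotation",
                [Requirement("labeling", label_ratio, threshold=1.0)])
    print(summarize(goal, toy))  # {'labeling': {'value': 0.5, 'satisfied': False}}
```

Keeping operations as plain callables means the same operation can be reused by requirements of different goals, each with its own threshold.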

Example Goals

●Decide if a dataset is appropriate for a Semantic Annotation scenario.
●Decide if a dataset is appropriate for a Question Answering scenario.
●Determine which of two or more similar datasets best represents a
given corpus.
●Detect emerging inconsistencies or other quality problems.
●…

Example Requirements

●Evaluate the dataset’s coverage of a particular domain/topic:
Aims to measure the extent to which the dataset describes a given
domain or topic.
●Evaluate the dataset’s labeling adequacy and richness: Aims to
measure the extent to which the dataset’s elements (concepts,
instances, relations, etc.) are accompanied by representative and
comprehensible labels, in one or more languages.
●Evaluate the dataset’s connectivity: Checks whether paths exist
between concepts or entities, i.e. whether it is possible to go
from a given concept to another in the graph, and in what ways.
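
Coverage and connectivity can be approximated with simple graph measures. The sketch below assumes the dataset is available in memory as (subject, predicate, object) triples and that the user supplies a list of entities characterizing the target domain; coverage is then the fraction of those entities present, and connectivity is tested with a breadth-first search. The function names, the `ex:` identifiers, and both measures are illustrative choices, not the authors' definitions.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def coverage(triples: Iterable[Triple], domain_entities: Iterable[str]) -> float:
    """Fraction of the expected domain entities that occur anywhere in the dataset."""
    expected = set(domain_entities)
    if not expected:
        return 0.0
    present: Set[str] = set()
    for s, _, o in triples:
        present.update((s, o))
    return len(expected & present) / len(expected)


def find_path(triples: Iterable[Triple], start: str, goal: str) -> Optional[List[str]]:
    """Return one shortest node path from start to goal, or None if unreachable.

    Edges are treated as undirected and predicates are ignored.
    """
    neighbours: Dict[str, Set[str]] = {}
    for s, _, o in triples:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessor map to rebuild the path.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in neighbours.get(node, set()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # Hypothetical toy graph; ex: identifiers are placeholders.
    toy = [("ex:TeamA", "ex:hasPlayer", "ex:Player1"),
           ("ex:Player1", "ex:bornIn", "ex:CityX")]
    print(coverage(toy, ["ex:TeamA", "ex:TeamB"]))  # 0.5
    print(find_path(toy, "ex:TeamA", "ex:CityX"))   # ['ex:TeamA', 'ex:Player1', 'ex:CityX']
    print(find_path(toy, "ex:TeamA", "ex:TeamB"))   # None
```

The search returns a single shortest path and ignores edge direction and predicates; answering "in what ways" two concepts are connected would require enumerating several paths and keeping the predicates along each one.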

Example Data Operations

●Check the existence of a particular element (concept, relation,
attribute, instance, axiom) in the dataset.

●Check the dataset’s consistency (e.g. by running a reasoner).
●Measure the number of ambiguous entities in the dataset.

●Measure the number of labeled entities.
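
Three of these operations reduce to straightforward scans over a set of triples, sketched below under two assumptions: labels are stored under `rdfs:label`, and an entity counts as ambiguous when it shares a label (ignoring case) with some other entity. The consistency check is left out, since it depends on an actual reasoner. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
LABEL = "rdfs:label"  # assumption: labels are stored under this predicate


def element_exists(triples: Iterable[Triple], element: str) -> bool:
    """True if the element occurs as subject, predicate or object of any triple."""
    return any(element in triple for triple in triples)


def labeled_entities(triples: Iterable[Triple]) -> Set[str]:
    """Entities that carry at least one label."""
    return {s for s, p, _ in triples if p == LABEL}


def ambiguous_entities(triples: Iterable[Triple]) -> Set[str]:
    """Entities sharing a (case-insensitive) label with at least one other entity."""
    owners_by_label: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == LABEL:
            owners_by_label[o.lower()].add(s)
    return {e for owners in owners_by_label.values() if len(owners) > 1 for e in owners}


if __name__ == "__main__":
    # Hypothetical toy data: "Valencia" labels both a club and a city.
    toy = [("ex:Valencia_CF", LABEL, "Valencia"),
           ("ex:Valencia_city", LABEL, "Valencia"),
           ("ex:Player1", "ex:playsFor", "ex:Valencia_CF")]
    print(element_exists(toy, "ex:playsFor"))  # True
    print(len(labeled_entities(toy)))          # 2
    print(sorted(ambiguous_entities(toy)))     # ['ex:Valencia_CF', 'ex:Valencia_city']
```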

Application Example

●We applied our framework to assess the suitability of public datasets
for reuse in semantically annotating texts that describe
football matches of the Spanish league.
●For that, we required the reused dataset to:
● Contain information about all the current teams of the Spanish
football league.
● Have at least one associated label for each of its entities.
● Relate teams to the players that currently play for them.
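
The three requirements of this scenario can be phrased as set-inclusion checks over a toy graph. This is a hypothetical sketch, not the task the authors defined: the `ex:` identifiers, the `ex:playsFor` predicate, the use of `rdfs:label` and the caller-supplied list of expected teams are all assumptions, and the temporal aspect (current teams and players) is not modelled.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
LABEL = "rdfs:label"       # hypothetical label predicate
PLAYS_FOR = "ex:playsFor"  # hypothetical player -> team relation


def assess(triples: List[Triple], expected_teams: Iterable[str]) -> Dict[str, bool]:
    """Check the three requirements of the football annotation scenario."""
    teams = set(expected_teams)
    # Entities: all subjects plus objects of non-label triples (labels are literals).
    entities = {s for s, _, _ in triples} | {o for _, p, o in triples if p != LABEL}
    labeled = {s for s, p, _ in triples if p == LABEL}
    teams_with_players = {o for _, p, o in triples if p == PLAYS_FOR}
    return {
        "contains all expected teams": teams <= entities,
        "every entity has a label": entities <= labeled,
        "every team is related to its players": teams <= teams_with_players,
    }


if __name__ == "__main__":
    toy = [("ex:TeamA", LABEL, "Team A"),
           ("ex:TeamB", LABEL, "Team B"),
           ("ex:Player1", LABEL, "Player One"),
           ("ex:Player1", PLAYS_FOR, "ex:TeamA")]
    for requirement, ok in assess(toy, ["ex:TeamA", "ex:TeamB"]).items():
        print(f"{requirement}: {'yes' if ok else 'no'}")
```

In this toy graph the first two checks pass and the third fails, because no player is linked to `ex:TeamB`.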

Defined Summarization Task

Resulting Summary

●We executed this task against DBpedia and Freebase, automatically producing a
summary report.

●The system provides not only a yes/no answer as to whether each dataset satisfies each
requirement, but also additional information on why this may or may not be the case.
●This is important because:

● A requirement might not be satisfied only because its threshold is set high.
● A requirement might seem to be satisfied, yet not actually be.
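
Why a bare yes/no is not enough can be shown with a small formatter that places each verdict next to the measured value and the threshold it was compared with. The `report_line` function and the figures in the usage lines are hypothetical and are not the DBpedia or Freebase results.

```python
def report_line(requirement: str, measured: float, threshold: float) -> str:
    """Format one summary entry: the verdict plus the evidence behind it."""
    verdict = "yes" if measured >= threshold else "no"
    return (f"{requirement}: {verdict} "
            f"(measured {measured:.0%}, threshold {threshold:.0%}, "
            f"gap {measured - threshold:+.0%})")


if __name__ == "__main__":
    print(report_line("Entities labeled", 0.97, 1.0))
    # Entities labeled: no (measured 97%, threshold 100%, gap -3%)
    print(report_line("Teams present", 1.0, 0.5))
    # Teams present: yes (measured 100%, threshold 50%, gap +50%)
```

The first line fails only narrowly against a strict threshold; the second passes against a lenient one, which is the kind of result a reader should double-check.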

Ongoing Work

Summary Generation Tool
●We are currently developing a summarization tool that enables the definition,
manipulation and execution of summarization tasks, as well as
dashboard-like visualization of their output.

Thank you!

Questions?

Dr. Panos Alexopoulos
Semantic Applications Research Manager

Want to innovate?

palexopoulos@isoco.com
(t) +34 913 349 797

iSOCO Barcelona
Av. Torre Blanca, 57
Edificio ESADE CREAPOLIS
Oficina 3C 15
08172 Sant Cugat del Vallès
Barcelona, España
(t) +34 935 677 200

iSOCO Madrid
Av. del Partenón, 16-18, 1º7ª
Campo de las Naciones
28042 Madrid
España
(t) +34 913 349 797

iSOCO Pamplona
Parque Tomás Caballero, 2, 6º4ª
31006 Pamplona
España
(t) +34 948 102 408

iSOCO Valencia
C/ Prof. Beltrán Báguena, 4
Oficina 107
46009 Valencia
España
(t) +34 963 467 143

iSOCO Colombia
Complejo Ruta N
Calle 67, 52-20
Piso 3, Torre A
Medellín
Colombia
(t) +57 516 7770 ext. 1132

Key Vendor
Virtual Assistant 2013
