DBGroup@UNIMO
D-Day 2015, Modena, Italy
LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources
Dott. Fabio Benedetti
Department of Engineering “Enzo Ferrari”
University of Modena and Reggio Emilia
[Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in Different Topical Domains." The Semantic Web–ISWC 2014. Springer International Publishing, 2014. 245-260]
                          2009                2014*
Domain                    Number   %          Number   %
Cross-domain              41       13.95%     41       4.04%
Geographic                31       10.54%     21       2.07%
Government                49       16.67%     183      18.05%
Life sciences             41       13.95%     83       8.19%
Media                     25       8.50%      22       2.17%
Publications              87       29.59%     96       9.47%
Social web                0        0.00%      520      51.28%
User-generated content    20       6.80%      48       4.73%
Total                     294                 1014
*Only 570 datasets belong to the LOD cloud; the remaining datasets do not contain ingoing/outgoing links to the LOD Cloud.
[Figure: charts of the 2009 and 2014 distributions by domain]
The Open Access trend encourages the publication of Open Data in the form of Linked Data.
But discovering LOD sources of interest is a complex task for a user.
Main issues
• There is no standard for documenting a dataset
• The structure of a dataset can be understood only by exploring it manually
• Semantic Web technologies are extremely complex for unskilled users
• Automatically extract and summarize a schema (the Schema Summary) that describes a LOD dataset
• Use the Schema Summary to support the user in the information extraction task
Online & automatic extraction
• It does not require any additional information from the user
• It works with SPARQL endpoints
– We have to handle the performance issues of these endpoints
The Schema Summary has to describe a dataset
• Ontology/vocabulary (OWL & RDFS constraints)
• Open Data (e.g. generated from an existing RDBMS)
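As a minimal sketch of what working directly against a SPARQL endpoint can look like, the Python snippet below sends a query over HTTP with a client-side timeout, so that a slow or unresponsive endpoint does not block the extraction. The function names, the 30-second default, and the assumption that the endpoint URL carries no query string of its own are illustrative choices, not taken from LODeX; only the `query` parameter and the `application/sparql-results+json` media type come from the SPARQL Protocol and results-format standards.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint_url: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results.

    Assumes endpoint_url has no query string of its own (illustrative simplification).
    """
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint_url: str, query: str, timeout_s: float = 30.0):
    """Return the result bindings, or None if the endpoint fails or times out."""
    request = build_request(endpoint_url, query)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.load(response)["results"]["bindings"]
    except (OSError, ValueError, KeyError):
        # Network errors and timeouts are OSError subclasses; malformed JSON
        # raises ValueError; a response without "results" raises KeyError.
        return None
```

Returning `None` instead of raising lets a caller skip an endpoint and move on, which is one simple way to cope with unreliable sources.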
Two main modules
• Extraction & Summarization
• Visualization & Querying
LODeX uses a NoSQL database as back-end
Input: URLs of SPARQL endpoints
Output: interactive Schema Summary
[Figure: LODeX architecture. Components shown: LOD Cloud, Endpoint URLs, Indexes Extraction, Statistical Indexes, Post-processing, Schema Summary, NoSQL, Query Orchestrator, Schema Summary Visualization, Basic Query, SPARQL Queries, Sgvizler, Results]
Statistical Indexes
They comprise 9 indexes divided into three groups:
• General group
• Intensional group
• Extensional group
The index extraction (IE) process generates the SPARQL queries used to extract the different indexes.
• Iterative algorithm that extracts the intensional knowledge
• Pattern Strategy technique
– It produces a higher number of less complex SPARQL queries
The IE process performs index extraction online while handling the performance issues of the SPARQL endpoints.
[F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open data sources,” 2014, Linked Data for Information Extraction (LD4IE) Workshop held at the International Semantic Web Conference]
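The idea of trading one heavy query for many lighter ones can be illustrated with a small Python sketch. The slide does not list the actual patterns, so the queries below are only an assumed example: a single `GROUP BY` aggregation over all classes is replaced by one query that lists the classes and one small count query per class. The `run` callable and the row format (plain dicts from variable name to value) are hypothetical.

```python
from typing import Callable, Dict, List

# One "complex" query: counts the instances of every class in a single request.
SINGLE_QUERY = "SELECT ?c (COUNT(?s) AS ?n) WHERE { ?s a ?c } GROUP BY ?c"

# Simpler pattern 1: only list the classes used in the dataset.
LIST_CLASSES = "SELECT DISTINCT ?c WHERE { ?s a ?c }"


def count_query(class_iri: str) -> str:
    """Simpler pattern 2: count the instances of one class."""
    return f"SELECT (COUNT(?s) AS ?n) WHERE {{ ?s a <{class_iri}> }}"


def instance_counts(run: Callable[[str], List[Dict[str, str]]]) -> Dict[str, int]:
    """Collect per-class instance counts using many small queries.

    `run` is a hypothetical callable that sends a query and returns its rows.
    """
    counts: Dict[str, int] = {}
    for row in run(LIST_CLASSES):
        rows = run(count_query(row["c"]))
        if rows:  # skip classes whose count query returned no row
            counts[row["c"]] = int(rows[0]["n"])
    return counts
```

With this shape, a failure on one class costs only that class's count rather than the whole aggregation.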
The elements composing the Schema Summary are:
• Classes
• Properties
• Attributes
An algorithm combines the information contained in the Statistical Indexes to produce and store the Schema Summary.
[F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data sources,” 2014, International Semantic Web Conference (Posters & Demos)]
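A rough Python sketch of how indexes might be merged into a graph-shaped summary is given below. The slide does not describe the index layout, so everything here is an assumption for illustration: three hypothetical indexes (instance counts per class, class-to-class links, and literal-valued attributes per class), with classes treated as nodes, properties as edges between classes, and attributes as labels attached to a node.

```python
from typing import Dict, List, Tuple


def build_summary(
    class_counts: Dict[str, int],
    object_props: List[Tuple[str, str, str]],
    datatype_props: List[Tuple[str, str]],
) -> dict:
    """Merge three hypothetical indexes into one graph-shaped summary.

    class_counts:   class -> number of instances
    object_props:   (subject class, property, object class)
    datatype_props: (class, attribute)
    """
    nodes = {c: {"instances": n, "attributes": []} for c, n in class_counts.items()}

    # Attach each attribute to its class, ignoring classes missing from the counts.
    for cls, attribute in datatype_props:
        if cls in nodes:
            nodes[cls]["attributes"].append(attribute)

    # Keep only the links whose two endpoints are known classes.
    edges = [
        {"source": s, "property": p, "target": o}
        for s, p, o in object_props
        if s in nodes and o in nodes
    ]
    return {"nodes": nodes, "edges": edges}
```

A dictionary of this shape maps naturally onto a document in a NoSQL store, although the storage format LODeX actually uses is not stated on the slide.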
[Figure: from the Schema Summary to a Basic Query, through the SPARQL compiler, to a SPARQL query]
• Using the web application GUI, the user is guided in building a Basic Query
• A refinement panel helps the user refine the Basic Query
A SPARQL compiler automatically generates the corresponding SPARQL query.
Operators supported by the compiler:
• AND
• OPTIONAL
• FILTER
• ORDER BY
• LIMIT
• OFFSET
The query is sent to the SPARQL endpoint, and the results can be visualized in a table, map, or chart view (pie, bar, etc.)
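To make the compilation step concrete, here is a minimal Python sketch that turns a structured query description into SPARQL text using the operators listed above. The `BasicQuery` class, its fields, and the example IRIs are hypothetical stand-ins; the slide does not show how LODeX represents a Basic Query internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BasicQuery:
    """Hypothetical stand-in for the Basic Query built in the GUI."""
    select: List[str]
    required: List[str]                                 # triple patterns joined with AND
    optional: List[str] = field(default_factory=list)   # each wrapped in OPTIONAL
    filters: List[str] = field(default_factory=list)    # FILTER expressions
    order_by: Optional[str] = None
    limit: Optional[int] = None
    offset: Optional[int] = None


def compile_sparql(q: BasicQuery) -> str:
    """Render the query description as SPARQL text."""
    lines = ["SELECT " + " ".join(q.select), "WHERE {"]
    lines += [f"  {t} ." for t in q.required]
    lines += [f"  OPTIONAL {{ {t} }}" for t in q.optional]
    lines += [f"  FILTER ({f})" for f in q.filters]
    lines.append("}")
    if q.order_by:
        lines.append(f"ORDER BY {q.order_by}")
    if q.limit is not None:
        lines.append(f"LIMIT {q.limit}")
    if q.offset is not None:
        lines.append(f"OFFSET {q.offset}")
    return "\n".join(lines)


# Example with made-up IRIs: names of people, with their city if known.
example = BasicQuery(
    select=["?name", "?city"],
    required=[
        "?p a <http://example.org/Person>",
        "?p <http://example.org/name> ?name",
    ],
    optional=["?p <http://example.org/livesIn> ?city"],
    filters=['regex(?name, "^A")'],
    order_by="?name",
    limit=10,
    offset=0,
)
print(compile_sparql(example))
```

Keeping the query as data until the last step means the refinement panel only has to edit fields, and the SPARQL text is regenerated from scratch each time.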
Try the LODeX demo at: http://dbgroup.unimo.it/lodex2
[F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX,” 2014, submitted to the Semantic Web journal]
Test, Nov. 2014
Dataset URLs              559
Reachable datasets        302
SPARQL 1.1 compatible     206
Extraction completed      185
Online survey with 17 anonymous users:
• 8 skilled users
• 9 unskilled users
The survey is divided into two parts:
• Schema Summary browsing clarity
• Query generation
Task                      Correct answers
Schema Summary browsing   94% (32/34)
Query generation          88% (60/68)
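The counts from the November 2014 test can be read as a funnel. The short Python snippet below is only a convenience for computing each step as a share of all dataset URLs and of the previous step; the `pct` helper is introduced here for illustration, and the numbers are copied from the slide.

```python
# Counts copied from the November 2014 test reported on the slide.
funnel = [
    ("Dataset URLs", 559),
    ("Reachable datasets", 302),
    ("SPARQL 1.1 compatible", 206),
    ("Extraction completed", 185),
]


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total = funnel[0][1]
# Pair each step with the step before it.
for (name, n), (_, previous) in zip(funnel[1:], funnel):
    print(f"{name}: {n} ({pct(n, total)}% of all URLs, {pct(n, previous)}% of previous step)")
```

Extraction completed for 185 of the 206 SPARQL 1.1 compatible endpoints (about 89.8%), which is about 33.1% of the 559 URLs tested.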
• Modify the LODeX interface according to the results of the online survey
• Extend the VoID descriptor vocabulary to represent the Statistical Indexes, and publish our data as LOD
– Build an observatory for the LOD cloud
• Define clustering techniques to reduce the size of the Summary for very large datasets
Accepted papers
• D. Beneventano, S. Bergamaschi, S. Sorrentino, M. Vincini, and F. Benedetti, “Semantic annotation of the CEREALAB database by the AGROVOC linked dataset,” 2014, Ecological Informatics. Article in press.
• F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open data sources,” 2014, Linked Data for Information Extraction (LD4IE) Workshop held at the International Semantic Web Conference
• F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data sources,” 2014, International Semantic Web Conference (Posters & Demos)
Submitted papers
• F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX,” 2014, submitted to Semantic Web – Interoperability, Usability, Applicability, an IOS Press journal
European projects & schools
• Web Science Summer School, Southampton University (20-26 July 2014)
• RDA (Research Data Alliance) Fourth Plenary Meeting, 22-24 September 2014, Amsterdam. I won an Early Career Scientist grant and I belong to the Big Data Analytics Interest Group.
• Keystone, COST Action IC1302: Autumn 2014 MC and WG Meetings, “Querying the Semantic Web,” 17-18 October 2014, Riva del Garda (TN).
Thanks for your attention!
