KIT – University of the State of Baden-Wuerttemberg and
National Laboratory of the Helmholtz Association
Institute AIFB
www.kit.edu
Linked Data and Services
Andreas Harth and Barry Norton
Outline
Motivation
Linked Data Principles
Query Processing over Linked Data
Linked Data Services (LIDS) and Linked Open Services (LOS)
Conclusion
Motivation
Semantic Web/Linked Data technologies are well-suited for data integration
30.01.2015
[Figure labels: Data Integration; Interactive Data Exploration; Common Data Format/Access Protocol; "!?"]
Linked Data Principles*
1. Use URIs to name things; not only documents, but
also people, locations, concepts, etc.
2. To enable agents (human users and machine agents
alike) to look up those names, use HTTP URIs
3. When someone looks up a URI we provide useful
information; with 'useful' in the strict sense we usually
mean structured data in RDF.
4. Include links to other URIs allowing agents (machines
and humans) to discover more things
(*) http://www.w3.org/DesignIssues/LinkedData.html
Correspondence between thing-URI and source-URI
Thing-URI: http://www.polleres.net/foaf.rdf#me
Source-URI: http://www.polleres.net/foaf.rdf
The user agent sends an HTTP GET for the source-URI to the web server, which returns RDF.
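For hash URIs such as the one on this slide, the source-URI can be derived locally, because an HTTP client never sends the fragment to the server. A minimal Python sketch of that step (the function name `source_uri` is an illustrative assumption, not something from the slides):

```python
from urllib.parse import urldefrag


def source_uri(thing_uri: str) -> str:
    """Return the document URI a client would request for a thing-URI.

    For hash URIs the fragment is dropped; URIs without a fragment are
    returned unchanged (they may still redirect on the server side).
    """
    return urldefrag(thing_uri).url


print(source_uri("http://www.polleres.net/foaf.rdf#me"))
```

URIs without a fragment, such as the DBpedia one on the following slide, cannot be mapped this way; the client only learns the source-URI from the server's redirect.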
Correspondence between thing-URI and source-URI
Thing-URI: http://dbpedia.org/resource/Gordon_Brown
Source-URI (RDF): http://dbpedia.org/data/Gordon_Brown
Source-URI (HTML): http://dbpedia.org/page/Gordon_Brown
The user agent sends an HTTP GET for the thing-URI; the web server answers with a 303 redirect, and a second HTTP GET on the redirect target returns RDF.
Queries over Linked Data
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
?f ?n
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
Querying Data Across Sources
15.03.2010
Data warehousing or materialisation-based approaches (MAT)
[Figure: CRAWL, INDEX, SERVE pipeline; query "SELECT * FROM…" over relations R and S]
Distributed query processing approaches (DQP)
[Figure: relations R and S queried at their sources]
DQP on Linked Data
[Figure: a relational query (SELECT * FROM…) over relations R and S accessed via ODBC, next to a SPARQL query (SELECT ?s WHERE…) over triple patterns (TP) accessed via HTTP GET]
Query Processing Overview
Andreas Harth
Data Summaries for On-Demand Queries over Linked Data
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
TP (an:f#ah foaf:knows ?f): select source(s)
TP (?f foaf:name ?n): select source(s)
Result:
?f ?n
http://danbri.org/foaf.rdf#danbri Dan Brickley
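The overview above splits the query into triple patterns (TP) and joins their bindings. The following Python sketch illustrates that join over triples that have already been fetched. It is an illustration under assumptions, not the system's implementation: terms are plain strings, variables start with "?", and the sample triple for `an:f#ah` is made up to reproduce the result row shown on the slide.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns with a nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data, standing in for triples retrieved from selected sources.
DANBRI = "http://danbri.org/foaf.rdf#danbri"
data = [
    ("an:f#ah", "foaf:knows", DANBRI),
    (DANBRI, "foaf:name", "Dan Brickley"),
]
query = [("an:f#ah", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
rows = evaluate(query, data)
```

Here `rows` holds one binding that maps `?f` to the danbri URI and `?n` to "Dan Brickley".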
Problem: Source Selection for Triple Patterns
(?s ?p ?o)
(#s ?p ?o)
(?s #p ?o)
(?s ?p #o)
(#s #p ?o)
(#s ?p #o)
(?s #p #o)
(#s #p #o)
Given a triple pattern, which source can contribute bindings
for the triple pattern?
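The eight shapes listed above appear to use "?" for a variable position and "#" for a constant position; that reading is an assumption, since the slide does not define the notation. Under it, a concrete triple pattern can be classified with a few lines of Python (`shape` is a hypothetical helper name):

```python
def shape(pattern):
    """Map a triple pattern to its shape, e.g. ("a", "b", "?x") -> "(#s #p ?o)"."""
    marks = []
    for term, position in zip(pattern, "spo"):
        # "?" marks a variable, "#" marks a constant (assumed notation).
        marks.append(("?" if term.startswith("?") else "#") + position)
    return "(" + " ".join(marks) + ")"


first_tp = shape(("an:f#ah", "foaf:knows", "?f"))
second_tp = shape(("?f", "foaf:name", "?n"))
```

For the running example query, the first triple pattern has shape `(#s #p ?o)` and the second has shape `(?s #p ?o)`.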
Schema-Level Indices [Stuckenschmidt et al. 2004]
Keep index of properties and/or classes contained in sources
(?s #p ?o), (?s rdf:type #o)
Covers only queries containing schema-level elements
Commonly used properties select potentially too many sources
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
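A schema-level index as described on this slide can be sketched as two inverted maps, one from properties to sources and one from classes to sources. The Python below is a minimal illustration under assumptions (in-memory dictionaries, prefixed names as plain strings, hypothetical function names); it is not the index structure of the cited work.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_schema_index(sources):
    """`sources` maps a source URI to the list of (s, p, o) triples it contains."""
    by_property = defaultdict(set)
    by_class = defaultdict(set)
    for uri, triples in sources.items():
        for _s, p, o in triples:
            by_property[p].add(uri)
            if p == RDF_TYPE:
                by_class[o].add(uri)
    return by_property, by_class


def select_sources(pattern, by_property, by_class):
    """Return candidate sources, or None if the pattern has no schema-level element."""
    _s, p, o = pattern
    if p.startswith("?"):
        return None  # the index cannot narrow down patterns with a variable property
    if p == RDF_TYPE and not o.startswith("?"):
        return set(by_class.get(o, set()))
    return set(by_property.get(p, set()))


# Hypothetical sources for illustration.
sources = {
    "http://example.org/a.rdf": [("ex:a", "foaf:name", "A"), ("ex:a", RDF_TYPE, "foaf:Person")],
    "http://example.org/b.rdf": [("ex:b", "foaf:name", "B")],
}
by_property, by_class = build_schema_index(sources)
```

Selecting sources for `(?f foaf:name ?n)` returns both example sources, which shows the slide's caveat: a commonly used property selects potentially too many sources.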
Direct Lookup (DL) [Hartig et al. 2009]
Exploits correspondence between thing-URI and source-URI
Linked Data sources (aka RDF files) typically return triples with a subject corresponding to the source: (#s ?p ?o), (#s #p ?o), (#s #p #o)
Sometimes the sources return triples with an object corresponding to the source: (?s ?p #o), (?s #p #o)
Incomplete wrt. patterns but also wrt. URI reuse across sources
Limited parallelism, unclear how to schedule lookups
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
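Direct lookup, as summarised on this slide, derives the sources to fetch from the URIs that occur in a triple pattern. A rough Python sketch follows; it assumes full HTTP URIs as terms and only handles hash URIs locally, and the function name is hypothetical rather than taken from the cited system.

```python
from urllib.parse import urldefrag


def lookup_candidates(pattern):
    """Return the source URIs to dereference for a triple pattern.

    Only the subject and object positions are considered; a pattern whose
    subject and object are both variables yields no lookups at all.
    """
    subject, _predicate, obj = pattern
    candidates = []
    for term in (subject, obj):
        if term.startswith(("http://", "https://")):
            source = urldefrag(term).url  # drop the fragment of hash URIs
            if source not in candidates:
                candidates.append(source)
    return candidates


bound_subject = lookup_candidates(("http://danbri.org/foaf.rdf#danbri", "foaf:name", "?n"))
unbound = lookup_candidates(("?s", "foaf:name", "?n"))
```

The empty result for `(?s foaf:name ?n)` illustrates why the approach is incomplete with respect to patterns: without a bound subject or object, there is nothing to look up until an earlier pattern produces bindings.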
Approximate Data Summaries
Combined description of schema-level and instance-level
Use approximation to reduce index size (incurs false positives)
Possible to use entire query for source selection
Parallel lookups since sources can be determined for the entire query
(?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o) and combinations of triple patterns
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
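The trade-off on this slide (a smaller index at the price of false positives) can be illustrated with a deliberately simple stand-in: each source is summarised as a set of hash buckets over its subject, predicate, and object terms. This is not the data summary structure used by the authors; it is a swapped-in technique chosen only because it fits in a few lines, and all names below are assumptions.

```python
import zlib


def bucket(position, term, n_buckets):
    """Deterministically hash a term at a triple position into a bucket."""
    return zlib.crc32(f"{position}|{term}".encode("utf-8")) % n_buckets


def summarise(triples, n_buckets=64):
    """Summarise a source as the set of buckets its terms fall into."""
    summary = set()
    for triple in triples:
        for position, term in zip("spo", triple):
            summary.add(bucket(position, term, n_buckets))
    return summary


def candidate_sources(pattern, summaries, n_buckets=64):
    """Sources whose summary covers every constant of the pattern.

    There are no false negatives; false positives are possible because
    different terms can share a bucket.
    """
    needed = {
        bucket(position, term, n_buckets)
        for position, term in zip("spo", pattern)
        if not term.startswith("?")
    }
    return {uri for uri, summary in summaries.items() if needed <= summary}


# Hypothetical sources for illustration.
summaries = {
    "http://example.org/a.rdf": summarise([("ex:a", "foaf:knows", "ex:b")]),
    "http://example.org/b.rdf": summarise([("ex:b", "foaf:name", "B")]),
}
```

Because candidates can be computed for every triple pattern of a query before any request is sent, the resulting lookups can be issued in parallel, which is the advantage the slide names over direct lookup.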
Implementation
Deploy wrappers "in the cloud"
Google App Engine: hosting of Java and Python webapps on Google’s Cloud infrastructure
Limited amount of processing time (6hrs/day)
Single-threaded applications
Suited for deploying wrappers
e.g. http://twitter2foaf.appspot.com/ converts Twitter user data to RDF
Linking Open Data Cloud 2007
Linking Open Data Cloud 2008
Linking Open Data Cloud 2009
Linking Open Data Cloud 2010
Geonames Services
{"weatherObservation":
{"clouds":"broken clouds",
"weatherCondition":"drizzle",
"observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010",
"windDirection":30,
"ICAO":"LESO", ...
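A wrapper that lifts such a JSON response to RDF could look roughly like the Python sketch below. The namespace `http://example.org/weather#`, the subject URI layout, and the function name are all hypothetical; the slides do not show the vocabulary the authors used. The sample input contains only the fields visible on the slide.

```python
import json

EX = "http://example.org/weather#"  # hypothetical vocabulary namespace


def observation_to_triples(document: str):
    """Turn a weatherObservation JSON document into (s, p, o) triples."""
    observation = json.loads(document)["weatherObservation"]
    # Hypothetical subject URI built from the station's ICAO code.
    subject = f"http://example.org/station/{observation['ICAO']}/observation"
    triples = []
    for key in sorted(observation):
        if key == "ICAO":
            continue  # already encoded in the subject URI
        triples.append((subject, EX + key, observation[key]))
    return triples


sample = json.dumps({
    "weatherObservation": {
        "clouds": "broken clouds",
        "weatherCondition": "drizzle",
        "windDirection": 30,
        "ICAO": "LESO",
    }
})
triples = observation_to_triples(sample)
```

Each JSON key becomes one property of the observation resource, so the output keeps an explicit link between the station given as input and the values returned.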
Linked Open Service Principles
REST Principles
1. Application state and functionality are divided into resources
2. Every resource is uniquely addressable
3. All resources share a uniform interface:
a) A constrained set of well-defined operations
b) A constrained set of content types
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
Linked Open Service Principles
1. Describe services as LOD prosumers with input and output descriptions as SPARQL graph patterns
2. Communicate RDF by RESTful content negotiation
3. The output should make explicit its relation with the input
LOS Weather Service
LOS Geo Resources
Resource-Based Linked Open Services
Linked Data:
GET with Accept: text/html returns 303 REDIRECT /page
GET with Accept: application/rdf+xml (or text/n3) returns 303 REDIRECT /data
Linked Service:
GET /weather with Accept: application/rdf+xml (or text/n3) returns 200 <rdf:Description>
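The request and response pairs on this slide can be read as a small routing table. The Python sketch below models it as a pure function so that it runs without a web server; the example path `/airport/LESO`, the 406 fallback, and the placeholder response body are assumptions added for illustration.

```python
RDF_TYPES = ("application/rdf+xml", "text/n3")


def respond(path: str, accept: str):
    """Return (status, location_or_body) for a GET request.

    Resource URIs are redirected to a page or data document depending on
    the Accept header; the /weather sub-resource answers directly with RDF.
    """
    wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
    if path.endswith("/weather"):
        if wants_rdf:
            return 200, "<rdf:Description>...</rdf:Description>"  # placeholder body
        return 406, ""  # assumption: no HTML rendering of the service output
    if wants_rdf:
        return 303, path + "/data"
    if "text/html" in accept:
        return 303, path + "/page"
    return 406, ""


html_redirect = respond("/airport/LESO", "text/html")
rdf_redirect = respond("/airport/LESO", "application/rdf+xml")
service_call = respond("/airport/LESO/weather", "text/n3")
```

The same URI scheme serves both cases: the resource itself is only ever redirected, while the computation hanging under it returns RDF with status 200.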
Interlinking Data with Data from Services?
Linked Data Services
We’d like to integrate data services with Linked Data
1. LIDS need to adhere to Linked Data principles
We’d like to use data services in software programs
2. LIDS need machine-readable descriptions of input and output
Compared to naïve approach (assign URI to service output):
Relationship between input and output is explicitly described
Dynamicity is supported
Multiple or no output resources can be linked to input
Interlink LIDS and Linked Data
Generate service URIs with input bindings, obtained from evaluating:
select Xi where Ti
sameAs: binding for i
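One way to read this slide: evaluate the service's input pattern over the data, turn each result row into a service URI, and link the bound input entity to that URI with owl:sameAs. The Python sketch below follows that reading. The endpoint URL, the parameter names, and the exact layout of the service URI are assumptions; the slide does not specify them.

```python
from urllib.parse import urlencode


def service_links(endpoint, bindings, link_var):
    """Build (entity, owl:sameAs, service URI) triples from query bindings.

    `bindings` are result rows of the input pattern; the value of `link_var`
    is the input entity, all other values become query parameters.
    """
    links = []
    for row in bindings:
        params = {name: value for name, value in sorted(row.items()) if name != link_var}
        service_uri = f"{endpoint}?{urlencode(params)}"
        links.append((row[link_var], "owl:sameAs", service_uri))
    return links


# Hypothetical endpoint and bindings for illustration.
rows = [{"i": "http://example.org/place/1", "lat": "49.0", "lng": "8.4"}]
links = service_links("http://example.org/lids/findNearby", rows, "i")
```

Dereferencing the generated service URI then behaves like any other Linked Data lookup, which is what lets the service output be interlinked with existing data.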
Query Answering using LIDS and Linked Data
Query execution resolves URIs => enlarges data set
LIDS are interlinked
Query is executed again on new data set
Repeat until no new links or no new data
Combine results
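The loop described above can be sketched as a fixpoint iteration. In the Python below, `dereference`, `interlink`, and `evaluate` are hypothetical callbacks passed in by the caller (the slides do not name such functions), and every HTTP URI mentioned in the data is resolved, which is a simplification of whatever scheduling the real system uses.

```python
def answer(query, seeds, dereference, interlink, evaluate):
    """Repeat lookup, interlinking, and evaluation until nothing new is found."""
    data, resolved = set(), set()
    pending = set(seeds)
    results = []
    while pending:
        for uri in pending:
            data.update(dereference(uri))  # resolving URIs enlarges the data set
        resolved |= pending
        data.update(interlink(data))  # add links to service URIs
        for row in evaluate(query, data):  # run the query on the new data set
            if row not in results:
                results.append(row)  # combine results across rounds
        mentioned = {
            term
            for s, _p, o in data
            for term in (s, o)
            if isinstance(term, str) and term.startswith("http")
        }
        pending = mentioned - resolved  # empty when there is no new data
    return results


# Tiny in-memory "web" standing in for real HTTP lookups.
web = {
    "http://a": [("http://a", "knows", "http://b")],
    "http://b": [("http://b", "name", "Bob")],
}
found = answer(
    "name",
    ["http://a"],
    dereference=lambda uri: web.get(uri, []),
    interlink=lambda data: [],
    evaluate=lambda q, data: sorted(o for _s, p, o in data if p == q),
)
```

In this toy run the first round only discovers `http://b`; the answer "Bob" appears in the second round, after that URI has been resolved.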
Experiment: Query Answering
Input: List of 562 (potential) universities from Facebook Graph API
Output: Facebook fans and DBpedia student numbers for 104 universities
PREFIX u: <http://openlids.org/universities.rdf#>
SELECT ?n ?f ?s WHERE {
u:list foaf:topic ?u . ?u foaf:name ?n .
?u og:fan_count ?f . ?u d:numberOfStudents ?s }
Linked * Services and PlanetData
Several areas seem likely to produce services:
Stream, inc. Sensor, resources (latest values)
Any others exposing dynamic resources
Dynamic computations, inc. on-the-fly quality assessments
Other areas seem likely to consider service technologies and move towards more service-like HTTP interactions:
Access control (OpenID, OAuth, etc.)
Finally, remaining areas could serve to complement LIDS/LOS alignment:
Provenance

Editor's Notes

  • #10: MAT: collect the data from all known sources in advance; preprocess the combined data; store the results in a central database; queries are evaluated using the local database. DQP: parse, normalise, and split the query into subqueries; determine the sources containing results for subqueries; evaluate the subqueries against the sources directly. (Match with later architecture overview animation.)
    MAT, advantages: excellent query response times due to the large amount of preprocessing carried out during the load and indexing steps. MAT, drawbacks: aggregated data is never current, as collecting and indexing vast amounts of data is time-consuming; from the viewpoint of a single requester with a particular query, there is a large amount of unnecessary data gathering and storage; due to the replicated data storage, the data providers have to give up their sole sovereignty over their data (e.g., they cannot restrict or log access any more, since queries are answered against a copy of the data).
    DQP, advantages: the system is more dynamic with up-to-date data; new sources can be added easily without time lag for indexing and integrating the data; the systems require less storage and processing resources at the query issuing site. DQP, drawbacks: DQP systems cannot give strict guarantees about query performance, since the integration system relies on a large number of potentially unreliable sources. Source selection affects efficiency of query execution.
    @Juergen: join processing as scan (DL) or in Jena (QTree)?
  • #11: We do not want to materialise, but to process queries in a distributed fashion. Lookups on the Web of Linked Data use web architecture (also different to distributed SPARQL). Traditional approaches assume a few data sources with full query processing capabilities (three giant blobs versus 100 small sources). Linked Data: very large number of relatively small sources (kilobytes to megabytes); HTTP GET is the sole operation. We assume relatively stable source URIs. Focus on tree-shaped conjunctive queries; full SPARQL can be layered on top.
  • #29: The upper right is standard application of Linked Data principles: if you request (state, in the request header, that you accept) HTML, you are redirected to a 'page' URI; if you request RDF, you are redirected to a 'data' URI (i.e. page/data is, in our implementation, appended to the end of the resource's URI). This is because the original URI actually identifies the airport but, since the airport is a real thing, not an information resource, you can't actually retrieve it in itself, only a related information resource. The bottom right is how we extend in LOS: under the same URI scheme you can ask for a computation relative to the resource by POSTing to a URI representing the weather under it (the airport).