Rethinking Online SPARQL Querying
to Support
Incremental Result Visualization
Olaf Hartig
http://olafhartig.de
@olafhartig
Rethinking Online SPARQL Querying to Support Incremental Result Visualization - Olaf Hartig 2
Prologue
Live Querying the Web of Data
● Federated query processing
– i.e., querying a federation of SPARQL endpoints
● Linked Data query processing
– i.e., querying Linked Data by relying only on the
Linked Data principles (interface: URI lookups)
– e.g., traversal-based query execution
● Querying other Linked Data fragment servers
– e.g., triple pattern fragments
Chapter 1
“Can the progress that has been made
on (Read/Write) Linked Data change the
way we interact with the Web […]?”
Information in Dynamic Web Pages
Support for such an incremental visualization
has not received much attention in existing
work on querying the Web of Data
“Can the progress that has been made
on (Read/Write) Linked Data change the
way we interact with the Web […]?”
I think we have not made enough progress to even
enable well-understood interaction techniques that
are widely applied in “traditional” Web applications.
Topics
● Opportunities to Optimize the Response
Times of Traversal-based Query Executions
● Making the Core Fragment of SPARQL
Suitable for the Task
Chapter 2
Implementation Approach
[Figure: architecture with a Dispatcher, a Data Retrieval Operator, and several Triple Pattern Operators; one operator is labeled with the triple pattern ( ?v1, knows, ?v2 )]
Data Retrieval Operator
[Figure: Dispatcher, Data Retrieval Operator, and Triple Pattern Operators]
Lookup: GET http://example.org/...
RDF triple: ( Bob, knows, Alice )
Triple pattern: ( ?v1, knows, ?v2 )
Triple Pattern Operator
[Figure: Dispatcher and Triple Pattern Operator]
Triple pattern: ( ?v1, knows, ?v2 )
RDF triple: ( Bob, knows, Alice )
Intermediate Solution
  Timestamp: 1
  Bindings: ?v1 → Bob, ?v2 → Alice
  Flags: [ ∙ | √ | ∙ | ∙ ]
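The matching step shown on this slide can be sketched in Python. This is a minimal sketch under assumptions, not the implementation behind the talk: the function name `match`, the encoding of triples as 3-tuples of strings, and the convention that variables start with `?` are illustrative choices.

```python
from typing import Dict, Optional, Tuple

# Assumption: terms are plain strings; a leading "?" marks a variable.
Triple = Tuple[str, str, str]


def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return the variable bindings if the triple matches the pattern, else None."""
    bindings: Dict[str, str] = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable that occurs twice must bind to the same term.
            if bindings.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            # A constant in the pattern must equal the term in the triple.
            return None
    return bindings


print(match(("?v1", "knows", "?v2"), ("Bob", "knows", "Alice")))
# {'?v1': 'Bob', '?v2': 'Alice'}
```

The returned bindings correspond to the "Bindings" field of the intermediate solution above; timestamp and flags are left out of this sketch.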
[Figure: Dispatcher and Output]
Intermediate Solution
  Timestamp: 1
  Bindings: ?v1 → Alice, ?v2 → Bob
  Flags: [ ∙ | √ | ∙ | ∙ ]
Triple Pattern Operator cont'd
[Figure: dataflow diagram with Output]
Triple Pattern Operator cont'd
[Figure: dataflow diagram with Output and three intermediate solutions]
Intermediate Solution
  Timestamp: 461
  Bindings: ?v1 → Bob, ?v2 → Steve
  Flags: [ ∙ | √ | ∙ | ∙ ]
Intermediate Solution
  Timestamp: 327
  Bindings: ?v1 → Bob, ?v3 → Berlin
  Flags: [ √ | ∙ | ∙ | ∙ ]
Intermediate Solution
  Timestamp: 461
  Bindings: ?v1 → Bob, ?v2 → Steve, ?v3 → Berlin
  Flags: [ √ | √ | ∙ | ∙ ]
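The combination of the first two intermediate solutions into the third can be sketched as follows. The class and function names are hypothetical, and two rules are assumptions read off the example rather than stated on the slides: the combined timestamp is taken as the larger of the two, and the flags are merged position by position.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IntermediateSolution:  # hypothetical name, mirrors the fields on the slide
    timestamp: int
    bindings: Dict[str, str]
    flags: List[bool]


def combine(a: IntermediateSolution, b: IntermediateSolution) -> Optional[IntermediateSolution]:
    """Merge two intermediate solutions if their bindings are compatible."""
    shared = a.bindings.keys() & b.bindings.keys()
    if any(a.bindings[v] != b.bindings[v] for v in shared):
        return None  # conflicting binding for a shared variable
    return IntermediateSolution(
        timestamp=max(a.timestamp, b.timestamp),            # assumption: keep the newer timestamp
        bindings={**a.bindings, **b.bindings},
        flags=[x or y for x, y in zip(a.flags, b.flags)],   # assumption: flags are OR-ed
    )


s1 = IntermediateSolution(461, {"?v1": "Bob", "?v2": "Steve"}, [False, True, False, False])
s2 = IntermediateSolution(327, {"?v1": "Bob", "?v3": "Berlin"}, [True, False, False, False])
print(combine(s1, s2))
```

With the two inputs from the slide, the sketch yields timestamp 461, all three bindings, and the first two flags set, which agrees with the third intermediate solution shown.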
Properties
[Figure: Dispatcher connected to Data Retrieval, TP Operators, and Output]
● Supports:
– any reachability-based query semantics
● Highly flexible
– routing of intermediate solutions
● Inspired by “Eddies”
– Avnur & Hellerstein, SIGMOD 2000
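To make "routing of intermediate solutions" concrete, here is a small sketch of a dispatcher decision with a pluggable routing policy. It assumes that each flag records whether the solution has already passed the corresponding triple pattern operator; that reading of the flags, and all names below, are assumptions for illustration only.

```python
from typing import Callable, List, Optional, Sequence

# A routing policy picks one operator index among those still pending.
RoutingPolicy = Callable[[Sequence[int]], int]


def first_pending(candidates: Sequence[int]) -> int:
    """Hypothetical policy: always take the lowest-numbered pending operator."""
    return candidates[0]


def last_pending(candidates: Sequence[int]) -> int:
    """Hypothetical policy: always take the highest-numbered pending operator."""
    return candidates[-1]


def route(flags: List[bool], policy: RoutingPolicy) -> Optional[int]:
    """Return the index of the next operator, or None if the solution is complete."""
    pending = [i for i, done in enumerate(flags) if not done]
    if not pending:
        return None  # every operator has seen the solution: send it to the output
    return policy(pending)


flags = [False, True, False, False]
print(route(flags, first_pending), route(flags, last_pending))  # 0 3
```

Swapping the policy function changes the order in which operators see a solution without touching the operators themselves, which is the kind of flexibility the slide attributes to the Eddies-inspired design.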
Hypothesis 1
Response times can be reduced
by applying a suitable routing policy.
Test of Different Routing Policies
Setup:
● Data retrieval operator simply appends to its lookup queue
● Web simulation environment (test Web: W-62-47, test query: Q1, details: [Hartig and Özsu 2014])
● Each bar represents the geometric mean of 5 separate executions
[Figure: two bar charts, one for the response time for the first reported solution and one for the response time for the last reported solution, each relative to the overall query execution time (QET)]
Routing policy has no impact!
Hypothesis 1
Response times can be reduced
by applying a suitable routing policy.
No!
Why?
Data Retrieval Dominates!!!
[Figure: bar chart of the average query execution time in seconds (log scale, 0.1 to 100000) for Query 1, Query 4, Query 5, Query 9, and Query 10; series: 10 threads, 20 threads, cache]
● 5 queries of the FedBench benchmark suite,
executed over real Linked Data on the WWW
● 10 threads, 20 threads: different number of lookup threads
used by the data retrieval operator
● cache: data retrieval operator equipped with a cache
– Cache populated by a first execution
– Times measured for a 2nd, cache-only execution
(i.e., data retrieval deactivated)
Hypothesis 2
Response times can be reduced
by choosing a “good” strategy
of prioritizing URI lookups.
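A lookup queue that supports different prioritization strategies can be sketched with Python's `heapq` module. This is only an illustration of the idea of prioritizing URI lookups: the class name `LookupQueue`, the example URIs, and the convention that a smaller number means higher priority are assumptions, and the talk does not prescribe a particular data structure.

```python
import heapq
import itertools
import random
from typing import List, Set, Tuple


class LookupQueue:
    """Hypothetical priority queue of URIs awaiting lookup (smaller value = looked up earlier)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()  # tie-breaker: insertion order
        self._seen: Set[str] = set()

    def add(self, uri: str, priority: float) -> None:
        if uri in self._seen:
            return  # each URI is scheduled at most once
        self._seen.add(uri)
        heapq.heappush(self._heap, (priority, next(self._counter), uri))

    def next_uri(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


# Random prioritization: every URI gets a random priority value.
queue = LookupQueue()
for uri in ["http://example.org/a", "http://example.org/b", "http://example.org/c"]:
    queue.add(uri, priority=random.random())
while queue:
    print(queue.next_uri())
```

Giving every URI the same priority value turns the queue into plain first-in, first-out order, so a simple append-only queue and a random order are both special cases of the same interface.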
Prioritizing Lookups Randomly
[Figure: chart for five executions (exec1 to exec5) with QET marked; axes: result elements (ticks 0 to 6) and time from the beginning of the query execution in minutes (ticks 0 to 35); annotations: “ca. 25% of QET” and “ca. 58%”]
Setup:
● LD10 of the FedBench benchmark suite,
over real Linked Data on the WWW
Hypothesis 2
Response times can be reduced
by choosing a “good” strategy
of prioritizing URI lookups.
√
Question
Response times can be reduced
by choosing a “good” strategy
of prioritizing URI lookups.
√
What is
?
Chapter 3
Topics
● Opportunities to Optimize the Response
Times of Traversal-based Query Executions √
● Making the Core Fragment of SPARQL
Suitable for the Task
(by making it monotonic)
Monotonicity?
● Query Q is monotonic if for every pair ( D1, D2 ) of
possible databases, it holds that:
D1 ⊆ D2 ⟹ Q( D1 ) ⊆ Q( D2 )
● Example: the SPARQL pattern
P = (a, p, ?x) OPT (?x, p, ?y)
is not monotonic
– G1 = { (a, p, b) }
– G2 = { (a, p, b), (b, p, c) }
– ⟦P⟧G1 = { μ }, where μ = { ?x → b }
– ⟦P⟧G2 = { μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !
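The example can be checked mechanically. The sketch below evaluates the pattern over G1 and G2 using the standard set-based semantics of OPT (join plus difference). All function names and the encoding of solution mappings as frozensets of (variable, term) pairs are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
SolutionMapping = FrozenSet[Tuple[str, str]]  # hashable set of (variable, term) pairs


def match(tp: Triple, triple: Triple) -> Optional[SolutionMapping]:
    bindings: Dict[str, str] = {}
    for p_term, t_term in zip(tp, triple):
        if p_term.startswith("?"):
            if bindings.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return frozenset(bindings.items())


def eval_tp(tp: Triple, graph: Iterable[Triple]) -> Set[SolutionMapping]:
    return {m for m in (match(tp, t) for t in graph) if m is not None}


def compatible(m1: SolutionMapping, m2: SolutionMapping) -> bool:
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(o1: Set[SolutionMapping], o2: Set[SolutionMapping]) -> Set[SolutionMapping]:
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def diff(o1: Set[SolutionMapping], o2: Set[SolutionMapping]) -> Set[SolutionMapping]:
    # Keep mappings of o1 that are compatible with no mapping of o2.
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}


def opt(o1: Set[SolutionMapping], o2: Set[SolutionMapping]) -> Set[SolutionMapping]:
    return join(o1, o2) | diff(o1, o2)


G1 = {("a", "p", "b")}
G2 = G1 | {("b", "p", "c")}
tp1, tp2 = ("a", "p", "?x"), ("?x", "p", "?y")

r1 = opt(eval_tp(tp1, G1), eval_tp(tp2, G1))
r2 = opt(eval_tp(tp1, G2), eval_tp(tp2, G2))
print(r1 <= r2)  # False: the mapping { ?x -> b } is lost when the graph grows
```

Although G1 ⊆ G2, the result over G1 is not contained in the result over G2, which is exactly the violation of monotonicity described on the slide.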
What is the Issue?
● For any non-monotonic query, elements of
the result set can be output only after we
have seen all query-relevant parts of the DB
– Hence, since we discover our DB (the Web of Data)
at runtime, we can output result elements only after
completing the discovery process
● Good news: the AND-UNION-FILTER fragment of
SPARQL is monotonic [Arenas and Perez 2011]
● Bad news: for the AND-UNION-FILTER-OPT fragment,
monotonicity is undecidable [Hartig 2014]
– i.e., queries with OPT may be non-monotonic
What is the Usage of OPT?
● DBpedia
– 46.4% of ca. 1.3M unique queries
(logs from Apr. – Jul. 2010)
Picalausa and Vansummeren, in SWIM 2011
– 16.6% (logs from USEWOD 2011 dataset)
Gallego et al., in USEWOD 2011
– 15% (logs from USEWOD 2011 dataset)
Elbedweihy et al., in COLD 2011
● Semantic Web conference corpus (SWDF)
– 0.4% (logs from USEWOD 2011 dataset)
Gallego et al., in USEWOD 2011
A Proposal: The OPT+ Operator
● Query Q is monotonic if for every pair ( D1, D2 ) of
possible databases, it holds that:
D1 ⊆ D2 ⟹ Q( D1 ) ⊆ Q( D2 )
● Recall our example: the SPARQL pattern
P' = (a, p, ?x) OPT (?x, p, ?y)
is not monotonic
– G1 = { (a, p, b) }, G2 = { (a, p, b), (b, p, c) }
– ⟦P'⟧G1 = { μ }, where μ = { ?x → b }
– ⟦P'⟧G2 = { μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !
● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G ∖ ⟦P2⟧G )
● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G
➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1
A Proposal: The OPT+ Operator
● Query Q is monotonic if for every pair ( D1, D2 ) of
possible databases, it holds that:
D1 ⊆ D2 ⟹ Q( D1 ) ⊆ Q( D2 )
● Recall our example: the SPARQL pattern
P' = (a, p, ?x) OPT+ (?x, p, ?y)
is monotonic √
– G1 = { (a, p, b) }, G2 = { (a, p, b), (b, p, c) }
– ⟦P'⟧G1 = { μ }, where μ = { ?x → b }
– ⟦P'⟧G2 = { μ, μ' }, where μ' = { ?x → b, ?y → c },
so ⟦P'⟧G1 ⊆ ⟦P'⟧G2
● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G ∖ ⟦P2⟧G )
● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G
➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1 √
A Proposal: The OPT+ Operator
● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G ∖ ⟦P2⟧G )
● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G
➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1
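As a sanity check of the proposal, the following sketch evaluates the example pattern with OPT+ (join, united with the left operand) and tests monotonicity by brute force over all pairs of subgraphs of a tiny graph. The helper names, the encoding of solution mappings, and the three-triple universe are assumptions chosen for illustration; the brute-force check is no substitute for the general argument that AND and UNION are monotonic.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
SolutionMapping = FrozenSet[Tuple[str, str]]


def eval_tp(tp: Triple, graph: Iterable[Triple]) -> Set[SolutionMapping]:
    out: Set[SolutionMapping] = set()
    for triple in graph:
        b: Dict[str, str] = {}
        # Variables (leading "?") bind consistently; constants must be equal.
        if all(b.setdefault(p, t) == t if p.startswith("?") else p == t
               for p, t in zip(tp, triple)):
            out.add(frozenset(b.items()))
    return out


def compatible(m1: SolutionMapping, m2: SolutionMapping) -> bool:
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(o1: Set[SolutionMapping], o2: Set[SolutionMapping]) -> Set[SolutionMapping]:
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def opt_plus(o1: Set[SolutionMapping], o2: Set[SolutionMapping]) -> Set[SolutionMapping]:
    # (P1 AND P2) UNION P1
    return join(o1, o2) | o1


def evaluate(graph: Iterable[Triple]) -> Set[SolutionMapping]:
    graph = list(graph)
    return opt_plus(eval_tp(("a", "p", "?x"), graph), eval_tp(("?x", "p", "?y"), graph))


G1 = frozenset({("a", "p", "b")})
G2 = G1 | {("b", "p", "c")}
print(evaluate(G1) <= evaluate(G2))  # True

# Brute force: every pair of subgraphs (g1 ⊆ g2) of a small, hypothetical universe.
universe = [("a", "p", "b"), ("b", "p", "c"), ("a", "p", "c")]
subsets = [frozenset(c) for r in range(len(universe) + 1) for c in combinations(universe, r)]
violations = [(g1, g2) for g1 in subsets for g2 in subsets
              if g1 <= g2 and not evaluate(g1) <= evaluate(g2)]
print(len(violations))  # 0
```

Because results only ever grow as the graph grows, an engine may report each solution of an OPT+ pattern as soon as it is found, which is what incremental result visualization needs.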
Epilogue
Conclusions
● Returning result elements early has not yet
received sufficient attention in existing work
on live querying the Web of Data
● Prioritizing data retrieval can reduce response
times of traversal-based query executions
– What approaches are suitable and effective?
– Similar for federated query processing, LDFs?
● Language features have to be chosen with care
– Their impact has to be studied
– Dedicated optimization techniques are possible
