SlideShare a Scribd company logo
INI Lab. 
An Optimal and Progressive 
Algorithm for Skyline Queries 
Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger 
ACM SIGMOD’ 2003 
Presenters 
KYEONG SEOK HYUN, 
WOO-SUNG CHOI, 
JA-YEON KIM,
Abstract 
 An Optimal 
 and Progressive Algorithm 
 for Skyline Queries 
 Using R-Tree
contents 
 1. Introduction 
 2. Related Work 
 2.1 Block Nested Loop (BNL) 
 2.5 Nearest Neighbor (NN) 
 3. Branch and Bound Skyline Algorithm 
 With I/O analysis 
 5. Experimental Evaluation
Skyline 
Problem definition
Which one do you prefer? 
http://emperia.egloos.com/m/2516211 
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html 
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html 
5,000 Won 
40,000 Won 
4,500 Won 
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting 
혜자 >> 창렬
preliminaries 
Formal definition of Dominates (≪) 
 Given a set of d-dimensional points 푇 
We say that a point t1 ∈ 푇 DOMINATES another point t2 ∈ 푇 
 If and only if 
 ∀푖 ∈ 1, 2, 3, … , 푑 , 푡1 푖 ≧ 푡2[푖] 
 ∃푗 ∈ 1, 2, 3, … , 푑 , 푡1 푗 > 푡2[푗] 
 and Denoted by t2 ≪ t1 
 (simply saying, t1 이 이득) 
Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf 
Note that 
the meaning of ‘dominates’ may differ 
according to type of application
Which one do you prefer? 
http://emperia.egloos.com/m/2516211 
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html 
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html 
5,000 Won 
40,000 Won 
4,500 Won 
4,500 Won 
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting 
Still 혜자 >> 창렬
 Hotel(attraction, 1/price, 1/distance) 
 Two Hotel 
 A : `80`, `1/15,000`, `1/500m` 
 B : `30`, `1/20,000`, `1/1500m` 
 퐵 ≪ 퐴 
 Why? 
 30<80 
 1/20,000 < 1/15,000 
 1/1,500m < 1/500m 
A 
1/price 
attraction 
B 
Dominates! 
≪ 
B A 
for example,
Very important 
Problem Definition 
(mathematical) 
 The Skyline operator 
 Input - Given a set of objects P = {푝1, 푝2, … , 푝푁} 
 Output – {푝푖 | 푝푖 ∈ 푃 푎푛푑 ∄ 푝∗ ∈ 푃 푠. 푡. 푝푖 ≪ 푝∗} 
A 
B 
C 
Dominating Area(B) 
D 
E 
F 
“퐵 ∈ 푂푢푝푢푡, 
s푖푛푐푒 푛표 표푡ℎ푒푟 푝표푖푛푡 푃 ≫ 퐵”, correct 
x axis 
y axis 
G 
Common misconceptions 
“퐵 ∈ 푂푢푝푢푡 s푖푛푐푒 퐵 ≫ 퐶 , D, F” , wrong
Naïve approach 
for processing skyline queries
Exhaustive Test 
 Suppose there are n objects in the given set 
 퐷푥 = {표1, 표2, … , 표푛} 
 Algorithm -Naïve 1 
 푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푥 ∈ 퐷 
 푏표표푙푒푎푛 푖푠퐷표푚푖푛푎푡푒푑 = 푓푎푙푠푒 
 푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푦 ∈ 퐷 
 푖푓 ¬(표푥 = 표푦) 퐴푁퐷 ¬ 표푥 ≪ 표푦 푡ℎ푒푛 푐표푛푡푖푛푢푒; 
 푒푙푠푒 
 푡ℎ푒푛 푖푠퐷표푚푖푛푎푡푒푑 = 푡푟푢푒; 
 break; 
 푖푓 ! 푖푠퐷표푚푖푛푎푡푒푑 푆 ∪ {표푥} 
A 
B 
F 
C 
D 
G 
E
 Suppose there are n objects in the given set 
 퐷푥 = {표1, 표2, … , 표푛} 
 Algorithm -Naïve 1 
 푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푥 ∈ 퐷 
 푏표표푙푒푎푛 푖푠퐷표푚푖푛푎푡푒푑 = 푓푎푙푠푒 
 푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푦 ∈ 퐷 
 푖푓 ¬(표푥 = 표푦) 퐴푁퐷 ¬ 표푥 ≪ 표푦 푡ℎ푒푛 푐표푛푡푖푛푢푒; 
 푒푙푠푒 
 푡ℎ푒푛 푖푠퐷표푚푖푛푎푡푒푑 = 푡푟푢푒; 
 break; 
 푖푓 ! 푖푠퐷표푚푖푛푎푡푒푑 푆 ∪ {표푥} 
Exhaustive Test 
Nested Loop Structure 
Modification: (Algorithm -Naïve 2) 
Idea 1. Use Nested Loop Structure 
Idea 2. Take advantage of ‘Block-transfer’ 
towards better re-usability! 
Block A 
Block B 
A 
B 
C 
D 
E 
F 
G 
         
The Inherited Limitation of these approaches 
1. It needs full-scan over the data 
2. Though, query result contains 
only a small fraction of the dataset 
3. That is, these approaches are wasteful
R-Tree Index Approach 
for processing skyline queries
Preliminaries 
R-Tree 
 Nearest Neighbor Query
Preliminaries 
R-Tree: Balanced tree for indexing multi-dimensional object 
 Support Dynamic operation (insert, update, delete) 
R-Tree Index 
Approach
R-Tree 
VS 
B-Tree 
 B+-Tree 
 Balanced 
 Requiring that all leaves be at the 
same depth 
 Leaf nodes contain one 
dimensional value 
R-Tree 
 Similar to B+-Tree 
 Leaf nodes contain d-dimensional 
value 
R-Tree Index 
Approach 
http://courses.cs.washington.edu/courses/cse444/09sp/hw/hw3/hw3.html
Spatial objects (or d-dimensional objects or geometric objects) 
 d-dimensional object? 
 R-Tree Used for the Organization of 
a set of d-dimensional objects 
 How? 
 Main Idea 
 Minimum Bounding Rectangles (MBRs) 
<Objects in 2-dimension space> 
http://caversham.otago.ac.nz/research/geog.php
Quiz 
What is the minimum number of points for representing 
a rectangle? 
 Assumption: each rectangle is parallel to the coordinate axes 
18 
6 8 
7 
4 
x 
y 
0 
R-Tree Index 
Approach
Demonstration 
R-Tree Simulator
Nearest Neighbor (NN) 
Query Processing 
using R-Tree 
Nearest Neighbor Query 
 Input 
 Given a set of objects P = {푝1, 푝2, … , 푝푁} 
 Query Point - q 
 Output – {푝푖 | 푝푖 ∈ 푃 푎푛푑 ∄ 푝∗ ∈ 푃 푠. 푡. 퐿푝 푝푖 , 푞 > 퐿푝(푝∗, 푞)} 
0 x 
y 
See how it works in appendix 
R-Tree Index 
Approach
Root node 0 1 
MINMAXDIST(X,1) 
0 x 
y 
MINDIST(X, 0) 
MINDIST(X,1) 
MINMAXDIST(X, 0) 
Key IDEA! 
 Pruning! 
http://ko.aliexpress.com/store/category/pruning-tools/519349_100005637.html 
http://www.installitdirect.com/blog/easy-tips-for-pruning-your-plants/
http://www.davey.com/
Back to the original question 
Skyline with R-Tree
R-Tree Index Approach 
 Let’s process skyline objects using R-Tree 
 Strategy 1 – Use traditional tech. (i.e. NN Query) 
 Strategy 2 – This paper 
 Strategy 1 
 Partition the data using NN Query recursively 
 Distance metric: 퐿1 푛표푟푚 
 First NN Query -> start from the ideal point (i.e. zero point)
Strategy 1 
Recursive NN Query
Dominating Area(i) 
example 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i 
m 
n 
k 
i 
IDEAL
To-do Area 1 
To-do Area 2 
example 
a 
x axis 
y axis 
b 
i 
k 
IDEAL 
Dominating Area(i) 
TO-DO Area 2 
TO-DO Area 1
Next, test these area 
(only to find nothing) 
To--do Arrea 2 
To-do Area 1 
example 
a 
x axis 
y axis 
b 
i 
k 
Dominating Area(i) 
TO-DO Area 2 
TO-DO Area 1 
Dominating Area(k) 
IDEAL 
` 
`
To-do Area 1 
example 
x axis 
i 
k 
Dominating Area(i) 
TO-DO Area 1 
Dominating Area(k) 
a 
To-do Area 1 
y axis 
b 
IDEAL 
Dominating 
Area(a)
Dominating Area(k) 
Result 
   
Dominating Area(i) 
IDEAL 
Dominating 
Area(a) 
x axis 
y axis 
i 
k 
a
Limitation 
of Strategy 1 
 Generally speaking, 
 In a d-dimensional space, 
 Each skyline object discovered causes d recursive partitioning phase 
Dominated
Limitation 
of Strategy 1 
 Generally speaking, 
 In a d-dimensional space, 
 Each skyline object discovered causes d recursive partitioning phase 
Area 1 
Dominated 
Area 2 
Dominated 
Dominated 
Area 3
What if? 
 In general, for d>2 
 The overlapping of the partitions 
 Necessitates DUPLICATE ELIMINATION 
Area 
1 
Domin 
ated Area 
2 
Domin 
ated 
Domin 
ated 
Area 
3
Disadvantage ! 
 Strategy 1 needs an additional phase 
 For removing redundant outputs 
 4 elimination methods 
 Laisser-faire 
 Propagate 
 Merge 
 Fine-grained Partitioning 
 They works 
 Problem: sub-optimal
Strategy 2 
Branch & Bound Skyline Algorithm
Idea! 
 Similar to previous NN Query 
 Branch & Bound Skyline (BBS) 
http://greatleadersserve.org/leadership/big-idea-great-leaders-serve/
h 
example 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
L1E1 L1E2 
Queue 
L1E2, 4 L1E1, 10 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Result
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
9 
L1E2, 4 L1E2 
Queue 
L2E2, 5 
3 5 7 
L1E1, 10 
L2E3, 7 L2E4, 8 
2 
1 
1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Result
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
L2E2, 5 L2E3, 7 L2E4, 8 L1E1, 10 
c, 12 h, 7 i, 5 
Result
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
i, 5 h, 7 L2E4, 8 L1E1, 10 c, 12 
Result 
L2E3, 7
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
h, 7 L2E4, 8 L1E1, 10 c, 12 
i, 5 
Result 
L2E3, 7
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
L2E4, 8 L1E1, 10 c, 12 
i, 5 
Result 
k, 10 f n i
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
i, 5 
Result 
a, 10 k, 10
Analysis 
Strategy 1
Analysis 
of Strategy 1 
 Notation 
Variable Description 
s #of Skyline obj 
e Empty Query 
ne Non-empty Query 
r Redendent Query 
d d-dimension 
h Height of the given R-Tree 
Recursion Tree 
… 
d new 
recursive NN 
… … 
 푛푒 = 푠 + 푟 
 푒 = 푛푒 ∙ 푑 − 1 + 1, 푠푖푛푐푒 푛푒 + 푒 = 푛푒 ∙ 푑 + 1(푟표표푡) 
 푒 = 푠 + 푟 푑 − 1 + 1 
 푁퐴푁푁 ≥ 푒 + 푠 + 푟 ∗ ℎ = 푠 + 푟 푑 − 1 + 1 + 푠 + 푟 ℎ > 푠 ∙ ℎ ∙ 푑
Analysis 
Strategy 2
Analysis 
of Strategy 2 
(brief version) 
 Notation 
Variable Description 
s #of Skyline obj 
h Height of the given R-Tree 
 푠 ∙ ℎ ≥ 푁퐴퐵퐵푆 
 푁퐴푁푁 > 푠 ∙ ℎ ∙ 푑 > 푁퐴퐵퐵푆
Is it the optimal solution? 
BBS Algorithm
Proof 1. 
Termination 
& 
Correctness 
 Lemma 1. BBS visits entries in ascending order 
 Of their distance to the ‘ideal point’ 
 Lemma 2. Any data point added into Result_Set 
 Is guaranteed to be a final skyline point 
 Proof. 
 Suppose not then 푝푗 was added into Result_Set but not a final skyline point 
 Then, ∃ 푝∗ ∈ 퐷퐵 푠. 푡, 푝∗ ≫ 푝푗 , which means L1 ideal, p∗ < L1(ideal, pj) 
 However, observe that 푝∗ must be visited before 푝푗 by lemma 1. 
 Contradiction: 푝푗 should have been pruned, which contradicts the assumption. 
 Lemma 3. All data point will be examined, unless one of its ancestor 
nodes has been pruned.
Lemmas for the theorem 
Lemma 4. Any skyline algorithm 
based on R-Tree must access all the 
nodes whose mbrs intersects the SSR 
 Lemma 5. If an entry e doesn’t 
intersect the SSR 
 Then ∃푝∗ 푠. 푡. 퐿1 푖푑푒푎푙, 푝∗ < 
퐿1(푖푑푒푎푙, 푒. 푙푒푓푡푑표푤푛) 
 Theorem: The # of node accesses 
performed by BBS is OPTIMAL 
Dominating Area(B) 
A 
B 
F 
C 
D 
E 
x axis 
y axis 
G 
SSR
Proof of the theorem 
 Proof 1. BBS only accesses nodes that 
may contain skyline points. 
 That is, BBS only accesses nodes 
whose mbrs intersect the SSR 
 Suppose not 
 Node e that doesn’t intersect the SSR 
 ∃푝∗ by lemma 5 
 Contradicts, by lemma 1 
 Proof 2. BBS visits nodes at most 
once. (trivial) 
Dominating Area(B) 
A 
B 
F 
C 
D 
E 
x axis 
y axis 
G 
SSR
To quantify 
the actual cost 
A  Skip the details  
B 
C 
Dominating Area(B) 
D 
E 
F 
x axis 
y axis 
G 
SSR
Experimental Evaluation
Experimental Evaluation
Dimensionality
Cardinality 
 3d dataset
Progressive behavior 
 N=1M, d=3
Constrained 
skyline queries 
 N=1M, d=3 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Constrain

More Related Content

What's hot (20)

PPTX
Linear algebra for deep learning
Swayam Mittal
 
PDF
Multimedia
Ashish Yaduvanshi
 
PPTX
Applied Artificial Intelligence Unit 5 Semester 3 MSc IT Part 2 Mumbai Univer...
Madhav Mishra
 
PDF
Logistic regression : Use Case | Background | Advantages | Disadvantages
Rajat Sharma
 
PDF
AI_ 3 & 4 Knowledge Representation issues
Khushali Kathiriya
 
PDF
Multi-armed Bandits
Dongmin Lee
 
PPTX
Probabilistic logic
Rushdi Shams
 
PPTX
Representing uncertainty in expert systems
bhupendra kumar
 
PDF
Systemy rekomendacji, Algorytmy rankingu Top-N rekomendacji bazujące na nieja...
Bartlomiej Twardowski
 
POT
Multi Objective Optimization
Nawroz University
 
PPTX
Decision trees for machine learning
Amr BARAKAT
 
PDF
Boston ML - Architecting Recommender Systems
James Kirk
 
PPTX
Machine Learning lecture4(logistic regression)
cairo university
 
PPTX
Machine learning session4(linear regression)
Abhimanyu Dwivedi
 
PPTX
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Madhav Mishra
 
ODP
Production system in ai
sabin kafle
 
PPTX
Recommendation system (1).pptx
prathammishra28
 
PPTX
Content based filtering
Bendito Freitas Ribeiro
 
PDF
Text classification & sentiment analysis
M. Atif Qureshi
 
PPTX
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 
Linear algebra for deep learning
Swayam Mittal
 
Multimedia
Ashish Yaduvanshi
 
Applied Artificial Intelligence Unit 5 Semester 3 MSc IT Part 2 Mumbai Univer...
Madhav Mishra
 
Logistic regression : Use Case | Background | Advantages | Disadvantages
Rajat Sharma
 
AI_ 3 & 4 Knowledge Representation issues
Khushali Kathiriya
 
Multi-armed Bandits
Dongmin Lee
 
Probabilistic logic
Rushdi Shams
 
Representing uncertainty in expert systems
bhupendra kumar
 
Systemy rekomendacji, Algorytmy rankingu Top-N rekomendacji bazujące na nieja...
Bartlomiej Twardowski
 
Multi Objective Optimization
Nawroz University
 
Decision trees for machine learning
Amr BARAKAT
 
Boston ML - Architecting Recommender Systems
James Kirk
 
Machine Learning lecture4(logistic regression)
cairo university
 
Machine learning session4(linear regression)
Abhimanyu Dwivedi
 
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Madhav Mishra
 
Production system in ai
sabin kafle
 
Recommendation system (1).pptx
prathammishra28
 
Content based filtering
Bendito Freitas Ribeiro
 
Text classification & sentiment analysis
M. Atif Qureshi
 
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 

Viewers also liked (7)

PPTX
Dsh data sensitive hashing for high dimensional k-nn search
WooSung Choi
 
PDF
Economic Development and Satellite Images
Paul Raschky
 
PPTX
[Vldb 2013] skyline operator on anti correlated distributions
WooSung Choi
 
PPTX
Learning a nonlinear embedding by preserving class neibourhood structure 최종
WooSung Choi
 
PDF
Skyline queries
Victor Torras
 
PPTX
Locality sensitive hashing
Sameera Horawalavithana
 
PDF
Probabilistic data structures. Part 4. Similarity
Andrii Gakhov
 
Dsh data sensitive hashing for high dimensional k-nn search
WooSung Choi
 
Economic Development and Satellite Images
Paul Raschky
 
[Vldb 2013] skyline operator on anti correlated distributions
WooSung Choi
 
Learning a nonlinear embedding by preserving class neibourhood structure 최종
WooSung Choi
 
Skyline queries
Victor Torras
 
Locality sensitive hashing
Sameera Horawalavithana
 
Probabilistic data structures. Part 4. Similarity
Andrii Gakhov
 
Ad

Similar to An optimal and progressive algorithm for skyline queries slide (20)

PPT
PAM.ppt
janaki raman
 
PDF
kdtrees.pdf
swapnilslide2019
 
PPT
Indexing Data Structure
Vivek Kantariya
 
PDF
Skyline Query Processing using Filtering in Distributed Environment
IJMER
 
PPT
test pre
farazch
 
PDF
Data structures
NYversity
 
PDF
Data structures
IT Training and Job Placement
 
PPTX
Are there trends, changes in the mi.pptx
priyaaajadhav31
 
PPTX
Lecture 16 data structures and algorithms
Aakash deep Singhal
 
PPTX
Data analytics concepts
Hiranthi Tennakoon
 
KEY
Cassandra deep-dive @ NoSQLNow!
Acunu
 
PPT
prune and search method is explained here
vidhyapm2
 
PDF
DynamicProgramming.pdf
ssuser3a8f33
 
PDF
Gwt sdm public
Yasuo Tabei
 
PDF
Dynamic programming
Amit Kumar Rathi
 
PDF
The skyline operator lee, woonghee
Woonghee Lee
 
PDF
Algorithms summary (English version)
Young-Min kang
 
PPTX
Unit-2 for AIML including type of searches
SaranshGupta923104
 
PPTX
Chapter 5.pptx
Tekle12
 
PPT
DynamicProgramming.ppt
DavidMaina47
 
PAM.ppt
janaki raman
 
kdtrees.pdf
swapnilslide2019
 
Indexing Data Structure
Vivek Kantariya
 
Skyline Query Processing using Filtering in Distributed Environment
IJMER
 
test pre
farazch
 
Data structures
NYversity
 
Are there trends, changes in the mi.pptx
priyaaajadhav31
 
Lecture 16 data structures and algorithms
Aakash deep Singhal
 
Data analytics concepts
Hiranthi Tennakoon
 
Cassandra deep-dive @ NoSQLNow!
Acunu
 
prune and search method is explained here
vidhyapm2
 
DynamicProgramming.pdf
ssuser3a8f33
 
Gwt sdm public
Yasuo Tabei
 
Dynamic programming
Amit Kumar Rathi
 
The skyline operator lee, woonghee
Woonghee Lee
 
Algorithms summary (English version)
Young-Min kang
 
Unit-2 for AIML including type of searches
SaranshGupta923104
 
Chapter 5.pptx
Tekle12
 
DynamicProgramming.ppt
DavidMaina47
 
Ad

Recently uploaded (20)

PDF
How to Avoid 7 Costly Mainframe Migration Mistakes
JP Infra Pvt Ltd
 
PPTX
Introduction to Artificial Intelligence.pptx
StarToon1
 
PDF
Building Production-Ready AI Agents with LangGraph.pdf
Tamanna
 
PPT
Data base management system Transactions.ppt
gandhamcharan2006
 
PPTX
Human-Action-Recognition-Understanding-Behavior.pptx
nreddyjanga
 
PDF
AUDITABILITY & COMPLIANCE OF AI SYSTEMS IN HEALTHCARE
GAHI Youssef
 
PPTX
TSM_08_0811111111111111111111111111111111111111111111111
csomonasteriomoscow
 
PPTX
Climate Action.pptx action plan for climate
justfortalabat
 
PDF
Choosing the Right Database for Indexing.pdf
Tamanna
 
PPTX
Slide studies GC- CRC - PC - HNC baru.pptx
LLen8
 
PDF
Context Engineering vs. Prompt Engineering, A Comprehensive Guide.pdf
Tamanna
 
PPTX
Resmed Rady Landis May 4th - analytics.pptx
Adrian Limanto
 
PDF
What does good look like - CRAP Brighton 8 July 2025
Jan Kierzyk
 
PDF
apidays Helsinki & North 2025 - How (not) to run a Graphql Stewardship Group,...
apidays
 
PPTX
fashion industry boom.pptx an economics project
TGMPandeyji
 
PDF
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
PDF
Web Scraping with Google Gemini 2.0 .pdf
Tamanna
 
PPTX
AI Project Cycle and Ethical Frameworks.pptx
RiddhimaVarshney1
 
PDF
T2_01 Apuntes La Materia.pdfxxxxxxxxxxxxxxxxxxxxxxxxxxxxxskksk
mathiasdasilvabarcia
 
PDF
List of all the AI prompt cheat codes.pdf
Avijit Kumar Roy
 
How to Avoid 7 Costly Mainframe Migration Mistakes
JP Infra Pvt Ltd
 
Introduction to Artificial Intelligence.pptx
StarToon1
 
Building Production-Ready AI Agents with LangGraph.pdf
Tamanna
 
Data base management system Transactions.ppt
gandhamcharan2006
 
Human-Action-Recognition-Understanding-Behavior.pptx
nreddyjanga
 
AUDITABILITY & COMPLIANCE OF AI SYSTEMS IN HEALTHCARE
GAHI Youssef
 
TSM_08_0811111111111111111111111111111111111111111111111
csomonasteriomoscow
 
Climate Action.pptx action plan for climate
justfortalabat
 
Choosing the Right Database for Indexing.pdf
Tamanna
 
Slide studies GC- CRC - PC - HNC baru.pptx
LLen8
 
Context Engineering vs. Prompt Engineering, A Comprehensive Guide.pdf
Tamanna
 
Resmed Rady Landis May 4th - analytics.pptx
Adrian Limanto
 
What does good look like - CRAP Brighton 8 July 2025
Jan Kierzyk
 
apidays Helsinki & North 2025 - How (not) to run a Graphql Stewardship Group,...
apidays
 
fashion industry boom.pptx an economics project
TGMPandeyji
 
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
Web Scraping with Google Gemini 2.0 .pdf
Tamanna
 
AI Project Cycle and Ethical Frameworks.pptx
RiddhimaVarshney1
 
T2_01 Apuntes La Materia.pdfxxxxxxxxxxxxxxxxxxxxxxxxxxxxxskksk
mathiasdasilvabarcia
 
List of all the AI prompt cheat codes.pdf
Avijit Kumar Roy
 

An optimal and progressive algorithm for skyline queries slide

  • 1. INI Lab. An Optimal and Progressive Algorithm for Skyline Queries Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger ACM SIGMOD’ 2003 Presenters KYEONG SEOK HYUN, WOO-SUNG CHOI, JA-YEON KIM,
  • 2. Abstract  An Optimal  and Progressive Algorithm  for Skyline Queries  Using R-Tree
  • 3. contents  1. Introduction  2. Related Work  2.1 Block Nested Loop (BNL)  2.5 Nearest Neighbor (NN)  3. Branch and Bound Skyline Algorithm  With I/O analysis  5. Experimental Evaluation
  • 5. Which one do you prefer? http://emperia.egloos.com/m/2516211 http://drmoontv.blogspot.kr/2013/03/blog-post_17.html http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html 5,000 Won 40,000 Won 4,500 Won http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting 혜자 >> 창렬
  • 6. preliminaries Formal definition of Dominates (≪)  Given a set of d-dimensional points 푇 We say that a point t1 ∈ 푇 DOMINATES another point t2 ∈ 푇  If and only if  ∀푖 ∈ 1, 2, 3, … , 푑 , 푡1 푖 ≧ 푡2[푖]  ∃푗 ∈ 1, 2, 3, … , 푑 , 푡1 푗 > 푡2[푗]  and Denoted by t2 ≪ t1  (simply saying, t1 이 이득) Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf Note that the meaning of ‘dominates’ may differ according to type of application
  • 7. Which one do you prefer? http://emperia.egloos.com/m/2516211 http://drmoontv.blogspot.kr/2013/03/blog-post_17.html http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html 5,000 Won 40,000 Won 4,500 Won 4,500 Won http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting Still 혜자 >> 창렬
  • 8.  Hotel(attraction, 1/price, 1/distance)  Two Hotel  A : `80`, `1/15,000`, `1/500m`  B : `30`, `1/20,000`, `1/1500m`  퐵 ≪ 퐴  Why?  30<80  1/20,000 < 1/15,000  1/1,500m < 1/500m A 1/price attraction B Dominates! ≪ B A for example,
  • 9. Very important Problem Definition (mathematical)  The Skyline operator  Input - Given a set of objects P = {푝1, 푝2, … , 푝푁}  Output – {푝푖 | 푝푖 ∈ 푃 푎푛푑 ∄ 푝∗ ∈ 푃 푠. 푡. 푝푖 ≪ 푝∗} A B C Dominating Area(B) D E F “퐵 ∈ 푂푢푝푢푡, s푖푛푐푒 푛표 표푡ℎ푒푟 푝표푖푛푡 푃 ≫ 퐵”, correct x axis y axis G Common misconceptions “퐵 ∈ 푂푢푝푢푡 s푖푛푐푒 퐵 ≫ 퐶 , D, F” , wrong
  • 10. Naïve approach for processing skyline queries
  • 11. Exhaustive Test  Suppose there are n objects in the given set  퐷푥 = {표1, 표2, … , 표푛}  Algorithm -Naïve 1  푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푥 ∈ 퐷  푏표표푙푒푎푛 푖푠퐷표푚푖푛푎푡푒푑 = 푓푎푙푠푒  푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푦 ∈ 퐷  푖푓 ¬(표푥 = 표푦) 퐴푁퐷 ¬ 표푥 ≪ 표푦 푡ℎ푒푛 푐표푛푡푖푛푢푒;  푒푙푠푒  푡ℎ푒푛 푖푠퐷표푚푖푛푎푡푒푑 = 푡푟푢푒;  break;  푖푓 ! 푖푠퐷표푚푖푛푎푡푒푑 푆 ∪ {표푥} A B F C D G E
  • 12.  Suppose there are n objects in the given set  퐷푥 = {표1, 표2, … , 표푛}  Algorithm -Naïve 1  푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푥 ∈ 퐷  푏표표푙푒푎푛 푖푠퐷표푚푖푛푎푡푒푑 = 푓푎푙푠푒  푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푦 ∈ 퐷  푖푓 ¬(표푥 = 표푦) 퐴푁퐷 ¬ 표푥 ≪ 표푦 푡ℎ푒푛 푐표푛푡푖푛푢푒;  푒푙푠푒  푡ℎ푒푛 푖푠퐷표푚푖푛푎푡푒푑 = 푡푟푢푒;  break;  푖푓 ! 푖푠퐷표푚푖푛푎푡푒푑 푆 ∪ {표푥} Exhaustive Test Nested Loop Structure Modification: (Algorithm -Naïve 2) Idea 1. Use Nested Loop Structure Idea 2. Take advantage of ‘Block-transfer’ towards better re-usability! Block A Block B A B C D E F G          The Inherited Limitation of these approaches 1. It needs full-scan over the data 2. Though, query result contains only a small fraction of the dataset 3. That is, these approaches are wasteful
  • 13. R-Tree Index Approach for processing skyline queries
  • 14. Preliminaries R-Tree  Nearest Neighbor Query
  • 15. Preliminaries R-Tree: Balanced tree for indexing multi-dimensional object  Support Dynamic operation (insert, update, delete) R-Tree Index Approach
  • 16. R-Tree VS B-Tree  B+-Tree  Balanced  Requiring that all leaves be at the same depth  Leaf nodes contain one dimensional value R-Tree  Similar to B+-Tree  Leaf nodes contain d-dimensional value R-Tree Index Approach http://courses.cs.washington.edu/courses/cse444/09sp/hw/hw3/hw3.html
  • 17. Spatial objects (or d-dimensional objects or geometric objects)  d-dimensional object?  R-Tree Used for the Organization of a set of d-dimensional objects  How?  Main Idea  Minimum Bounding Rectangles (MBRs) <Objects in 2-dimension space> http://caversham.otago.ac.nz/research/geog.php
  • 18. Quiz What is the minimum number of points for representing a rectangle?  Assumption: each rectangle is parallel to the coordinate axes 18 6 8 7 4 x y 0 R-Tree Index Approach
  • 20. Nearest Neighbor (NN) Query Processing using R-Tree Nearest Neighbor Query  Input  Given a set of objects P = {푝1, 푝2, … , 푝푁}  Query Point - q  Output – {푝푖 | 푝푖 ∈ 푃 푎푛푑 ∄ 푝∗ ∈ 푃 푠. 푡. 퐿푝 푝푖 , 푞 > 퐿푝(푝∗, 푞)} 0 x y See how it works in appendix R-Tree Index Approach
  • 21. Root node 0 1 MINMAXDIST(X,1) 0 x y MINDIST(X, 0) MINDIST(X,1) MINMAXDIST(X, 0) Key IDEA!  Pruning! http://ko.aliexpress.com/store/category/pruning-tools/519349_100005637.html http://www.installitdirect.com/blog/easy-tips-for-pruning-your-plants/
  • 23. Back to the original question Skyline with R-Tree
  • 24. R-Tree Index Approach  Let’s process skyline objects using R-Tree  Strategy 1 – Use traditional tech. (i.e. NN Query)  Strategy 2 – This paper  Strategy 1  Partition the data using NN Query recursively  Distance metric: 퐿1 푛표푟푚  First NN Query -> start from the ideal point (i.e. zero point)
  • 26. Dominating Area(i) example a x axis y axis b c d e f g i m n k i IDEAL
  • 27. To-do Area 1 To-do Area 2 example a x axis y axis b i k IDEAL Dominating Area(i) TO-DO Area 2 TO-DO Area 1
  • 28. Next, test these area (only to find nothing) To--do Arrea 2 To-do Area 1 example a x axis y axis b i k Dominating Area(i) TO-DO Area 2 TO-DO Area 1 Dominating Area(k) IDEAL ` `
  • 29. To-do Area 1 example x axis i k Dominating Area(i) TO-DO Area 1 Dominating Area(k) a To-do Area 1 y axis b IDEAL Dominating Area(a)
  • 30. Dominating Area(k) Result    Dominating Area(i) IDEAL Dominating Area(a) x axis y axis i k a
  • 31. Limitation of Strategy 1  Generally speaking,  In a d-dimensional space,  Each skyline object discovered causes d recursive partitioning phase Dominated
  • 32. Limitation of Strategy 1  Generally speaking,  In a d-dimensional space,  Each skyline object discovered causes d recursive partitioning phase Area 1 Dominated Area 2 Dominated Dominated Area 3
  • 33. What if?  In general, for d>2  The overlapping of the partitions  Necessitates DUPLICATE ELIMINATION Area 1 Domin ated Area 2 Domin ated Domin ated Area 3
  • 34. Disadvantage !  Strategy 1 needs an additional phase  For removing redundant outputs  4 elimination methods  Laisser-faire  Propagate  Merge  Fine-grained Partitioning  They works  Problem: sub-optimal
  • 35. Strategy 2 Branch & Bound Skyline Algorithm
  • 36. Idea!  Similar to previous NN Query  Branch & Bound Skyline (BBS) http://greatleadersserve.org/leadership/big-idea-great-leaders-serve/
  • 37. h example a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n
  • 38. example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 L1E1 L1E2 Queue L1E2, 4 L1E1, 10 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Result
  • 39. example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 9 L1E2, 4 L1E2 Queue L2E2, 5 3 5 7 L1E1, 10 L2E3, 7 L2E4, 8 2 1 1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Result
  • 40. example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 L2E2, 5 L2E3, 7 L2E4, 8 L1E1, 10 c, 12 h, 7 i, 5 Result
  • 41. example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 i, 5 h, 7 L2E4, 8 L1E1, 10 c, 12 Result L2E3, 7
  • 42. example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 h, 7 L2E4, 8 L1E1, 10 c, 12 i, 5 Result L2E3, 7
  • 43. example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 L2E4, 8 L1E1, 10 c, 12 i, 5 Result k, 10 f n i
  • 44. example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 i, 5 Result a, 10 k, 10
  • 46. Analysis of Strategy 1  Notation Variable Description s #of Skyline obj e Empty Query ne Non-empty Query r Redendent Query d d-dimension h Height of the given R-Tree Recursion Tree … d new recursive NN … …  푛푒 = 푠 + 푟  푒 = 푛푒 ∙ 푑 − 1 + 1, 푠푖푛푐푒 푛푒 + 푒 = 푛푒 ∙ 푑 + 1(푟표표푡)  푒 = 푠 + 푟 푑 − 1 + 1  푁퐴푁푁 ≥ 푒 + 푠 + 푟 ∗ ℎ = 푠 + 푟 푑 − 1 + 1 + 푠 + 푟 ℎ > 푠 ∙ ℎ ∙ 푑
  • 48. Analysis of Strategy 2 (brief version)  Notation Variable Description s #of Skyline obj h Height of the given R-Tree  푠 ∙ ℎ ≥ 푁퐴퐵퐵푆  푁퐴푁푁 > 푠 ∙ ℎ ∙ 푑 > 푁퐴퐵퐵푆
  • 49. Is it the optimal solution? BBS Algorithm
  • 50. Proof 1. Termination & Correctness  Lemma 1. BBS visits entries in ascending order  Of their distance to the ‘ideal point’  Lemma 2. Any data point added into Result_Set  Is guaranteed to be a final skyline point  Proof.  Suppose not then 푝푗 was added into Result_Set but not a final skyline point  Then, ∃ 푝∗ ∈ 퐷퐵 푠. 푡, 푝∗ ≫ 푝푗 , which means L1 ideal, p∗ < L1(ideal, pj)  However, observe that 푝∗ must be visited before 푝푗 by lemma 1.  Contradiction: 푝푗 should have been pruned, which contradicts the assumption.  Lemma 3. All data point will be examined, unless one of its ancestor nodes has been pruned.
  • 51. Lemmas for the theorem Lemma 4. Any skyline algorithm based on R-Tree must access all the nodes whose mbrs intersects the SSR  Lemma 5. If an entry e doesn’t intersect the SSR  Then ∃푝∗ 푠. 푡. 퐿1 푖푑푒푎푙, 푝∗ < 퐿1(푖푑푒푎푙, 푒. 푙푒푓푡푑표푤푛)  Theorem: The # of node accesses performed by BBS is OPTIMAL Dominating Area(B) A B F C D E x axis y axis G SSR
  • 52. Proof of the theorem  Proof 1. BBS only accesses nodes that may contain skyline points.  That is, BBS only accesses nodes whose mbrs intersect the SSR  Suppose not  Node e that doesn’t intersect the SSR  ∃푝∗ by lemma 5  Contradicts, by lemma 1  Proof 2. BBS visits nodes at most once. (trivial) Dominating Area(B) A B F C D E x axis y axis G SSR
  • 53. To quantify the actual cost A  Skip the details  B C Dominating Area(B) D E F x axis y axis G SSR
  • 59. Constrained skyline queries  N=1M, d=3 h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Constrain