Algorithms for Spatial Joins
and Spatial Query Processing
and Optimization
-Natasha Mandal
Applications of Spatial Queries
O Spatial Database Systems
O Geographical Information Systems
O Urban Planning
O CAD/CAM systems
O Image Databases
[Figure: point queries, nearest neighbor query, range query, map overlay]
Goals
O Understand more about Query Processing in
SDBMS
O Learn more about Spatial Operations in SDBMS
O Learn about Optimization in SDBMS
What is Query Processing?
Why Optimize?
O Queries are expressed in a high-level declarative
language such as SQL.
O The database software is supposed to map the
query into a sequence of operations supported by
spatial indexes and storage structures.
O Goals:
 Process a query accurately
 Do this in the minimum amount of time possible
What is Query Processing?
Why Optimize?
O Queries are composed from a basic set of relational operators.
O Query processing and optimization are divided into
two steps:
 Design and fine-tune algorithms for each of the
basic relational operators.
 Map high-level queries into a composition of these
basic relational operators and optimize (using
information in the first step).
Challenges in Spatial Databases
 Unlike relational databases, spatial databases
have no fixed set of operators that serve as
building blocks for query evaluation (ex. Overlap
and Intersect may return a similar result).
 Spatial databases have large volumes of complex
objects (with spatial extensions) which cannot be
sorted in a one-dimensional array.
 The assumption that I/O costs dominate CPU
costs is no longer valid since computationally
expensive algorithms are used to test for spatial
predicates.
Spatial Operations
O Spatial Operations can be classified into four
groups:
 Update - Modify, Create etc.
 Selection –
o Point Query (𝑃𝑄): Given a query point 𝑝, find all spatial
objects 𝑂 that contain it:
$PQ(p) = \{\, O \mid p \in O.G \,\}$
where 𝑂. 𝐺 is the geometry of the object 𝑂.
Ex. “Find all river flood-plains which contain the CITY” [CITY
is assumed to be a point type]
o Range Query (𝑅𝑄): Given a query polygon 𝑃, find all spatial
objects 𝑂 which intersect 𝑃. [If 𝑃 is a rectangle, 𝑅𝑄 is a
window query]
$RQ(P) = \{\, O \mid O.G \cap P.G \neq \emptyset \,\}$
Ex. “Get all forests which overlap with flood plain of River
Nile”
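The two selection queries above can be sketched with a brute-force scan. This is a minimal sketch, assuming every geometry O.G is an axis-aligned rectangle; the `Rect` class, the function names and the sample data are hypothetical and only illustrate the set definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for an object's geometry O.G (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)


def point_query(objects, x, y):
    """PQ(p): names of all objects whose geometry contains the point (x, y)."""
    return [name for name, g in objects.items() if g.contains_point(x, y)]


def range_query(objects, window):
    """RQ(P): names of all objects whose geometry intersects the query window."""
    return [name for name, g in objects.items() if g.intersects(window)]


# Made-up sample data.
objects = {
    "plain_a": Rect(0, 0, 4, 4),
    "plain_b": Rect(3, 3, 8, 8),
    "plain_c": Rect(10, 10, 12, 12),
}
print(point_query(objects, 3.5, 3.5))
print(range_query(objects, Rect(5, 5, 11, 11)))
```

Because the query window here is itself a rectangle, `range_query` is a window query in the sense of the slide.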
Spatial Operations
 Spatial Join – This relation holds when two
tables 𝑅 and 𝑆 are joined on a spatial predicate
𝜃 . Map Overlay is an important variant of
Spatial Join.
$R \bowtie_{\theta} S = \{\, (o_1, o_2) \mid o_1 \in R,\ o_2 \in S,\ \theta(o_1.G, o_2.G) \,\}$
Some example 𝜃 predicates are intersect, contains,
is enclosed by, distance, northwest, adjacent,
meets, overlap etc.
Spatial Operations
Ex. “Find all forest stands and river flood-plains which overlap”
SELECT FS.name, FP.name
FROM Forest_Stand FS, Flood_Plain FP
WHERE overlap(FS.G, FP.G)
 Spatial Aggregate – These are usually variants of
the nearest neighbor search.
$NNQ(o') = \{\, o \mid \forall o'' : dist(o'.G, o.G) \le dist(o'.G, o''.G) \,\}$
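The nearest neighbor definition above can be read directly as a scan over all objects. The sketch below assumes point geometries and Euclidean distance; the function name and the data are hypothetical.

```python
import math


def nearest_neighbors(query, points):
    """Return every object whose distance to `query` is minimal (ties included)."""
    best = min(math.dist(query, p) for p in points.values())
    return [name for name, p in points.items() if math.dist(query, p) == best]


# Made-up sample data.
points = {"a": (0, 0), "b": (3, 4), "c": (6, 8)}
print(nearest_neighbors((3, 3), points))
```

The result is a list because the set definition admits ties: several objects may share the minimum distance.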
Two-Step Query Processing of
Object Operations
O Filter Step: Spatial objects are represented by
simpler approximations such as the minimum bounding
rectangle (MBR), and the exact predicate may be
replaced by a cheaper one. The filter step must not
eliminate any tuple that belongs to the final answer
computed with exact geometry.
For example, touch(River.Flood-Plain, :CITY) may be
replaced by overlap(MBR(River.Flood-Plain),
MBR(:CITY))
Two-Step Query Processing of
Object Operations
 Refinement Step: The exact geometry of each
element from the candidate set and the exact
predicate are examined. This may require a CPU
intensive application and may be processed
outside the spatial database (in a GIS).
Filtering – MBRs
Geometric Filter (Approximations) – Convex Hull,
Minimum Enclosing Circle etc.
Exact Geometry – Plane Sweep etc.
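The two steps can be sketched for a point-in-polygon query. This is a minimal sketch, assuming simple polygons given as vertex lists; the MBR test plays the role of the filter and a ray-casting test stands in for the exact-geometry refinement. All names and data are hypothetical.

```python
def mbr(polygon):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def mbr_contains(box, pt):
    xmin, ymin, xmax, ymax = box
    return xmin <= pt[0] <= xmax and ymin <= pt[1] <= ymax


def polygon_contains(polygon, pt):
    """Exact test by ray casting: count edges crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_refine(polygons, pt):
    # Filter step: cheap MBR test, may keep false positives but never drops an answer.
    candidates = [n for n, poly in polygons.items() if mbr_contains(mbr(poly), pt)]
    # Refinement step: exact geometry test on the candidates only.
    answers = [n for n in candidates if polygon_contains(polygons[n], pt)]
    return candidates, answers


# Made-up sample data.
polygons = {
    "triangle": [(0, 0), (10, 0), (0, 10)],
    "square": [(20, 20), (30, 20), (30, 30), (20, 30)],
}
print(filter_refine(polygons, (8, 8)))
print(filter_refine(polygons, (2, 2)))
```

The point (8, 8) lies inside the triangle's MBR but outside the triangle, so it survives the filter and is rejected only in the refinement step.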
Algorithms for Query Processing and Optimization of Spatial Operations
Techniques for Spatial Selection
O What are the alternative ways of processing a
query? It depends on how the file containing the
relations being queried is organized.
 Unsorted Data and No Index – Use brute force to
scan the whole file and test each record for the
predicate.
 Spatial Indexing – Can be used to access geometric
data. The MBRs of spatial attributes of a relation
can be indexed.
 Space filling curves – These can be used to map
points of multidimensional space into one
dimensional space. A B-Tree index can be imposed
on ordered entries to enhance the search.
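One well-known space-filling curve is the Z-order (Morton) curve, which interleaves the bits of the coordinates. The sketch below assumes non-negative integer grid coordinates; a sorted list stands in for the B-Tree, and the function name is hypothetical.

```python
def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y (x in even positions, y in odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sorting by z-value gives the one-dimensional order a B-Tree could index.
cells = [(1, 1), (0, 1), (1, 0), (0, 0)]
ordered = sorted(cells, key=lambda p: z_value(*p))
print(ordered)
```

Points that are close on the curve tend to be close in space, although the converse does not always hold, so a range query on z-values still needs a final check on the actual coordinates.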
General Spatial Selection
O A selection condition can be a combination of
several “primitive” selection conditions.
O For spatial selections, the order in which the
individual conditions in conjunctive normal form
(CNF) are processed is important because different
spatial conditions have different processing costs.
O Predicates can be applied in ascending order of
𝑅𝑎𝑛𝑘.
$\mathit{Rank} = \dfrac{\mathit{selectivity} - 1}{\mathit{differential\ cost}}$
General Spatial Selection
$\mathit{selectivity}(p) = \dfrac{\mathit{cardinality}(\mathit{output}(p))}{\mathit{cardinality}(\mathit{input}(p))}$
differential cost = per-tuple cost of evaluating a predicate. It
remains constant throughout the life of the function
and can be stored in the system catalog (along with
selectivity).
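A short sketch of rank-based ordering follows. The three predicates are borrowed from the Lake/Facilities query used later in these slides; the selectivity and differential-cost figures are invented for illustration and would normally come from the system catalog.

```python
def rank(selectivity, differential_cost):
    """Rank = (selectivity - 1) / differential cost; lower rank is applied first."""
    return (selectivity - 1.0) / differential_cost


# (predicate, selectivity, differential cost): all numbers are made up.
predicates = [
    ("Distance(Fa.G, L.G) < 50", 0.2, 100.0),
    ("Area(L.G) > 20", 0.5, 10.0),
    ("Fa.Name = 'Campground'", 0.1, 1.0),
]

ordered = sorted(predicates, key=lambda p: rank(p[1], p[2]))
for name, s, c in ordered:
    print(f"{rank(s, c):8.3f}  {name}")
```

With these numbers the cheap, selective non-spatial predicate is applied first and the expensive distance predicate last, which matches the intuition behind the formula.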
Spatial Join
O Spatial Join can be an expensive operation and
the presence of indices can help in the fast
processing of queries.
Classification of spatial join methods
Both inputs are indexed:
 transformation to z-values
 spatial join index
 tree matching
One input is indexed:
 index nested loops
 seeded tree join
 build and match
 sort and match
 slot index spatial join
Neither input is indexed:
 spatial hash join
 partition based spatial merge join
 size separation spatial join
 scalable sweeping-based spatial join
The R-Tree Join
O This algorithm can be used when both the inputs
are indexed.
O It is based on the enclosure property of trees: if
two nodes do not intersect, then there are no
rectangles below them that can intersect.
O RJ starts from the roots of the trees to be joined
and finds pairs of overlapping entries.
O For each such pair, the algorithm is recursively
called until the leaf levels where overlapping pairs
constitute solutions.
O The following algorithm assumes both the R-Trees
are of equal height (this can easily be extended).
The R-Tree Join
Alg. RJ(RTNode ni, RTNode nj)
for each entry ej,y ∈ nj do
{
  for each entry ei,x ∈ ni with ei,x ∩ ej,y ≠ ∅ do
  {
    if ni is a leaf node /* nj is also a leaf node */
      then Output(ei,x, ej,y);
    else /* intermediate nodes */
    {
      ReadPage(ei,x.ref); ReadPage(ej,y.ref);
      RJ(ei,x.ref, ej,y.ref);
    }
  }
} /* end for */
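The recursion can be sketched in Python on in-memory trees. This is a minimal sketch, assuming two trees of equal height, MBRs stored as (xmin, ymin, xmax, ymax) tuples, and no page I/O (ReadPage is omitted); the `Entry` and `Node` classes and the sample trees are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    mbr: tuple   # (xmin, ymin, xmax, ymax)
    ref: object  # child Node for internal entries, object id at the leaves


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def rtree_join(ni, nj, out):
    """Synchronized traversal: descend only into pairs of entries whose MBRs intersect."""
    for ej in nj.entries:
        for ei in ni.entries:
            if not intersects(ei.mbr, ej.mbr):
                continue
            if ni.is_leaf:  # nj is a leaf too (equal heights assumed)
                out.append((ei.ref, ej.ref))
            else:
                rtree_join(ei.ref, ej.ref, out)
    return out


# Two made-up trees of height 2.
r_left = Node(True, [Entry((0, 0, 2, 2), "r1"), Entry((1, 1, 3, 3), "r2")])
r_right = Node(True, [Entry((10, 10, 12, 12), "r3")])
r_root = Node(False, [Entry((0, 0, 3, 3), r_left), Entry((10, 10, 12, 12), r_right)])

s_a = Node(True, [Entry((2, 2, 4, 4), "s1")])
s_b = Node(True, [Entry((11, 11, 13, 13), "s2"), Entry((20, 20, 21, 21), "s3")])
s_root = Node(False, [Entry((2, 2, 4, 4), s_a), Entry((11, 11, 21, 21), s_b)])

pairs = rtree_join(r_root, s_root, [])
print(pairs)
```

The output pairs are candidates from the filter step (intersecting MBRs); a refinement step on exact geometry would still follow.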
The R-Tree Join
 Optimizations for CPU speed:
 Search Space Restriction
 Plane Sweep – sorting in one dimension
reduces time for finding overlapping pairs
 Optimizations for I/O speed:
 Plane Sweep - consecutive computed
pairs overlap with high probability
 Breadth-first traversal that sorts the output
at each level in order to reduce the
number of page accesses.
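The plane-sweep idea for finding overlapping pairs between two sets of rectangles can be sketched as follows. It assumes both inputs fit in memory and are given as (id, (xmin, ymin, xmax, ymax)) tuples; the function name and data are hypothetical, and this is one common formulation rather than the exact routine used in any particular system.

```python
def plane_sweep_pairs(rs, ss):
    """Report all (r, s) pairs with intersecting rectangles, sweeping along x."""
    rs = sorted(rs, key=lambda item: item[1][0])  # sort by xmin
    ss = sorted(ss, key=lambda item: item[1][0])
    out = []
    i = j = 0
    while i < len(rs) and j < len(ss):
        if rs[i][1][0] <= ss[j][1][0]:
            # r starts first: scan s rectangles that start before r ends in x.
            rid, r = rs[i]
            k = j
            while k < len(ss) and ss[k][1][0] <= r[2]:
                sid, s = ss[k]
                if r[1] <= s[3] and s[1] <= r[3]:  # y-intervals overlap
                    out.append((rid, sid))
                k += 1
            i += 1
        else:
            # s starts first: symmetric scan over the r rectangles.
            sid, s = ss[j]
            k = i
            while k < len(rs) and rs[k][1][0] <= s[2]:
                rid, r = rs[k]
                if r[1] <= s[3] and s[1] <= r[3]:
                    out.append((rid, sid))
                k += 1
            j += 1
    return out


# Made-up sample data.
rs = [("r1", (0, 0, 2, 2)), ("r2", (1, 1, 3, 3)), ("r3", (10, 10, 12, 12))]
ss = [("s1", (2, 2, 4, 4)), ("s2", (11, 11, 13, 13)), ("s3", (2, 5, 3, 6))]
print(plane_sweep_pairs(rs, ss))
```

Sorting on one dimension means each rectangle is compared only against rectangles whose x-ranges can still overlap it, instead of against the whole other set.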
Spatial Hash Join
O This algorithm can be used to compute
the join of two non-indexed datasets 𝑅
(build input i.e. smaller relation) and 𝑆
(probe input).
O 𝑅 is partitioned into 𝐾 buckets.
 The initial buckets are points determined
based on sampling.
 Each object is inserted into the bucket that
is enlarged the least.
Spatial Hash Join
O 𝑆 is hashed into buckets with the same extent
as 𝑅's buckets
 An object is inserted into all buckets that intersect
it.
 Some objects may be assigned to multiple buckets
(replication) and some may not be inserted at all
(filtering).
O The two bucket sets are joined; each bucket from
R is matched with only one bucket from S, thus
requiring a single scan of both files.
O If for some pair neither bucket fits in memory, an
R-tree is built for one of them, and the bucket-to-
bucket join is executed in an index nested loop
fashion.
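The partitioning and joining steps can be sketched in memory. This is a simplified sketch under stated assumptions: the bucket seeds are the MBRs of the first K objects of R rather than sampled points, every bucket fits in memory, and bucket pairs are joined by a nested loop. All names and data are hypothetical.

```python
def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_hash_join(r_objs, s_objs, k):
    # Seed the bucket extents (assumption: first k objects instead of sampling).
    extents = [box for _, box in r_objs[:k]]
    r_buckets = [[] for _ in extents]
    # Build phase: each R object goes to the single bucket enlarged the least.
    for rid, box in r_objs:
        best = min(range(len(extents)),
                   key=lambda b: area(union(extents[b], box)) - area(extents[b]))
        extents[best] = union(extents[best], box)
        r_buckets[best].append((rid, box))
    # Probe phase: each S object goes to every bucket it intersects
    # (replication) or to none at all (filtering).
    s_buckets = [[] for _ in extents]
    for sid, box in s_objs:
        for b, ext in enumerate(extents):
            if intersects(ext, box):
                s_buckets[b].append((sid, box))
    # Join phase: bucket i of R is matched only with bucket i of S.
    out = []
    for rb, sb in zip(r_buckets, s_buckets):
        for rid, r in rb:
            for sid, s in sb:
                if intersects(r, s):
                    out.append((rid, sid))
    return out


# Made-up sample data.
r_objs = [("r1", (0, 0, 2, 2)), ("r2", (1, 1, 3, 3)), ("r3", (10, 10, 12, 12))]
s_objs = [("s1", (2, 2, 4, 4)), ("s2", (11, 11, 13, 13)), ("s3", (20, 20, 21, 21))]
print(spatial_hash_join(r_objs, s_objs, k=2))
```

Since each R object lives in exactly one bucket, replicating S objects does not produce duplicate result pairs.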
Slot Index Spatial Join
O This algorithm is applicable when there is an
R-tree for one of the inputs (𝑅).
O If 𝐾 is the desired number of partitions, SISJ
will find the topmost level of the tree such that
the number of entries is larger than or equal
to 𝐾. These entries are then grouped into 𝐾
(possibly overlapping) partitions called slots.
 Each slot contains the MBR of the indexed R-
tree entries, along with a list of pointers to
these entries.
Slot Index Spatial Join
 SISJ starts with a single empty slot and inserts
entries into the slot that is enlarged the least.
 When the maximum capacity of a slot is reached
(determined by 𝐾 and the total number of entries),
either some entries are deleted and reinserted or
the slot is split according to the R*-tree splitting
policy.
O The second dataset 𝑆 is hashed into buckets with
the same extents as the slots.
 If an object from 𝑆 does not intersect any bucket, it
is filtered.
 If it intersects more than one bucket, it is replicated.
Slot Index Spatial Join
O The join phase
 All data from the R-tree of 𝑅 indexed by a slot
are loaded and joined with the corresponding
hash-bucket from 𝑆 using plane sweep.
 If the data to be joined does not fit in memory,
they can be joined using an algorithm which
employs external sorting and then plane
sweep.
 During the join phase of SISJ, when no data
from 𝑆 is inserted into a bucket, the sub-tree
data under the corresponding slot is not
loaded (slot filtering).
Query Optimization
O The metric used for an evaluation plan is time
required to execute the query. For spatial
databases this would include I/O and CPU costs.
O A query optimizer (a module in the database
software) generates different evaluation plans and
determines the appropriate execution strategy.
O The idea is to avoid the worst plans and choose a
good one (seldom the best one).
O The work of a query optimizer can be divided
into two parts: logical transformation and
dynamic programming.
Logical Transformation
O Parsing
 The parser checks the syntax and transforms the
statement into a query tree.
 Parsers for spatial databases have to be more
sophisticated to identify and manage user-defined
data types.
 The leaf nodes of the query tree correspond to the
relations involved and the internal nodes correspond
to the operations.
 Query processing starts at the leaf nodes and
proceeds up until the operation at the root node has
been performed.
Logical Transformation
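SELECT L.Name
FROM Lake L, Facilities Fa
WHERE Area(L.G) > 20
AND Fa.Name = 'Campground'
AND Distance(Fa.G, L.G) < 50
Query tree (root first; the two relations are the leaves):
$\pi_{L.Name}$
  $\sigma_{Area(L.G) > 20}$
    $\sigma_{Fa.Name = \text{'Campground'}}$
      $\bowtie_{Distance(Fa.G,\, L.G) < 50}$
        Lake L, Facilities Fa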
Logical Transformation
O Logical Transformation
 The query tree generated by parser is mapped onto
equivalent query trees (based on a formal set of
rules inherited from relational algebra).
 After equivalent trees are enumerated, we can apply
heuristics to filter out non-candidates.
 Clear-cut heuristics may not apply to spatial
databases because of user-defined functions etc.
 𝑅𝑎𝑛𝑘 can be used as a heuristic. 𝑆𝑒𝑙𝑒𝑐𝑡𝑖𝑣𝑖𝑡𝑦 and
𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑡𝑖𝑎𝑙 𝑐𝑜𝑠𝑡 can be stored in the System
Catalog.
Logical Transformation
O Equivalence Rules:
 Selections
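o $\sigma_{c_1 \wedge c_2 \wedge \cdots \wedge c_n}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(\cdots(\sigma_{c_n}(R))\cdots))$ – Can push all
non-spatial conditions towards the right, so that they are evaluated first.
o $\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$
 Projections
o $\pi_{a_1}(R) \equiv \pi_{a_1}(\pi_{a_2}(\cdots(\pi_{a_n}(R))\cdots))$ if $a_i \subseteq a_{i+1}$ for $i = 1, \ldots, n-1$
 Cross Product and Joins
o $R \bowtie S \equiv S \bowtie R$
o $R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$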
Logical Transformation
 Selection, Projection and Joins
o If the selection condition involves only attributes retained by
the projection operator:
$\pi_a(\sigma_c(R)) \equiv \sigma_c(\pi_a(R))$
o If a selection condition involves only attributes that are
present in $R$ and not in $S$, then
$\sigma_c(R \bowtie S) \equiv \sigma_c(R) \bowtie S$
o Projection can be commuted with a join:
$\pi_a(R \bowtie S) \equiv \pi_{a_1}(R) \bowtie \pi_{a_2}(S)$
where $a_1 \subseteq a$ are the attributes of $a$ that appear in $R$ and
$a_2 \subseteq a$ are those that appear in $S$
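As a worked illustration, the rules can be applied to the Lake/Facilities query tree from the earlier slide. This is a sketch of one possible rewriting, not necessarily the plan an optimizer would pick: the first step uses join commutativity together with the selection/join rule for the condition on Facilities, and the second step uses the selection/join rule for the condition on Lake.

```latex
\begin{align*}
& \pi_{L.Name}\Big(\sigma_{Area(L.G) > 20}\big(\sigma_{Fa.Name = \text{'Campground'}}(Lake \bowtie_{Distance(Fa.G,\, L.G) < 50} Facilities)\big)\Big) \\
\equiv\; & \pi_{L.Name}\Big(\sigma_{Area(L.G) > 20}\big(Lake \bowtie_{Distance(Fa.G,\, L.G) < 50} \sigma_{Fa.Name = \text{'Campground'}}(Facilities)\big)\Big) \\
\equiv\; & \pi_{L.Name}\Big(\sigma_{Area(L.G) > 20}(Lake) \bowtie_{Distance(Fa.G,\, L.G) < 50} \sigma_{Fa.Name = \text{'Campground'}}(Facilities)\Big)
\end{align*}
```

In the final form both selections run before the expensive distance join, so the join sees fewer tuples on each side.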
Cost Based Optimization:
Dynamic Programming
O Dynamic Programming is used to determine the
optimal execution strategy from a set of execution
plans.
O The optimal solution minimizes the cost function.
O We focus on each node of the query tree and enumerate
the different execution strategies available to process
the node. The different processing strategies for each
node, when combined for the whole query, constitute
the plan space.
O The cardinality of the plan space might be high and the
optimization time must be kept to a minimum. This
suggests that we should select a good (not necessarily
the best) plan.
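One common form of this idea is dynamic programming over join orders: the best plan for a set of relations is built from the best plans for its subsets. The sketch below is a toy under loud assumptions: only left-deep orders are considered, the cost of a plan is the sum of its intermediate result sizes, and the relation `Road`, the cardinalities and the join selectivities are all invented.

```python
from itertools import combinations


def best_join_order(cards, sel):
    """cards: {relation: cardinality}; sel: {frozenset({a, b}): join selectivity}."""
    rels = sorted(cards)

    def size(subset):
        # Estimated result size: product of cardinalities and pairwise selectivities.
        n = 1.0
        for r in sorted(subset):
            n *= cards[r]
        for a, b in combinations(sorted(subset), 2):
            n *= sel.get(frozenset((a, b)), 1.0)  # 1.0 means cross product
        return n

    # best[subset] = (cost, left-deep join order) of the cheapest plan found.
    best = {frozenset([r]): (0.0, (r,)) for r in rels}
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            subset = frozenset(combo)
            result_size = size(subset)
            options = []
            for last in combo:  # relation joined last; the rest reuses a stored plan
                cost, order = best[subset - {last}]
                options.append((cost + result_size, order + (last,)))
            best[subset] = min(options)
    return best[frozenset(rels)]


# Invented statistics.
cards = {"Lake": 1000, "Facilities": 5000, "Road": 200}
sel = {
    frozenset(("Lake", "Facilities")): 0.001,
    frozenset(("Facilities", "Road")): 0.01,
}
cost, order = best_join_order(cards, sel)
print(cost, order)
```

Each subset is solved once and reused, which is what keeps the enumeration far below the number of all possible join orders; a real optimizer would also consider access paths and join algorithms at every node.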
Cost Based Optimization:
Dynamic Programming
O The factors that a good cost function must take
into account are:
o Access cost – Searching for and transferring data
from secondary storage.
o Storage cost – Storing intermediate temporary
relations produced by an execution strategy.
o Computation cost – CPU cost of performing in-
memory operations.
o Communication cost – Transferring information
between the client and server.
Cost Based Optimization:
Dynamic Programming
O System Catalog
 It contains the information required by the cost
function to design an optimal execution strategy.
 It includes:
o the size of each file
o the number of records in each file
o the number of blocks over which records are spread
o information about indexes and indexing attributes
o selectivity and differential cost
o materialized results of expensive user-defined
functions, indexed for fast retrieval
Cost Based Optimization:
Dynamic Programming
O Cost Functions
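$\mathit{cost} = \mathit{Exp}(\mathit{records\_examined}) + K \cdot \mathit{Exp}(\mathit{pages\_read})$
 $\mathit{Exp}(\mathit{records\_examined})$ = expected number of records examined
[measure of CPU time]
 $\mathit{Exp}(\mathit{pages\_read})$ = expected number of pages read from
storage [measure of I/O time]
 $K$ = weight of I/O resources relative to CPU resources
($K$ multiplies the I/O term, so a larger $K$ makes page reads count for more)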
O Decomposition and Merge in Hybrid Architecture
 A query is decomposed into spatial and non-spatial parts.
 The subqueries are optimized in separate modules and the
results are merged.
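The cost function above can be evaluated for two candidate plans as a quick sanity check. The plan names, the record and page counts, and the value of K are all invented for illustration.

```python
def plan_cost(exp_records_examined, exp_pages_read, k):
    """cost = Exp(records_examined) + K * Exp(pages_read)."""
    return exp_records_examined + k * exp_pages_read


K = 50  # hypothetical weight: one page read counts like 50 record examinations

# (expected records examined, expected pages read): made-up estimates.
plans = {
    "full scan": (100_000, 1_000),
    "spatial index": (500, 30),
}

costs = {name: plan_cost(recs, pages, K) for name, (recs, pages) in plans.items()}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

Changing K shifts the balance between the two terms, which is how the same formula can describe both I/O-bound and CPU-bound settings.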
Conclusion
O We learnt about the 2-Step Query Processing
paradigm.
O We reviewed algorithms for Spatial Operations like
Spatial Join.
O We learnt how Dynamic Programming can be
used to optimize queries based on the cost
function.
