Query Optimization
Succinctly: Making the execution of queries optimally fast
Brandon Latronica - 2017
The pathway of a database command:
Query?
Just a request for information from a database.
We can use a language to make that request; the most famous one for a DBMS is SQL!
Parsing and translation:
Translate the query into its internal form, which is then translated into relational
algebra. The parser checks syntax and verifies that the relations named in the query exist.
Relational Algebra:
The conversion of query syntax (SQL, etc.) into some type of internal DBMS
relational algebra. Why? Algebraic expressions, as in a computer algebra system (CAS),
are easier for computers to manipulate, while words are better for humans!
Optimization:
Last stop. The best plan is determined and then pushed for execution via
database calls, which produce the query output.
Basic Overview of Query Optimization
● Cost difference between evaluation plans for a query can be huge!
    ● Sometimes seconds vs. days!
● Steps in cost-based query optimization:
    1. Generate logically equivalent expressions using equivalence rules (think of alternative car travel paths)
    2. Annotate the resulting expressions to get alternative query plans
    3. Choose the cheapest plan based on estimated cost
● Estimation of plan cost is based on:
    ● Statistical information about relations, e.g. number of tuples, number of distinct values for an attribute, etc.
    ● Statistics estimation for intermediate results, to compute the cost of complex expressions
    ● Cost formulae for algorithms, computed using statistics
Seconds vs days?
Query Optimization - Yannis E. Ioannidis
Algebra on real numbers? We know that.
And so on...
(x^2 + 2x - 8) / (x - 2) = ?
= (x - 2)(x + 4) / (x - 2)
= 1 * (x + 4)   (for x ≠ 2)
= (x + 4)
(8,6) : (2,1)
From many terms and operations to few.
Boolean Algebra? We know that.
AB V ( BC(B V C) ) = ?
= AB V BBC V BCC [Distrib]
= AB V BC V BC [Idem]
= AB V BC [Idem]
= B(A V C) [reverse Distrib]
(6,5) : (3,2)
From many terms and operations to few.
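A quick way to convince yourself that the reduction is valid is to compare both forms over every truth assignment. This is only a minimal sketch; the function names `original` and `simplified` are made up for illustration.

```python
from itertools import product

def original(a, b, c):
    # AB V ( BC(B V C) )
    return (a and b) or ((b and c) and (b or c))

def simplified(a, b, c):
    # B(A V C)
    return b and (a or c)

# Two Boolean expressions are equivalent if they agree on all 2^3 assignments.
equivalent = all(
    original(a, b, c) == simplified(a, b, c)
    for a, b, c in product([False, True], repeat=3)
)
print(equivalent)
```

A query optimizer relies on the same idea: the rewritten expression must give the same answer as the original on every legal input, only with fewer operations.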
σ : Select operator, which keeps only the tuples that satisfy a given filter condition --
Ex: σ_{name = "Dan"}(customer) will select all tuples in customer whose name matches "Dan".
Π : Project operator, which displays only the specified attributes (columns) --
Ex: Π_{name, balance}(customer) will show all names and balances from customer.
In a relation table, a PROJECT eliminates columns while a SELECT eliminates rows!
Natural join (⋈) is a binary operator written as (R ⋈ S), where R and S are
relations. The result of the natural join is the set of all combinations of tuples (that is,
ordered lists) in R and S that are equal on their common attribute names.
Theta join (⋈_θ) is an operation that consists of all combinations of tuples in R and S
that satisfy θ. The result of the θ-join is defined only if the headers of S and R are
disjoint, that is, do not contain a common attribute.
Relational Algebra in a DB; some operators.
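The operators above can be sketched in a few lines of Python, treating a relation as a list of dicts. The helper names and the `customer` / `branch` sample data are assumptions made for illustration, not part of any DBMS API.

```python
def select(relation, predicate):
    """sigma: keep the rows (tuples) that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Pi: keep only the named columns, dropping duplicate rows."""
    seen, out = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out

def natural_join(r, s):
    """Combine rows of r and s that agree on every shared attribute name."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

# Hypothetical sample data.
customer = [
    {"name": "Dan", "balance": 120, "branch_id": 1},
    {"name": "Eve", "balance": 300, "branch_id": 2},
    {"name": "Dan", "balance": 45, "branch_id": 2},
]
branch = [{"branch_id": 1, "city": "Austin"}, {"branch_id": 2, "city": "Boston"}]

dans = select(customer, lambda t: t["name"] == "Dan")        # removes rows
names_and_balances = project(customer, ["name", "balance"])  # removes columns
with_city = natural_join(customer, branch)                   # matches on branch_id
```

Note that `project` removes duplicates, since a relation is a set of tuples: projecting `customer` on `name` alone yields two rows, not three.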
Algebra Transformations
Two relational algebra expressions are said to be equivalent if the two expressions generate
the same set of tuples on every legal database instance
● Note: order of tuples is irrelevant
● We don’t care if they generate different results on databases that violate integrity
constraints
An equivalence rule says that expressions of two forms are equivalent
● Can replace expression of first form by second, or vice versa
Equivalence rules - Relational Algebra
1. Conjunctive selection operations can be deconstructed into a sequence of
individual selections.
2. Selection operations are commutative.
3. Only the last in a sequence of projection operations is needed; the others can be
omitted.
4. Selections can be combined with Cartesian products and theta joins.
a. σ_θ(E1 × E2) = E1 ⋈_θ E2
b. σ_θ1(E1 ⋈_θ2 E2) = E1 ⋈_{θ1 ∧ θ2} E2
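Rules 1, 2 and 4 can be checked on small data. The sketch below assumes relations are lists of dicts and uses two made-up predicates, `theta1` and `theta2`; it demonstrates the rules on one instance and is not a proof.

```python
from itertools import product

def select(rel, pred):
    return [t for t in rel if pred(t)]

def cross(r, s):
    # Cartesian product; attribute names are assumed disjoint.
    return [{**t, **u} for t, u in product(r, s)]

def theta_join(r, s, theta):
    return [t for t in cross(r, s) if theta(t)]

def as_set(rel):
    # Order of tuples is irrelevant, so compare relations as sets.
    return {frozenset(t.items()) for t in rel}

E1 = [{"a": i} for i in range(5)]
E2 = [{"b": j} for j in range(5)]
theta1 = lambda t: t["a"] < t["b"]
theta2 = lambda t: (t["a"] + t["b"]) % 2 == 0
both = lambda t: theta1(t) and theta2(t)

E = cross(E1, E2)
rule1 = as_set(select(E, both)) == as_set(select(select(E, theta2), theta1))
rule2 = as_set(select(select(E, theta1), theta2)) == as_set(select(select(E, theta2), theta1))
rule4a = as_set(select(cross(E1, E2), theta1)) == as_set(theta_join(E1, E2, theta1))
rule4b = as_set(select(theta_join(E1, E2, theta2), theta1)) == as_set(theta_join(E1, E2, both))
print(rule1, rule2, rule4a, rule4b)
```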
Equivalence rules - more.
5. Theta-join operations (and natural joins) are commutative.
E1 ⋈_θ E2 = E2 ⋈_θ E1
6. (a) Natural join operations are associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
(b) Theta joins are associative in the following manner:
(E1 ⋈_θ1 E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_θ2 E3)
where θ2 involves attributes from only E2 and E3.
And many more...
Pictorial Reduction Example
Π_{name, title}(σ_{dept_name = “Music” ∧ year = 2009}(instructor ⋈ (teaches ⋈ Π_{course_id, title}(course))))
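The picture for this slide did not survive extraction. One plausible reduction, written out as a sketch: regroup the joins (natural-join associativity), split the conjunctive selection, and push each part down to the relation it mentions. The last step uses the rule that a selection distributes over a join, which is one of the "many more" rules not listed above. The sketch also assumes that, inside this expression, dept_name appears only in instructor and year only in teaches.

```latex
\begin{align*}
&\Pi_{name,\,title}\bigl(\sigma_{dept\_name = \text{``Music''} \,\wedge\, year = 2009}
   (instructor \bowtie (teaches \bowtie \Pi_{course\_id,\,title}(course)))\bigr) \\
={}& \Pi_{name,\,title}\bigl(\sigma_{dept\_name = \text{``Music''} \,\wedge\, year = 2009}
   ((instructor \bowtie teaches) \bowtie \Pi_{course\_id,\,title}(course))\bigr) \\
={}& \Pi_{name,\,title}\bigl((\sigma_{dept\_name = \text{``Music''}}(instructor)
   \bowtie \sigma_{year = 2009}(teaches)) \bowtie \Pi_{course\_id,\,title}(course)\bigr)
\end{align*}
```

The final form filters instructor and teaches before joining, so the intermediate relations are smaller.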
Good ways to order the joins?
● For all relations r1, r2, and r3,
(r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
(Join Associativity)
● If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose (r1 ⋈ r2) ⋈ r3
so that we compute and store a smaller temporary relation.
Good rule: avoid Cartesian products when searching for an optimal plan.
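To make the point about temporary relations concrete, here is a small sketch with made-up relations r1, r2 and r3 (lists of dicts). Both join orders give the same final result, but the intermediate relation differs in size.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

def as_set(rel):
    return {frozenset(t.items()) for t in rel}

# Hypothetical data: r1 is tiny, r3 is the largest relation.
r1 = [{"a": 1, "b": 0}]
r2 = [{"b": i % 2, "c": i} for i in range(4)]
r3 = [{"c": i % 4, "d": i} for i in range(20)]

small_temp = natural_join(r1, r2)          # (r1 join r2): 2 tuples
large_temp = natural_join(r2, r3)          # (r2 join r3): 20 tuples

plan_a = natural_join(small_temp, r3)      # (r1 join r2) join r3
plan_b = natural_join(r1, large_temp)      # r1 join (r2 join r3)

print(len(small_temp), len(large_temp), as_set(plan_a) == as_set(plan_b))
```

With this data the first plan stores a temporary relation ten times smaller than the second, which is why it would be preferred.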
Counting alternative plans: How many?
● Query optimizers use equivalence rules to systematically generate expressions equivalent to
the given expression
● Can generate all equivalent expressions as follows:
    ● Repeat until no new equivalent expressions are generated:
        * Apply all applicable equivalence rules to every subexpression of every equivalent
          expression found so far.
        * Add newly generated expressions to the set of equivalent expressions.
● The above approach is expensive in terms of memory and compute time.
● Two approaches:
    - Optimized plan generation based on transformation rules.
    - Special-case approach for queries with only selections, projections and joins.
● Consider finding the best join order for r1 ⋈ r2 ⋈ . . . ⋈ rn.
● There are (2(n – 1))!/(n – 1)! different join orders (bushy trees included) for the above expression. With
n = 7, the number is 665280; with n = 10, the number is greater than 17.6 billion!
● No need to generate all the join orders. Using dynamic programming, the least-cost join
order for any subset of {r1, r2, . . . rn} is computed only once and stored for future use.
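The join-order count is easy to evaluate directly. A minimal sketch; the function name `join_orders` is made up for illustration.

```python
from math import factorial

def join_orders(n):
    """Number of join orders for n relations: (2(n - 1))! / (n - 1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 7, 10):
    print(n, join_orders(n))
```

Evaluating the formula gives 12 for n = 3, 665280 for n = 7, and 17,643,225,600 (roughly 17.6 billion) for n = 10.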
Dynamic programming? A method for solving a complex problem by breaking it down into
a collection of simpler subproblems, solving each of those subproblems just once, and
storing their solutions – ideally, using a memory-based data structure. The next time the
same subproblem occurs, instead of recomputing its solution, one simply looks up the
previously computed solution.
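A toy version of join ordering by dynamic programming might look like the sketch below. Everything in it is an assumption for illustration: the cardinalities in `CARD`, the pairwise selectivities in `SEL`, and the cost model (sum of estimated result sizes of all joins performed). Real optimizers use far richer cost formulae.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics: base-relation sizes and pairwise join selectivities.
CARD = {"r1": 1000, "r2": 100, "r3": 10}
SEL = {frozenset({"r1", "r2"}): 0.01, frozenset({"r2", "r3"}): 0.1}

def size(rels):
    """Estimated tuples in the join of `rels`, assuming independent predicates."""
    n = 1.0
    for r in sorted(rels):
        n *= CARD[r]
    for a, b in combinations(sorted(rels), 2):
        n *= SEL.get(frozenset({a, b}), 1.0)  # 1.0 means a Cartesian product
    return n

@lru_cache(maxsize=None)  # each subset of relations is solved only once
def best_plan(rels):
    """Return (cost, plan) for a frozenset of relation names."""
    if len(rels) == 1:
        (r,) = rels
        return 0.0, r
    members = sorted(rels)
    first, rest = members[0], members[1:]
    best = None
    # Pin `first` to the left side so every split is considered exactly once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = frozenset((first, *extra))
            right = rels - left
            lcost, lplan = best_plan(left)
            rcost, rplan = best_plan(right)
            cost = lcost + rcost + size(rels)
            if best is None or cost < best[0]:
                best = (cost, (lplan, rplan))
    return best

cost, plan = best_plan(frozenset(CARD))
print(plan, cost)
```

With these made-up numbers the cheapest plan joins r2 and r3 first, because that intermediate result is estimated at only 100 tuples. The `lru_cache` decorator plays the role of the stored table of subproblem solutions.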
Practical query optimizers incorporate elements of the following two broad approaches:
1. Search all the plans and choose the best plan in a cost-based fashion. Use dynamic
programming to store and recall previously found optimal plans and subplans!
2. Use heuristics to choose a plan.
Cost and choice.
Heuristics: Being Pragmatic
● Cost-based optimization is expensive, even with dynamic programming.
● Systems may use heuristics to reduce the number of choices that must be made in a
cost-based fashion.
● Heuristic optimization transforms the query-tree by using a set of rules that typically (but not
in all cases) improve execution performance:
    ● Perform selection early (reduces the number of tuples)
    ● Perform projection early (reduces the number of attributes)
    ● Perform most restrictive selection and join operations (i.e. with smallest result size)
      before other similar operations.
● Some systems use only heuristics, others combine heuristics with partial cost-based
optimization.
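The "perform selection early" heuristic can be illustrated with a naive nested-loop join that reports how many tuple pairs it compares. The `instructor` and `teaches` data below are invented for the example.

```python
def nested_loop_join(r, s, key):
    """Equi-join on `key`; also returns the number of tuple pairs compared."""
    out = [{**t, **u} for t in r for u in s if t[key] == u[key]]
    return out, len(r) * len(s)

# Hypothetical data: 100 instructors (10 in Music), 3 courses taught by each.
instructor = [{"id": i, "dept": "Music" if i % 10 == 0 else "Physics"}
              for i in range(100)]
teaches = [{"id": i % 100, "course": f"C{i}"} for i in range(300)]

# Plan A: join first, then select.
joined, comparisons_a = nested_loop_join(instructor, teaches, "id")
plan_a = [t for t in joined if t["dept"] == "Music"]

# Plan B: select first (the heuristic), then join.
music = [t for t in instructor if t["dept"] == "Music"]
plan_b, comparisons_b = nested_loop_join(music, teaches, "id")

print(comparisons_a, comparisons_b, plan_a == plan_b)
```

Both plans return the same 30 tuples, but selecting first cuts the comparisons from 30,000 to 3,000 in this example.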
Statistical estimation of data size?
What if, on a particular database relation, we stored information regarding the content of
the table?
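As a sketch of what such stored information could look like, the code below collects the tuple count and the number of distinct values per attribute, then estimates the size of an equality selection as n / V(A), which assumes values are uniformly distributed. The function names and the `instructor` data are made up for illustration.

```python
def collect_stats(rows):
    """Gather tuple count and number of distinct values per attribute."""
    attrs = rows[0].keys() if rows else []
    return {
        "n_tuples": len(rows),
        "distinct": {a: len({row[a] for row in rows}) for a in attrs},
    }

def estimate_equality_selection(stats, attr):
    """Estimated size of sigma_{attr = v}(r) under a uniform-distribution assumption."""
    return stats["n_tuples"] / stats["distinct"][attr]

# Hypothetical relation: 100 instructors spread evenly over 4 departments.
departments = ["Music", "Physics", "History", "Biology"]
instructor = [{"id": i, "dept_name": departments[i % 4]} for i in range(100)]

stats = collect_stats(instructor)
estimate = estimate_equality_selection(stats, "dept_name")
actual = sum(1 for row in instructor if row["dept_name"] == "Music")
print(estimate, actual)
```

Here the estimate matches the true count because the data really is uniform; on skewed data the estimate can be far off, which is one reason real systems may keep richer statistics such as histograms.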
Database indices?
A database index is a data structure that improves the speed of
data retrieval operations on a database table at the cost of
additional writes and storage space to maintain the index data
structure. Indexes are used to quickly locate data without
having to search every row in a database table every time a
database table is accessed.
A B+ tree is an n-ary tree with a variable but often large
number of children per node. A B+ tree consists of a root,
internal nodes and leaves. The root may be either a leaf or a
node with two or more children.
A hash table is a data structure used to implement an
associative array, a structure that can map keys to values. A
hash table uses a hash function to compute an index into an
array of buckets or slots, from which the desired value can be
found.
B+ tree lookup: O(log n)
Hash table lookup: O(1) on average
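The two lookup costs can be mimicked in Python. This is a stand-in only: a sorted key list with binary search (`bisect`) plays the role of an ordered index like a B+ tree, and a `dict` plays the role of a hash index. The `rows` table is invented for the example.

```python
import bisect

# Hypothetical table, stored sorted by id (even ids only).
rows = [{"id": i, "name": f"user{i}"} for i in range(0, 1000, 2)]

# Ordered index: sorted keys, searched in O(log n) steps.
keys = [row["id"] for row in rows]

def ordered_lookup(key):
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return rows[pos]
    return None

def range_lookup(lo, hi):
    """All rows with lo <= id <= hi; ordered indexes support this naturally."""
    return rows[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)]

# Hash index: average O(1) lookups, but no help with range queries.
hash_index = {row["id"]: row for row in rows}

print(ordered_lookup(10), hash_index.get(10), len(range_lookup(10, 20)))
```

The trade-off sketched here is the usual one: the hash index wins on exact-match lookups, while the ordered index also answers range queries.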
Other Indices and reductions
An offset B+-tree is a type of index used to park new and updated data in an offset table, outside the body of the
main table. Offset table << main table! Used with columnar TBAT files.
Why not columnar DBs? TBAT files are.
