International Journal of Modern Trends in Engineering and
Research
www.ijmter.com
e-ISSN: 2349-9745
p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 16
An Analysis on Query Optimization in Distributed Database
Joshi Janki
R&D Department, Infitrix Software, Delhi
Abstract: The query optimizer is a significant element of today's relational database management systems. It is responsible for translating a user-submitted query, commonly written in a non-procedural language, into an efficient query evaluation program that can be executed against the database. This paper describes the architecture and the steps of query processing, and uses examples to illustrate how the way a query is written affects time and memory usage. Its key goal is to explain the basic query optimization process and its architecture.
Keywords: Query Optimization, Distributed Database System, Query Processing
I. INTRODUCTION
Query optimization is a function of many relational database management systems. Users generally cannot access the query optimizer directly: once queries are submitted to the database server and parsed by the parser, they are passed to the query optimizer, where optimization takes place. Query results are generated by accessing the relevant data in the database and manipulating it in a way that yields the requested information.[1] Because database structures are usually complex, the data needed for a query, especially a non-trivial one, can be collected from the database in different ways, through different data structures, and in different orders. The query optimizer determines the lowest-cost plan for executing a query. By "lowest-cost plan", we mean an access path to the data that takes the least amount of time.[2]
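The effect of alternative access paths can be observed in any SQL engine that reports its chosen plan. The sketch below uses SQLite through Python's built-in `sqlite3` module purely as a stand-in (the paper does not name a DBMS); the `Employees` table, its data, and the index name `idx_empid` are hypothetical. It asks SQLite for its plan for the same query before and after an index is created, and checks that the answer does not change.

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN output."""
    # Each plan row has four columns; the human-readable description is the last one.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Hypothetical table with 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmpID INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(i, 30000.0 + i) for i in range(1, 10001)],
)

query = "SELECT salary FROM Employees WHERE EmpID = 42"

# Only one access path exists so far: a full scan of the table.
before = plan_details(conn, query)
answer_before = conn.execute(query).fetchall()

# The index adds a second access path to the same data.
conn.execute("CREATE INDEX idx_empid ON Employees (EmpID)")
after = plan_details(conn, query)
answer_after = conn.execute(query).fetchall()

print(before)  # a "SCAN ..." step (exact wording depends on the SQLite version)
print(after)   # a "SEARCH ... USING INDEX idx_empid ..." step
print(answer_before == answer_after)  # True: same answer, different plan
```

Note that the optimizer, not the user, made the switch: the query text is identical in both runs.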
Figure 1: Query Optimization Concept
The components shown in Figure 1 work as follows:
The Query Parser checks the validity of the query and then translates it into an internal form,
usually a relational calculus expression or something equivalent. The Query Optimizer examines
all algebraic expressions that are equivalent to the given query and chooses the one that is
estimated to be the cheapest. The Code Generator or the Interpreter transforms the access plan
generated by the optimizer into calls to the query processor. The Query Processor actually
executes the query.[3]
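To make the division of labour concrete, here is a toy in-memory pipeline with one Python function per component of Figure 1. Everything in it is hypothetical: it accepts only queries of the form `SELECT col FROM table WHERE col = number`, its "internal form" is a plain dictionary rather than a relational calculus expression, and its optimizer chooses between just two plans using rows examined as the cost.

```python
import re
from typing import Callable

# Hypothetical in-memory database: one table and the set of indexed columns.
TABLES = {"Employees": [{"EmpID": i, "salary": 30000 + i} for i in range(1, 101)]}
INDEXES = {("Employees", "EmpID")}


def parse(sql: str) -> dict:
    """Query Parser: check validity and translate into an internal form."""
    m = re.fullmatch(r"SELECT (\w+) FROM (\w+) WHERE (\w+) = (\d+)", sql.strip())
    if m is None:
        raise ValueError(f"unsupported or invalid query: {sql!r}")
    column, table, key, value = m.groups()
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    return {"column": column, "table": table, "key": key, "value": int(value)}


def optimize(query: dict) -> dict:
    """Query Optimizer: estimate the cost of each candidate plan, keep the cheapest."""
    costs = {"full_scan": len(TABLES[query["table"]])}  # rows examined
    if (query["table"], query["key"]) in INDEXES:
        costs["index_lookup"] = 1  # assumes the indexed column is unique
    return {**query, "method": min(costs, key=costs.get)}


def generate(plan: dict) -> Callable[[], list]:
    """Code Generator: turn the chosen plan into an executable routine."""
    rows = TABLES[plan["table"]]
    col, key, value = plan["column"], plan["key"], plan["value"]
    if plan["method"] == "index_lookup":
        index = {r[key]: r for r in rows}  # stand-in for a real index structure

        def run() -> list:
            hit = index.get(value)
            return [] if hit is None else [hit[col]]
    else:
        def run() -> list:
            return [r[col] for r in rows if r[key] == value]
    return run


def process(sql: str) -> list:
    """Query Processor: run the generated routine and return the answer."""
    return generate(optimize(parse(sql)))()


print(process("SELECT salary FROM Employees WHERE EmpID = 7"))  # [30007]
```

A lookup on `salary`, which has no index in this sketch, takes the `full_scan` branch instead but still goes through all four stages.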
Queries are posed to a DBMS by interactive users or by programs written in general-purpose
programming languages (e.g., Fortran, PL/I) that have queries embedded in them. An interactive
(ad hoc) query goes through the entire path shown in Figure 1. On the other hand, an embedded
query goes through the three steps only once, when the program in which it is embedded is
compiled. The code produced by the Code Generator is stored in the database and is simply
invoked and executed by the Query Processor whenever control reaches that query during the
program execution (run time). Thus, independent of the number of times an embedded query
needs to be executed, optimization is not repeated until database updates make the access plan
invalid (e.g., index deletion) or highly suboptimal (e.g., extensive changes in database
contents).[5]
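The compile-once behaviour of embedded queries amounts to caching access plans and invalidating them when the schema changes. The following minimal sketch is hypothetical throughout: the `PlanCache` class, the strings that stand in for generated code, and the use of a single schema version number as the invalidation signal are illustrative assumptions, and only index deletion is modelled (not extensive changes in database contents).

```python
from dataclasses import dataclass, field


@dataclass
class PlanCache:
    """Hypothetical store of compiled access plans for embedded queries."""

    schema_version: int = 0
    compilations: int = 0
    plans: dict[str, tuple[int, str]] = field(default_factory=dict)

    def _compile(self, sql: str) -> str:
        # Stand-in for parsing, optimization, and code generation.
        self.compilations += 1
        return f"plan for {sql!r} (schema v{self.schema_version})"

    def get_plan(self, sql: str) -> str:
        cached = self.plans.get(sql)
        # Reuse the stored plan unless it was built against an older schema.
        if cached is None or cached[0] != self.schema_version:
            cached = (self.schema_version, self._compile(sql))
            self.plans[sql] = cached
        return cached[1]

    def drop_index(self, name: str) -> None:
        # Deleting an index can make stored plans invalid, so bump the version.
        # (The index name itself is not tracked in this sketch.)
        self.schema_version += 1


cache = PlanCache()
query = "SELECT salary FROM Employees WHERE EmpID = ?"
for _ in range(1000):  # the embedded query is reached many times at run time
    cache.get_plan(query)
print(cache.compilations)  # 1: optimized only once

cache.drop_index("idx_empid")
cache.get_plan(query)
print(cache.compilations)  # 2: re-optimized after the index was dropped
```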
Figure 2: Query Optimizer Architecture [7]
The entire query optimization process can be seen as having two stages: rewriting and planning.
There is only one module in the first stage, the Rewriter, whereas all other modules are in the
second stage.
II. Functionality of Query Optimizer Architecture
Rewriter: This module applies transformations to a given query and produces equivalent queries that
are hopefully more efficient, e.g., replacement of views with their definitions, flattening out of
nested queries, etc. The transformations performed by the Rewriter depend only on the
declarative (i.e., static) characteristics of queries and do not take into account the actual query
costs for the specific DBMS and database concerned. If the rewriting is known or assumed to
always be beneficial, the original query is discarded; otherwise, it is sent to the next stage as
well. By the nature of the rewriting transformations, this stage operates at the declarative
level.[7]
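The two rewrites mentioned above, view replacement and flattening of nested queries, can be checked on a small example. The sketch below writes a query that uses a view and an `IN` subquery, and next to it the form a rewriter might produce by substituting the view's definition and flattening the subquery into a join; it then runs both in SQLite (via Python's `sqlite3`) to confirm that they return the same answer. The tables, the view `BigOrders`, and the data are hypothetical, and the rewrite is done by hand rather than by an actual Rewriter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, ItemPrice REAL);
    INSERT INTO Customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO Orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 3, 2.0);
    -- Hypothetical view that the Rewriter can replace with its definition.
    CREATE VIEW BigOrders AS SELECT * FROM Orders WHERE ItemPrice > 3;
""")

# Original form: a view referenced inside a nested (IN) subquery.
nested = """
    SELECT CustomerID, Name FROM Customers
    WHERE CustomerID IN (SELECT CustomerID FROM BigOrders)
"""

# Rewritten form: view expanded, subquery flattened into a join. DISTINCT keeps
# the IN semantics (each customer at most once); this is safe here only because
# CustomerID is a key of Customers.
flattened = """
    SELECT DISTINCT C.CustomerID, C.Name
    FROM Customers C JOIN Orders O ON O.CustomerID = C.CustomerID
    WHERE O.ItemPrice > 3
"""

nested_rows = sorted(conn.execute(nested).fetchall())
flattened_rows = sorted(conn.execute(flattened).fetchall())
print(nested_rows == flattened_rows, nested_rows)  # True [(1, 'Asha')]
```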
Planner: This is the main module of the planning stage. It examines all possible execution plans
for each query produced in the previous stage and selects the overall cheapest one to be used to
generate the answer of the original query. It employs a search strategy, which examines the space
of execution plans in a particular fashion. This space is determined by two other modules of the
optimizer, the Algebraic Space and the Method-Structure Space. For the most part, these two
modules and the search strategy determine the cost, i.e., running time, of the optimizer itself,
which should be as low as possible. The execution plans examined by the Planner are compared
based on estimates of their cost so that the cheapest may be chosen. These costs are derived by
the last two modules of the optimizer, the Cost Model and the Size-Distribution Estimator.[7]
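One well-known search strategy is the bottom-up dynamic programming over join orders associated with System R. The sketch below applies it to left-deep join orders of three relations. The cardinalities, the selectivities, and the cost measure (sum of the sizes of intermediate and final results) are hypothetical simplifications; a real Planner would take costs from the Cost Model and Size-Distribution Estimator and would also choose access methods. Unlike System R, this sketch does not postpone Cartesian products.

```python
from itertools import combinations

# Hypothetical statistics: relation cardinalities and join selectivities.
CARD = {"Customers": 1_000, "Orders": 100_000, "Items": 500}
SEL = {
    frozenset({"Customers", "Orders"}): 1 / 1_000,
    frozenset({"Orders", "Items"}): 1 / 100,
}


def result_size(rels: frozenset) -> float:
    """Estimated size of joining all of `rels`, assuming independent predicates."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for pair, selectivity in SEL.items():
        if pair <= rels:  # the predicate applies once both sides are present
            size *= selectivity
    return size


def best_left_deep_plan(relations: list[str]) -> tuple[float, tuple[str, ...]]:
    """Dynamic programming over left-deep join orders.

    best[S] holds the cheapest (cost, order) found for joining the set S,
    where cost is the sum of the sizes of all intermediate and final results.
    """
    best = {frozenset({r}): (0.0, (r,)) for r in relations}
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            for last in subset:  # relation joined last onto the rest
                prev_cost, prev_order = best[subset - {last}]
                cost = prev_cost + result_size(subset)
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, prev_order + (last,))
    return best[frozenset(relations)]


cost, order = best_left_deep_plan(list(CARD))
print(order, cost)  # Items is joined last; estimated cost about 600,000 rows
```

Because each subset's best plan is stored and reused, the search avoids re-evaluating every full permutation, which is what keeps the optimizer's own running time low.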
Method-Structure Space: This module determines the implementation choices that exist for the
execution of each ordered series of actions specified by the Algebraic Space. This choice is
related to the available join methods for each join (e.g., nested loops, merge scan, and hash join),
if supporting data structures are built on the fly, if/when duplicates are eliminated, and other
implementation characteristics of this sort, which are predetermined by the DBMS
implementation. This choice is also related to the available indices for accessing each relation,
which is determined by the physical schema of each database stored in its catalogs. Given an
algebraic formula or tree from the Algebraic Space, this module produces all corresponding
complete execution plans, which specify the implementation of each algebraic operator and the
use of any indices.[7]
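The three join methods named above (nested loops, merge scan, and hash join) find matching tuples in different ways, but for the same inputs they must produce the same result, which is what makes them interchangeable choices in the Method-Structure Space. The following in-memory sketch implements each one for an equi-join; the relations, their tuple layout, and the function names are hypothetical, and real implementations work on disk pages and buffers rather than Python lists.

```python
from collections import defaultdict

# Hypothetical relations: customers are (CustomerID, Name), orders are
# (OrderID, CustomerID, ItemPrice); the join attribute is CustomerID.
customers = [(1, "Asha"), (2, "Ben"), (3, "Chen")]
orders = [(10, 1, 5.0), (11, 1, 7.5), (12, 3, 2.0), (13, 4, 9.9)]


def nested_loop_join(r, s, kr, ks):
    """Compare every tuple of r with every tuple of s."""
    return [(x, y) for x in r for y in s if x[kr] == y[ks]]


def merge_scan_join(r, s, kr, ks):
    """Sort both inputs on the join key, then merge them in a single pass."""
    r = sorted(r, key=lambda t: t[kr])
    s = sorted(s, key=lambda t: t[ks])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][kr], s[j][ks]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Pair every r-tuple with every s-tuple that shares key a.
            j_end = j
            while j_end < len(s) and s[j_end][ks] == a:
                j_end += 1
            while i < len(r) and r[i][kr] == a:
                out.extend((r[i], y) for y in s[j:j_end])
                i += 1
            j = j_end
    return out


def hash_join(r, s, kr, ks):
    """Build a hash table on r's join key, then probe it with each tuple of s."""
    table = defaultdict(list)
    for x in r:
        table[x[kr]].append(x)
    return [(x, y) for y in s for x in table.get(y[ks], [])]


results = [sorted(method(customers, orders, 0, 1))
           for method in (nested_loop_join, merge_scan_join, hash_join)]
print(all(res == results[0] for res in results), len(results[0]))  # True 3
```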
Cost Model: This module specifies the arithmetic formulas that are used to estimate the cost of
execution plans. For every different join method, for every different index type access, and in
general for every distinct kind of step that can be found in an execution plan, there is a formula
that gives its cost. Given the complexity of many of these steps, most of these formulas are
simple approximations of what the system actually does and are based on certain assumptions
regarding issues like buffer management, disk-CPU overlap, sequential vs. random I/O, etc. The
most important input parameters to a formula are the size of the buffer pool used by the
corresponding step, the sizes of relations or indices accessed, and possibly various distributions
of values in these relations. While the first one is determined by the DBMS for each query, the
other two are estimated by the Size-Distribution Estimator.[7]
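As an illustration of such formulas, database textbooks commonly estimate the page I/O cost of two join methods as follows, with M and N the page counts of the two inputs and B the number of buffer pages: block nested loops costs M + ceil(M / (B - 2)) * N, and a two-pass (Grace) hash join costs about 3 * (M + N), provided B is larger than roughly the square root of the smaller input's page count. The sketch encodes these approximations; the page counts and buffer sizes are hypothetical. It shows how the buffer-pool size, one of the input parameters named above, can change which method looks cheaper.

```python
import math


def block_nested_loop_cost(outer_pages: int, inner_pages: int, buffers: int) -> int:
    """Page I/Os for block nested loops: M + ceil(M / (B - 2)) * N.

    B - 2 pages hold a block of the outer input; one page is left for
    scanning the inner input and one for output.
    """
    if buffers < 3:
        raise ValueError("block nested loops needs at least 3 buffer pages")
    outer_blocks = math.ceil(outer_pages / (buffers - 2))
    return outer_pages + outer_blocks * inner_pages


def grace_hash_join_cost(pages_r: int, pages_s: int, buffers: int) -> int:
    """Page I/Os for a two-pass hash join: 3 * (M + N).

    Both inputs are read and written once while partitioning, then read once
    more while probing. The estimate assumes roughly B > sqrt(min(M, N)), so
    that every partition of the smaller input fits in memory.
    """
    if buffers * buffers <= min(pages_r, pages_s):
        raise ValueError("too few buffer pages for a two-pass hash join")
    return 3 * (pages_r + pages_s)


# Hypothetical inputs: Customers on 500 pages (outer), Orders on 1,000 pages.
for b in (32, 102, 502):
    print(b, block_nested_loop_cost(500, 1000, b), grace_hash_join_cost(500, 1000, b))
# 32 17500 4500
# 102 5500 4500
# 502 1500 4500
```

With a small buffer pool the hash join is estimated to be cheaper; once the whole outer input fits in memory, block nested loops reads each input only once and wins.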
Size-Distribution Estimator: This module specifies how the sizes (and possibly frequency
distributions of attribute values) of database relations and indices as well as (sub)query results
are estimated. As mentioned above, these estimates are needed by the Cost Model. The specific
estimation approach adopted in this module also determines the form of statistics that need to be
maintained in the catalogs of each database, if any.[7]
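Two common estimation techniques illustrate what this module does: an equi-width histogram kept as catalog statistics, used to estimate how many rows satisfy a range predicate, and the textbook equi-join estimate |R| * |S| / max(V(R, a), V(S, a)), where V(R, a) is the number of distinct values of the join attribute a in R and values are assumed to be uniformly distributed. This is an illustrative sketch; the bucket count and the data are hypothetical.

```python
def build_histogram(values: list[float], buckets: int) -> list[tuple[float, float, int]]:
    """Equi-width histogram as (low, high, count) triples, e.g. for a catalog."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # The maximum value falls into the last bucket.
        counts[min(int((v - lo) / width), buckets - 1)] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def estimate_less_than(hist: list[tuple[float, float, int]], x: float) -> float:
    """Estimated number of rows with value < x, assuming uniformity within buckets."""
    total = 0.0
    for lo, hi, count in hist:
        if x >= hi:
            total += count
        elif x > lo:
            total += count * (x - lo) / (hi - lo)
    return total


def estimate_join_size(n_r: int, n_s: int, distinct_r: int, distinct_s: int) -> float:
    """Equi-join size under uniformity: |R| * |S| / max(V(R, a), V(S, a))."""
    return n_r * n_s / max(distinct_r, distinct_s)


hist = build_histogram(list(range(0, 1001)), buckets=10)
print(estimate_less_than(hist, 250))  # 250.0 (the true count is also 250)
print(estimate_join_size(100_000, 1_000, 1_000, 1_000))  # 100000.0
```

The histogram is exactly the kind of statistic that, as noted above, must then be maintained in the catalogs.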
Algebraic Space: This module determines the action execution orders that are to be considered
by the Planner for each query sent to it. All such series of actions produce the same query
answer, but usually differ in performance. They are usually represented in relational algebra as
formulas or in tree form. Because of the algorithmic nature of the objects generated by this
module and sent to the Planner, the overall planning stage is characterized as operating at the
procedural level.[7]
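Selection push-down is a standard example of two action orders in the Algebraic Space that give the same answer at different cost. The sketch evaluates σ_city='Delhi'(Customers ⋈ Orders) and σ_city='Delhi'(Customers) ⋈ Orders on small hypothetical relations and compares the sizes of their intermediate results; the data, attribute names, and helper functions are assumptions made for illustration.

```python
# Hypothetical relations: 200 customers (20 of them in Delhi) and 2,000 orders.
customers = [{"cid": i, "city": "Delhi" if i % 10 == 0 else "Pune"} for i in range(1, 201)]
orders = [{"oid": j, "cid": j % 200 + 1, "price": j % 50} for j in range(1, 2001)]


def join(r: list[dict], s: list[dict], key: str) -> list[dict]:
    """Equi-join of r and s on `key` (hash-based, preserving the order of s)."""
    index: dict = {}
    for x in r:
        index.setdefault(x[key], []).append(x)
    return [{**x, **y} for y in s for x in index.get(y[key], [])]


def select(rows: list[dict], predicate) -> list[dict]:
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in rows if predicate(t)]


def in_delhi(t: dict) -> bool:
    return t["city"] == "Delhi"


# Order A: join first, then select.
joined = join(customers, orders, "cid")
plan_a = select(joined, in_delhi)

# Order B: push the selection below the join.
delhi_customers = select(customers, in_delhi)
plan_b = join(delhi_customers, orders, "cid")

print(plan_a == plan_b)                   # True: same answer
print(len(joined), len(delhi_customers))  # 2000 20: intermediate result sizes
```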
III. Examples of Optimization Time and Memory
To find the item price and customer name from the two tables Orders and Customers:
Original:
SELECT O.ItemPrice, C.Name
FROM Orders O, Customers C;
Corrected:
SELECT O.ItemPrice, C.Name
FROM Orders O, Customers C
WHERE O.CustomerID = C.CustomerID;
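In this example, the original query has no join condition, so it returns the Cartesian product of the two tables: every order is paired with every customer. On large tables this produces an enormous number of rows and can take hours to complete. The corrected query adds the join predicate on the CustomerID key, so each order is matched only with the customer who placed it.[4]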
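The growth in result size is easy to reproduce. The sketch below loads hypothetical data (200 customers with 5 orders each) into an in-memory SQLite database through Python's `sqlite3` module and counts the rows each version of the query returns; SQLite is only a stand-in for whatever DBMS the original example assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, ItemPrice REAL);
""")
# Hypothetical data: 200 customers, each with 5 orders (1,000 orders in total).
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(c, f"Customer {c}") for c in range(1, 201)])
conn.executemany("INSERT INTO Orders (CustomerID, ItemPrice) VALUES (?, ?)",
                 [(c, 9.99) for c in range(1, 201) for _ in range(5)])

original = "SELECT O.ItemPrice, C.Name FROM Orders O, Customers C"
corrected = original + " WHERE O.CustomerID = C.CustomerID"


def count_rows(sql: str) -> int:
    """Count a query's result rows without transferring them to Python."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]


print(count_rows(original))   # 200000: every order paired with every customer
print(count_rows(corrected))  # 1000: each order paired with its own customer
```

The Cartesian product grows with the product of the two table sizes, while the joined result grows only with the number of orders.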
To find employees' salaries based on their IDs:
Original:
FOR i = 1 TO 20000
    SELECT salary FROM Employees WHERE EmpID = Parameter(i)
Corrected:
SELECT salary FROM Employees WHERE EmpID >= 1 AND EmpID <= 20000
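Assuming Parameter(i) yields the consecutive IDs 1 to 20000, the original version issues 20,000 separate queries, each of which may incur its own parsing, optimization, and client-server round-trip overhead. It therefore consumes a lot of time and memory and can slow down the entire system.[8] The corrected version retrieves the same rows with a single range query.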
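The gap between the two versions can be measured with a small experiment. This sketch uses an in-memory SQLite database through Python's `sqlite3` module, fills a hypothetical `Employees` table with 20,000 rows, and assumes that Parameter(i) simply yields i. Because SQLite runs inside the Python process, there is no network round trip per query, so the measured gap will likely understate what a client-server DBMS would show; timings vary by machine.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, salary REAL)")
# Hypothetical data: 20,000 employees with IDs 1..20000.
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [(i, 30000.0 + i) for i in range(1, 20001)])


def salaries_one_by_one(n: int) -> tuple[list[float], int]:
    """Original pattern: one SELECT per ID; returns (salaries, statements issued)."""
    salaries = []
    for i in range(1, n + 1):  # assumes Parameter(i) == i
        row = conn.execute("SELECT salary FROM Employees WHERE EmpID = ?", (i,)).fetchone()
        if row is not None:
            salaries.append(row[0])
    return salaries, n


def salaries_in_one_query(n: int) -> tuple[list[float], int]:
    """Corrected pattern: a single range query over the same IDs."""
    rows = conn.execute(
        "SELECT salary FROM Employees WHERE EmpID >= ? AND EmpID <= ? ORDER BY EmpID",
        (1, n),
    ).fetchall()
    return [r[0] for r in rows], 1


for fn in (salaries_one_by_one, salaries_in_one_query):
    start = time.perf_counter()
    salaries, statements = fn(20000)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: {statements} statements, {len(salaries)} rows, {elapsed:.3f} s")
```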
IV. CONCLUSION
This paper gave a brief overview of query optimization, its architecture, and the functionality of each module. It also described, step by step, how a query flows through the system. Finally, two examples showed how the way a query is written affects time and memory usage during record extraction.
References
[1] M. M. Astrahan et al. System R: A relational approach to data management. ACM Transactions on Database Systems, 1(2):97-137, June 1976.
[2] G. Antoshenkov. Dynamic query optimization in Rdb/VMS. In Proc. IEEE Int. Conference on Data Engineering, pages 538-547, Vienna, Austria, March 1993.
[3] K. Bennett, M. C. Ferris, and Y. Ioannidis. A genetic algorithm for database query optimization. In Proc. 4th Int. Conference on Genetic Algorithms, pages 400-407, San Diego, CA, July 1991.
[4] P. A. Bernstein, N. Goodman, E. Wong, C. L. Reeve, and J. B. Rothnie. Query processing in a system for distributed databases (SDD-1). ACM TODS, 6(4):602-625, December 1981.
[5] R. Cole and G. Graefe. Optimization of dynamic query evaluation plans. In Proc. ACM-SIGMOD Conference on the Management of Data, pages 150-160, Minneapolis, MN, June 1994.
[6] S. Christodoulakis. Implications of certain assumptions in database performance evaluation. ACM TODS, 9(2):163-186, June 1984.
[7] S. Christodoulakis. On the estimation and use of selectivities in database performance evaluation. Research Report CS-89-24, Dept. of Computer Science, University of Waterloo, June 1989.
[8] http://www.serverwatch.com/tutorials.