dd presentation.pdf

Name- Anshika Das
Roll-14400121029
Subcode- PCC-CS601
Sub-DBMS
Department-CSE

Agenda
1. Introduction
• Defining the Role of Databases
• Overview of Key Components
2. Query Processing and Optimization
• Understanding the Query Lifecycle
• Techniques for Optimizing Queries
3. Database Security
• Importance of Secure Database Management
• Key Security Measures
4. Advanced Topics
• In-Memory Databases, NoSQL, Big Data, ML, Blockchain

Importance of Database Management
• Central role in storing, retrieving, and managing data.
• Foundation for various applications and systems.
Key Components:
• Query Processing: transformation of user queries into executable actions.
• Optimization: enhancing query performance through efficient algorithms.
• Security: safeguarding data integrity, confidentiality, and availability.

Introduction to Query Processing and Optimization
◼ Query processing:
◼ The set of activities performed to obtain the tuples that satisfy a given query.
◼ Query optimization:
◼ The process of choosing a suitable execution strategy for processing a query.
Two internal representations of a query:
• Query Tree
• Query Graph

Query Processing (figure): query → parser and translator → relational algebra expression → optimizer (uses statistics about the data) → evaluation plan → evaluation engine (reads the data) → output

Process of query processing
Query processing is done in the following steps:
Step-1:
Parser: During the parse call, the database performs a syntax check, a semantic check, and a shared pool check, and converts the query into relational algebra. The parser performs the following checks (refer to the detailed diagram):
1. Syntax check – verifies that the SQL is syntactically valid. Example:
SELECT * FORM employee
Here the misspelling of FROM is caught by this check.

2. Semantic check – determines whether the statement is meaningful. Example: a query that refers to a table that does not exist is rejected by this check.
3. Shared pool check – every statement is assigned a hash code. This check determines whether that hash code already exists in the shared pool. If it does, the database skips the additional optimization and row source generation steps and reuses the existing execution plan.
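The syntax and semantic checks can be observed with Python's built-in sqlite3 module. This is only a stand-in: SQLite has no shared pool, and the table names `employee` and `department` are hypothetical. SQLite reports both kinds of failure as `sqlite3.OperationalError`, so the sketch tells them apart by the message text.

```python
import sqlite3


def check_statement(conn: sqlite3.Connection, sql: str) -> str:
    """Report which parse-time check a statement fails, using SQLite as a stand-in."""
    try:
        # EXPLAIN compiles the statement (parsing and name resolution)
        # without actually running the query against the table.
        conn.execute("EXPLAIN " + sql)
    except sqlite3.OperationalError as exc:
        message = str(exc)
        # Both failures raise OperationalError; the message distinguishes them.
        if "syntax error" in message:
            return "syntax check failed: " + message
        return "semantic check failed: " + message
    return "ok"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

print(check_statement(conn, "SELECT * FORM employee"))    # misspelled FROM
print(check_statement(conn, "SELECT * FROM department"))  # table does not exist
print(check_statement(conn, "SELECT * FROM employee"))    # passes both checks
```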
Hard Parse and Soft Parse –
If the query is new and its hash code does not exist in the shared pool, the query must pass through additional steps; this is known as hard parsing. If the hash code exists, the query skips these steps and passes directly to the execution engine (refer to the detailed diagram); this is known as soft parsing.
A hard parse includes the following steps: optimization and row source generation.
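The hard/soft parse decision can be sketched as a plan cache keyed by a hash of the statement text. This is a toy model under stated assumptions, not how any particular database implements its shared pool: the `SharedPool` class, the SHA-256 hash, and the stubbed `_optimize_and_generate` step are all hypothetical.

```python
import hashlib


class SharedPool:
    """Toy shared pool: caches execution plans keyed by a hash of the SQL text."""

    def __init__(self) -> None:
        self._plans = {}
        self.hard_parses = 0
        self.soft_parses = 0

    @staticmethod
    def _hash(sql: str) -> str:
        # Assumption: the exact statement text is hashed, so any textual
        # difference (even a different literal) produces a different key.
        return hashlib.sha256(sql.encode("utf-8")).hexdigest()

    def parse(self, sql: str) -> str:
        key = self._hash(sql)
        if key in self._plans:
            # Soft parse: skip optimization and row source generation.
            self.soft_parses += 1
            return self._plans[key]
        # Hard parse: run the (stubbed) optimizer and row source generator.
        self.hard_parses += 1
        plan = self._optimize_and_generate(sql)
        self._plans[key] = plan
        return plan

    def _optimize_and_generate(self, sql: str) -> str:
        # Placeholder for the expensive steps; a real optimizer builds a plan tree.
        return "plan for: " + sql


pool = SharedPool()
pool.parse("SELECT * FROM employee WHERE id = 7")  # new text: hard parse
pool.parse("SELECT * FROM employee WHERE id = 7")  # same text: soft parse
pool.parse("SELECT * FROM employee WHERE id = 8")  # different literal: hard parse
print(pool.hard_parses, pool.soft_parses)
```

Because the key is the statement text, changing only a literal forces another hard parse; this is one reason bind variables are commonly recommended for repeated statements.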
Step-2:
Optimizer: During the optimization stage, the database must perform a hard parse at least once for every unique DML statement, and it performs optimization during this parse. The database never optimizes DDL unless it includes a DML component, such as a subquery, that requires optimization.

Optimization is the process in which multiple execution plans for satisfying a query are examined and the most efficient plan is selected for execution. The database catalog supplies the statistics used to estimate the cost of each plan, and the optimizer passes the lowest-cost plan on for execution.
Row Source Generation –
The row source generator is the software component that receives the optimal execution plan from the optimizer and produces an iterative execution plan that is usable by the rest of the database. The iterative plan is a binary program that, when executed by the SQL engine, produces the result set.
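One common way to picture an "iterative" plan is a tree of row sources in which each operator pulls rows one at a time from its child (the iterator or "Volcano" style). The sketch below uses Python generators to show that idea; it is an illustration of the general technique, not the format any real SQL engine produces, and the function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def table_scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Row source that returns every row of a table, one at a time."""
    for row in rows:
        yield row


def filter_rows(child: Iterator[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Row source that passes on only the rows satisfying the predicate."""
    for row in child:
        if predicate(row):
            yield row


def project(child: Iterator[Row], columns: List[str]) -> Iterator[Row]:
    """Row source that keeps only the requested columns."""
    for row in child:
        yield {c: row[c] for c in columns}


student_details = [
    {"cid": "00112233", "name": "Paul"},
    {"cid": "00112238", "name": "Rob"},
    {"cid": "00112235", "name": "Matt"},
]

# Plan tree: project(name) <- filter(name = 'Paul') <- table_scan(student_details)
plan = project(filter_rows(table_scan(student_details), lambda r: r["name"] == "Paul"), ["name"])

# Nothing has run yet; the "execution engine" pulls rows from the root of the tree.
result = list(plan)
print(result)
```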
Step-3:
Execution Engine: Finally, the execution engine runs the query and returns the required result.
Steps in query processing (figure): query in a high-level language → scanning, parsing and validating → intermediate form of query → query optimizer → execution plan → query code generator → code to execute the query

Example of query processing
SELECT * FROM student_details WHERE name = 'Paul'
1. Parse the query and translate it: check syntax, verify names, etc., and translate into relational algebra.
2. Create evaluation plans.
3. Find the best plan (optimization).
4. Execute the plan.
student_details
cid       name
00112233  Paul
00112238  Rob
00112235  Matt

takes
cid       courseid
00112233  312
00112233  395
00112235  312

course
courseid  coursename
312       Advanced DBs
395       Machine Learning

Relational Algebra
Query to retrieve the information of the student whose name is Paul:
SELECT * FROM student_details WHERE name = 'Paul'
$\sigma_{name = 'Paul'}(student\_details)$
$\pi_{name}(\sigma_{cid = 00112235}(student\_details))$
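The two relational algebra operators used above can be sketched directly over the `student_details` rows. The helper names `select` and `project` are hypothetical; `project` removes duplicates because relational algebra uses set semantics. The second call uses cid 00112235, one of the IDs in the table above.

```python
from typing import Callable, Dict, List

Tuple_ = Dict[str, str]


def select(relation: List[Tuple_], predicate: Callable[[Tuple_], bool]) -> List[Tuple_]:
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation: List[Tuple_], attributes: List[str]) -> List[Tuple_]:
    """π: keep only the named attributes, removing duplicate tuples (set semantics)."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


student_details = [
    {"cid": "00112233", "name": "Paul"},
    {"cid": "00112238", "name": "Rob"},
    {"cid": "00112235", "name": "Matt"},
]

# σ name = 'Paul' (student_details)
paul = select(student_details, lambda t: t["name"] == "Paul")
# π name (σ cid = 00112235 (student_details))
name_for_cid = project(select(student_details, lambda t: t["cid"] == "00112235"), ["name"])
print(paul)
print(name_for_cid)
```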

Evaluation plans
• Specify which access path to follow.
• Specify which algorithm to use to evaluate each operator.
• Specify how the operators interleave.
Optimization:
• Estimate the cost of each candidate plan (in practice, not every possible plan is considered).
• Select the plan with the lowest estimated cost.
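Example evaluation plans (figure): (a) $\sigma_{name = 'Paul'}(student\_details)$; (b) student_details joined with takes on cid using a hash join, and the result joined with course on courseid using an index nested-loop join, combined with the selection $\sigma_{coursename = 'Advanced DBs'}$ and the projection $\pi_{name}$.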
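The two join algorithms named in the second plan can be sketched over the three example tables. The dictionary used as the "index" on `course.courseid` is an assumption standing in for a real B+-tree or hash index, and the function names are hypothetical. The query finds the names of students taking Advanced DBs.

```python
from collections import defaultdict

student_details = [("00112233", "Paul"), ("00112238", "Rob"), ("00112235", "Matt")]
takes = [("00112233", 312), ("00112233", 395), ("00112235", 312)]
course = [(312, "Advanced DBs"), (395, "Machine Learning")]


def hash_join(build, probe, build_key, probe_key):
    """Hash join: build a hash table on one input, then probe it with each row of the other."""
    table = defaultdict(list)
    for row in build:
        table[build_key(row)].append(row)
    for row in probe:
        for match in table.get(probe_key(row), []):
            yield match + row


def index_nested_loop_join(outer, index, outer_key):
    """Index nested-loop join: for each outer row, look up matching inner rows via an index."""
    for row in outer:
        for match in index.get(outer_key(row), []):
            yield row + match


# Stand-in index on course.courseid.
course_index = defaultdict(list)
for row in course:
    course_index[row[0]].append(row)

# student_details joined with takes on cid (hash join);
# rows look like (cid, name, cid, courseid).
st = hash_join(student_details, takes, lambda s: s[0], lambda t: t[0])
# ...then joined with course on courseid (index nested loop);
# rows look like (cid, name, cid, courseid, courseid, coursename).
stc = index_nested_loop_join(st, course_index, lambda r: r[3])

# Selection on coursename, then projection on name (duplicates removed).
names = sorted({r[1] for r in stc if r[5] == "Advanced DBs"})
print(names)
```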

Estimating Cost
What needs to be considered:
• Disk I/Os (sequential and random)
• CPU time
• Network communication
What we are going to consider:
• Disk I/Os (page reads/writes)
• Ignoring the cost of writing the final output

Query Optimization
Need for Query Optimization
1. Improved Performance:
• Reduced Execution Time: Optimized queries often result in reduced execution times, leading to faster response times for users and applications.
• Resource Utilization: Optimized queries use system resources more efficiently, preventing unnecessary strain on the database server.
2. Resource Conservation:
• CPU and Memory Usage: Efficiently optimized queries consume fewer CPU and memory resources, allowing the system to handle a larger number of concurrent queries without degradation in performance.
• Disk I/O: Optimized queries minimize disk I/O operations, reducing the load on storage devices and improving overall system throughput.
3. Scalability:
• Support for Growth: As the volume of data and user queries increases, well-optimized queries ensure that the database system can scale effectively without a proportional decrease in performance.
• Adaptability: Query optimization allows the DBMS to adapt to changes in data distribution, size, and query patterns.
4. Cost Reduction:
• Hardware Costs: By optimizing queries and utilizing system resources efficiently, organizations can avoid the need for constant hardware upgrades to meet growing demands.
• Operational Costs: Faster query execution and reduced resource consumption contribute to lower operational costs, especially in large-scale enterprise environments.

Rule-based Optimization
Overview: Rule-based optimization is a static approach that uses predefined rules set by the database administrator or system designer.
How It Works:
• The DBMS relies on a set of predefined rules and heuristics to choose an execution plan for a given query.
• These rules are typically based on the structure of the query, the available indexes, and historical performance data.
Advantages:
• Simplicity: Rule-based optimization is straightforward to implement and understand.
• Predictability: The query optimizer follows a fixed set of rules, providing predictable results.
Limitations:
• Lack of Adaptability: Rule-based optimization may struggle to adapt to changing data distributions or dynamic workload patterns.
• Limited Complexity: In complex scenarios, rule-based systems may fail to find an efficient plan.
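A rule-based optimizer can be sketched as an ordered list of rules that inspects only the query's structure and the available indexes. The three rules below, the `Query` class, and the plan labels are hypothetical and loosely modeled on the idea of ranked access paths; they are not any product's actual rule set.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Query:
    table: str
    where_column: Optional[str]  # column compared with '=' in the WHERE clause, if any


def choose_plan_rule_based(q: Query, unique_indexes: Set[str], indexes: Set[str]) -> str:
    """Apply hypothetical rules in priority order; data statistics are never consulted."""
    # Rule 1: equality on a uniquely indexed column -> single-row index lookup.
    if q.where_column is not None and q.where_column in unique_indexes:
        return f"UNIQUE INDEX LOOKUP on {q.table}.{q.where_column}"
    # Rule 2: equality on any other indexed column -> index range scan.
    if q.where_column is not None and q.where_column in indexes:
        return f"INDEX RANGE SCAN on {q.table}.{q.where_column}"
    # Rule 3: otherwise -> read the whole table.
    return f"FULL TABLE SCAN of {q.table}"


print(choose_plan_rule_based(Query("student", "role"), {"id"}, {"role"}))
```

The sketch also shows the adaptability limitation: a query on `role` always gets the index range scan, even if half of the rows match and a full scan would be cheaper.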
Cost-based Optimization
Overview: Cost-based optimization is a dynamic approach that estimates the cost of different execution plans and chooses the most efficient one.
How It Works:
• The DBMS analyzes multiple execution plans for a query and estimates the cost associated with each plan.
• The cost includes factors such as disk I/O, CPU usage, and memory consumption.
• The optimizer selects the execution plan with the lowest estimated cost.
Advantages:
• Adaptability: Cost-based optimization adapts to changes in data distribution, statistics, and system resources.
Limitations:
• Complexity: Implementing a cost-based optimizer requires sophisticated algorithms and statistical modeling.
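The selection step of cost-based optimization can be sketched as combining the estimated disk I/O, CPU, and memory of each candidate plan into one number and taking the minimum. The weights, the plan names, and the resource estimates below are hypothetical; real optimizers derive them from statistics and calibrated cost models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidatePlan:
    name: str
    disk_io: float  # estimated page reads
    cpu: float      # estimated CPU work (arbitrary units)
    memory: float   # estimated memory use (arbitrary units)


# Hypothetical weights that convert each resource into a common cost unit.
IO_WEIGHT, CPU_WEIGHT, MEM_WEIGHT = 1.0, 0.01, 0.001


def estimated_cost(p: CandidatePlan) -> float:
    """Weighted sum of the estimated resource usage of one plan."""
    return IO_WEIGHT * p.disk_io + CPU_WEIGHT * p.cpu + MEM_WEIGHT * p.memory


def choose_plan_cost_based(plans: List[CandidatePlan]) -> CandidatePlan:
    """Return the candidate with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


plans = [
    CandidatePlan("full table scan + hash join", disk_io=250, cpu=4_000, memory=2_000),
    CandidatePlan("index lookups + nested loop join", disk_io=120, cpu=9_000, memory=100),
]
best = choose_plan_cost_based(plans)
print(best.name, estimated_cost(best))
```

With these made-up numbers the first plan costs 250 + 40 + 2 = 292 and the second 120 + 90 + 0.1 = 210.1, so the second is chosen; different statistics would change the outcome, which is the adaptability described above.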

Query optimization is used to access and modify the database in the most efficient way possible. It is the art of obtaining necessary information in a predictable, reliable, and timely manner. Formally, query optimization is the process of transforming a query into an equivalent form that can be evaluated more efficiently. The goal of query optimization is to find an execution plan that reduces the time required to process a query. Two major tasks must be completed to attain this goal: the first is to determine the optimal plan to access the database, and the second is to reduce the time required to execute that plan.
Process of Query Optimization
After the query is parsed, the parsed query is delivered to the query optimizer. The optimizer considers the different ways in which the query could run, generates various execution plans for the parsed query, and selects the plan with the lowest estimated cost. The catalog manager assists the optimizer in selecting the optimum plan by supplying the information needed to compute the cost of each plan.

Purpose of the Query Optimizer in DBMS
The optimizer tries to come up with the best execution plan possible for a SQL statement. Among all the candidate plans reviewed, the optimizer chooses the plan with the lowest cost. The optimizer computes costs based on the available facts (statistics). The cost computation takes into account query execution factors such as I/O, CPU, and communication for a given query in a given context.
Sr No.  Class  Name    Role
01      10     Shreya  CR
02      10     Ritik
For example, consider a query that requests information about students who are in leadership roles, such as being a class representative (CR). If the optimizer statistics show that 50% of students are in positions of leadership, the optimizer may decide that a full table scan is the most efficient. However, if the statistics show that only a small number of students are in positions of leadership, reading an index followed by table access by row ID may be more efficient than a full table scan.
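The class-representative example can be sketched as a page-read comparison driven by selectivity. The statistics (1,000 students, 20 rows per page, an index of height 2) are hypothetical, and the index cost assumes, pessimistically, one page read per matching row, as with an unclustered index.

```python
import math


def choose_access_path(num_rows: int, rows_per_page: int, index_height: int, selectivity: float):
    """Pick the cheaper of a full scan and index access, measured in estimated page reads."""
    full_scan = math.ceil(num_rows / rows_per_page)  # read every page once
    matching = math.ceil(selectivity * num_rows)     # estimated matching rows
    index_access = index_height + matching           # traverse index, then one page per row
    if index_access < full_scan:
        return "index + table access by row ID", index_access, full_scan
    return "full table scan", index_access, full_scan


# Hypothetical statistics: 1,000 students, 20 rows per page, index height 2.
print(choose_access_path(1_000, 20, 2, selectivity=0.50))  # half the students are leaders
print(choose_access_path(1_000, 20, 2, selectivity=0.01))  # only a few are leaders
```

With these numbers a full scan reads 50 pages; at 50% selectivity the index path needs about 502 reads, while at 1% it needs about 12, so the decision flips exactly as the example describes.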
Because the database has so many internal statistics and tools at its disposal, the optimizer is frequently in a better position than the user to decide the best way to execute a statement. For this reason, all SQL statements use the optimizer.

Optimizer Components
The optimizer is made up of three parts: the query transformer, the estimator, and the plan generator. The figure below depicts these components.
Query Transformer: For some statements, the query transformer determines whether it is advantageous to rewrite the original SQL statement into a semantically equivalent SQL statement with a lower cost.
When a plausible alternative exists, the database compares the costs of each alternative and chooses the one with the lowest cost. The query below is an example of how the query transformer optimizes a query by transforming an OR-based input query into a UNION ALL-based output query.
The given query is transformed using the query transformer.
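An OR-to-UNION ALL rewrite can be checked for equivalence with sqlite3. The `emp` table and its rows are hypothetical. The second branch must exclude rows already returned by the first, or rows matching both conditions would appear twice; the `IS NULL` test keeps rows whose `job_id` is NULL, because for them `job_id = 'SDE'` is unknown rather than true.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, job_id TEXT, dept INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "SDE", 10), (2, "SDE", 20), (3, "HR", 10), (4, "HR", 30), (5, None, 10)],
)

# Input query using OR.
or_query = "SELECT id FROM emp WHERE job_id = 'SDE' OR dept = 10"

# Transformed query using UNION ALL; each branch could use its own index.
union_all_query = """
SELECT id FROM emp WHERE job_id = 'SDE'
UNION ALL
SELECT id FROM emp WHERE dept = 10 AND (job_id <> 'SDE' OR job_id IS NULL)
"""

# Compare the results as multisets, since UNION ALL keeps duplicates.
before = Counter(row[0] for row in conn.execute(or_query))
after = Counter(row[0] for row in conn.execute(union_all_query))
print(before == after, sorted(after.elements()))
```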

Estimator:
The estimator is the optimizer component that calculates the total cost of a given execution plan.
To determine the cost, the estimator uses three different measures:
Selectivity: the fraction of rows in a row set that the query selects, with 0 meaning no rows and 1 meaning all rows. Selectivity is determined by a query predicate, such as WHERE last_name LIKE 'X%', or by a combination of predicates. As the selectivity value approaches zero, a predicate becomes more selective, and as the value nears one, it becomes less selective (more unselective).
The row set can be a base table, a view, or the result of a join. The selectivity is tied to a query predicate, such as last_name = 'Prakash', or a combination of predicates, such as last_name = 'Prakash' AND job_id = 'SDE'.
Cost: This metric represents the units of work or resources used. The query optimizer uses disk I/O, CPU utilization, and memory usage as units of work. Because cost is only an estimate, if the plan for query A has a lower cost than the plan for query B, any of the following outcomes is possible: A executes faster than B, A executes slower than B, or A executes in the same amount of time as B.
Cardinality: The cardinality of an operation in an execution plan is the number of rows it returns. This input is shared by all cost functions and is essential for determining the best strategy. Cardinality can be derived from table statistics gathered with DBMS_STATS, or calculated after taking into account the effect of predicates (filters, joins, and so on) and DISTINCT or GROUP BY operations. In an execution plan, the Rows column displays the estimated cardinality.
For example, if the optimizer estimates that a full table scan will return 100 rows, then the cardinality estimate for this operation is 100, and this value appears in the execution plan's Rows column.
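Selectivity and cardinality estimates for the last_name and job_id predicates can be sketched under two simplifying assumptions that are labeled in the code: values are uniformly distributed (so equality selectivity is 1 / number of distinct values), and separate columns are independent (so selectivities of ANDed predicates multiply). The table statistics are hypothetical.

```python
def equality_selectivity(num_distinct: int) -> float:
    """Selectivity of `col = value`, assuming uniformly distributed values."""
    return 1.0 / num_distinct


def combined_selectivity(*selectivities: float) -> float:
    """Selectivity of predicates joined by AND, assuming independent columns."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result


def cardinality(num_rows: int, selectivity: float) -> int:
    """Estimated rows returned; assumption: never estimate fewer than one row."""
    return max(1, round(num_rows * selectivity))


# Hypothetical statistics for an employees table.
NUM_ROWS = 10_000
DISTINCT_LAST_NAMES = 2_000
DISTINCT_JOB_IDS = 20

s_last = equality_selectivity(DISTINCT_LAST_NAMES)  # last_name = 'Prakash'
s_job = equality_selectivity(DISTINCT_JOB_IDS)      # job_id = 'SDE'
s_both = combined_selectivity(s_last, s_job)        # both predicates

print(cardinality(NUM_ROWS, s_last))  # 5
print(cardinality(NUM_ROWS, s_job))   # 500
print(cardinality(NUM_ROWS, s_both))  # 1 (0.25 rounded, then clamped to one row)
```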

Plan Generator
The plan generator explores multiple plans for a query block by trying different access paths, join methods, and join orders. Because the database can combine these in different ways to produce the same result, many plans are available. The optimizer chooses the plan with the lowest cost.
The optimizer performs different actions depending on how it is invoked. The database offers the following optimization types:
Normal Optimization: The optimizer parses the SQL and produces an execution plan. For most SQL statements, normal mode generates a reasonable plan. In normal mode, the optimizer operates under strict time limits, usually a fraction of a second, during which it must find an optimal plan.
SQL Tuning Advisor Optimization: When SQL Tuning Advisor invokes the optimizer with one or more SQL statements as input, the optimizer is known as the Automatic Tuning Optimizer. In this situation, the optimizer performs additional analysis to further improve the plan produced in normal mode. The optimizer produces a series of actions, along with their rationale and expected benefit, for producing a significantly better plan.
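The plan generator's search over join orders can be sketched by enumerating every left-deep order of the three example tables and costing each with a toy model: the sum of the sizes of all intermediate results. The table sizes, join selectivities, and cost model are hypothetical; exhaustive enumeration is used only for clarity, since real optimizers prune the search space.

```python
from itertools import permutations

# Hypothetical statistics: table cardinalities and join selectivities.
rows = {"student_details": 1_000, "takes": 5_000, "course": 50}
join_selectivity = {
    frozenset({"student_details", "takes"}): 1 / 1_000,  # join on cid
    frozenset({"takes", "course"}): 1 / 50,              # join on courseid
}


def pair_selectivity(joined, right):
    """Selectivity of joining `right` to the tables already joined (1.0 = cross product)."""
    s = 1.0
    for t in joined:
        s *= join_selectivity.get(frozenset({t, right}), 1.0)
    return s


def plan_cost(order):
    """Toy cost: sum of intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    size = float(rows[order[0]])
    cost = 0.0
    for table in order[1:]:
        size = size * rows[table] * pair_selectivity(joined, table)
        cost += size
        joined.append(table)
    return cost


# Try every join order and keep the cheapest.
best_order = min(permutations(rows), key=plan_cost)
for order in permutations(rows):
    print(order, plan_cost(order))
print("chosen:", best_order)
```

Orders that start by joining student_details with course form a 50,000-row cross product first, so the cost model rejects them in favour of orders that follow the cid and courseid join predicates.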

Query Optimization Techniques with Examples
• Indexing – Objective: accelerate data retrieval by creating indexes on columns used in WHERE clauses.
• Query Rewriting – Objective: restructure queries to provide the same result with improved efficiency.
• Join Optimization – Objective: optimize JOIN operations to minimize computational overhead.
• Subquery Optimization – Objective: optimize subqueries to improve their efficiency.
• Query Caching – Objective: store and reuse the results of frequently executed queries.
• Parallel Processing – Objective: execute multiple parts of a query simultaneously to improve overall performance.
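The effect of indexing on the chosen plan can be observed with SQLite's `EXPLAIN QUERY PLAN`. The `employee` table, its generated rows, and the index name `idx_employee_dept` are hypothetical; the exact wording of SQLite's plan text varies between versions, so the sketch only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [(f"emp{i}", f"dept{i % 10}") for i in range(1000)],
)

query = "SELECT * FROM employee WHERE dept = 'dept3'"


def plan_details(sql: str):
    # The last column of each EXPLAIN QUERY PLAN row is the readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan_details(query)  # no index on dept yet: a full scan of employee
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after = plan_details(query)   # the WHERE column is now indexed: an index search

print(before)
print(after)
# The result is the same either way; only the access path changes.
print(len(conn.execute(query).fetchall()))
```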

Outline of Database Security
❑ Overview of Database Security
❑ What is Database Security?
❑ Why database security is needed
❑ Concepts of Database Security
❑ Security Problems
❑ Security Controls

Overview
In today's world, we need everything to be secured, whether it is a mobile phone, a computer, a vehicle, or almost anything else.

What is database security?
Database: a collection of information stored in a computer.
Security: being free from danger.
Database Security: the mechanisms that protect the database against intentional or accidental threats.

Definition of Database Security
Database security is defined as the process by which the "Confidentiality, Integrity and Availability" of the database can be protected.
Database security (figure): authentication; authorization and access control; data encryption; data privacy protection; data integrity verification; auditing and logging.
Concepts of Database Security
There are three main aspects:
• Secrecy or Confidentiality
• Integrity
• Availability

SECRECY
▪ Protecting the database from unauthorized users.
▪ Ensures that users are allowed to do the things they are trying to do.
▪ Encryption is a technique or process by which data is encoded in such a way that only authorized users are able to read it.
INTEGRITY
▪ Protecting the database from improper changes, including those made by authorized users.
▪ Ensures that what users are trying to do is correct.
▪ For example, an employee should be able to modify his or her own information.
AVAILABILITY
The database must not have unplanned downtime. To ensure this, the following steps should be taken:
▪ Restrict the amount of storage space given to each user in the database.
▪ Limit the number of concurrent sessions made available to each database user.
▪ Back up the data at periodic intervals to ensure data recovery in case of application failures.

SECURITY PROBLEMS
A threat is any circumstance or event with the potential to adversely impact an information system (IS) through unauthorized access, destruction, disclosure, or modification of data, and/or denial of service.
There are two kinds of threat:
• Non-fraudulent Threat
• Fraudulent Threat
Non-fraudulent Threat
▪ Natural or accidental disasters.
▪ Errors or bugs in hardware or software.
▪ Human errors.
Fraudulent Threat
▪ Authorized users: those who abuse their privileges and authority.
▪ Hostile agents: improper users (outsiders or insiders) who attack the software and/or hardware system, or read or write data in the database.

Database Protection Requirements
1. Protection from Improper Access
2. Protection from Inference
3. Integrity of the Database
4. User Authentication
5. Multilevel Protection
6. Confinement
7. Management and Protection of Sensitive Data

Security Controls
• Authorization – privileges, views.
• Encryption – public key / private key, secure sockets.
• Authentication – passwords.
• Logical – firewalls, net proxies.
A FIREWALL is dedicated software on another computer that inspects network traffic passing through it and denies or permits passage based on a set of rules. Basically, it is a piece of software that monitors all traffic that goes from your system to another via the Internet or a network, and vice versa.
Diagram Representation
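Password authentication is usually done by storing a salted, slow hash of each password instead of the password itself. The sketch uses PBKDF2 from Python's standard `hashlib`; the iteration count and the helper names `hash_password` and `verify_password` are illustrative assumptions, not a prescribed configuration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor; tune for the deployment


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) to store; the plain password is never stored."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(digest, stored_digest)


salt, stored = hash_password("s3cret!")
print(verify_password("s3cret!", salt, stored))  # correct password
print(verify_password("guess", salt, stored))    # wrong password
```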

Encryption and Decryption
Advantages and Disadvantages of Encryption
Advantages
• Privacy Protection: Encryption safeguards sensitive information, such as personal details, financial data, and communications, protecting user privacy.
• Secure Communication: Encryption secures data during transmission, preventing unauthorized interception and eavesdropping.
• Authentication and Authorization: Encryption can be used in conjunction with authentication and authorization mechanisms to enhance overall security.
Disadvantages
• Performance Overhead: The process of encrypting and decrypting data introduces computational overhead, potentially impacting system performance.
• Potential for Key Exposure: If encryption keys are not adequately protected, they may be vulnerable to theft, leading to unauthorized access.
• Resource Consumption: Encryption can consume additional resources, such as CPU and memory, especially in resource-constrained environments.

Advantages and Disadvantages of Decryption
Advantages
• Access to Encrypted Data: The primary purpose of decryption is to enable authorized users to access and read the originally encrypted data.
• Data Utilization: Decryption allows data to be used for various purposes, such as analysis, reporting, and decision-making.
• Data Recovery: In the case of data loss or system failures, having access to decryption keys allows encrypted data to be recovered.
Disadvantages
• Security Risks: Decryption, if not carefully managed, can pose security risks, especially if unauthorized parties gain access to decryption keys.
• Potential for Misuse: If decryption keys fall into the wrong hands, there is a risk of data misuse, unauthorized access, and potential security breaches.
• Data Exposure: Decrypting data exposes it to potential threats while it is in readable form, especially if it is not adequately protected.

Authorization
• Read authorization – allows reading, but not modification, of data.
• Insert authorization – allows insertion of new data, but not modification of existing data.
• Update authorization – allows modification, but not deletion, of data.
• Delete authorization – allows deletion of data.
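The four kinds of authorization can be sketched as a privilege check performed before each operation. The users, the grant table, and the helper names are hypothetical; in SQL, the corresponding privileges are granted with GRANT SELECT, INSERT, UPDATE, and DELETE.

```python
from enum import Enum, auto


class Privilege(Enum):
    READ = auto()
    INSERT = auto()
    UPDATE = auto()
    DELETE = auto()


# Hypothetical grants: user -> table -> set of privileges.
grants = {
    "analyst": {"student_details": {Privilege.READ}},
    "registrar": {"student_details": {Privilege.READ, Privilege.INSERT, Privilege.UPDATE}},
}


def is_authorized(user: str, table: str, privilege: Privilege) -> bool:
    """True only if the privilege was explicitly granted (deny by default)."""
    return privilege in grants.get(user, {}).get(table, set())


def execute(user: str, table: str, privilege: Privilege) -> str:
    """Refuse the operation unless the user holds the matching privilege."""
    if not is_authorized(user, table, privilege):
        raise PermissionError(f"{user} lacks {privilege.name} on {table}")
    return f"{privilege.name} on {table} allowed for {user}"


print(execute("registrar", "student_details", Privilege.UPDATE))
print(is_authorized("registrar", "student_details", Privilege.DELETE))  # update but not delete
```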

Advanced Topics
Distributed Databases:
• Overview: Distributed databases involve the storage and management of data across multiple locations or servers.
• Key Aspects: Data distribution, replication, consistency, and fault tolerance.
• Challenges: Network latency, data synchronization, and ensuring consistency across distributed nodes.
Data Warehousing:
• Overview: Data warehousing involves the collection, integration, and storage of data from different sources for analysis and reporting.
• Key Aspects: ETL (Extract, Transform, Load) processes, data marts, and multidimensional data models.
• Challenges: Data integration, data quality, and designing effective data models for analysis.
Data Mining and Machine Learning:
• Overview: Data mining involves discovering patterns and trends in large datasets, while machine learning uses algorithms to make predictions.
• Key Aspects: Classification, clustering, regression, and predictive modeling.
• Challenges: Feature selection, model interpretation, and ensuring the quality of input data.
Blockchain and Databases:
• Overview: Blockchain is a distributed ledger technology that enables secure and transparent transactions.
• Key Aspects: Decentralization, consensus mechanisms, and smart contracts.
• Challenges: Scalability, privacy concerns, and integration with traditional databases.

Conclusion
In today's dynamic technological landscape, understanding and implementing these concepts is crucial for organizations aiming to harness the full potential of their data. Balancing performance, security, and innovation is key to building robust and adaptive database systems that meet the demands of modern applications and business requirements. As technology continues to evolve, staying informed about emerging trends and advanced database management techniques becomes increasingly important for professionals in the field.
