Cost Based Optimizer – 2 of 2
Hotsos Enterprises, Ltd., Grapevine, Texas
Oracle. Performance. Now.
Agenda
  • Cost Based Optimizer and its impact on performance
  • Skewed Data
  • Histograms
  • Impact
  • Performance (Logical I/O Impact)
  • Performance (Join Strategy)
  • Bind Variables
  • Cardinality and Cost
  • Conclusion
Cost Based Optimizer
Cost Based Optimizer (CBO)
  • The CBO is, in reality, a complex piece of decision-making software.
  • It uses several database initialization parameters.
      - These are listed in the 10053 trace file.
  • It uses several session-level initialization parameters.
      - These are set at the session level and override the database initialization parameters.
  • It uses statistics about the objects (tables, indexes).
  • It uses hints to the optimizer.
  • It uses statistics about the system (CPU, disk, etc.).
  • It uses all of this information to decide on the “best way” to run the statement, and generates an execution plan accordingly.
  • It uses information about the skew of a column, if that information has been gathered.
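One way to see the parameters the CBO considered is to enable the 10053 trace for a session and force a hard parse. This is a minimal sketch, assuming a SQL*Plus session with the ALTER SESSION privilege; the `cbo_demo` identifier and the comment tag in the query are illustrative names, not part of the original deck.

```sql
-- Tag the trace file name so it is easy to find (optional; illustrative name).
ALTER SESSION SET tracefile_identifier = 'cbo_demo';

-- A session-level setting like this one overrides the database-level value.
ALTER SESSION SET optimizer_mode = ALL_ROWS;

-- Turn on the optimizer trace.
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

-- The trace is written only when a statement is hard parsed, so use
-- SQL text that is not already in the shared pool.
SELECT /* cbo_demo_1 */ COUNT(*) FROM dba_objects WHERE owner = 'SYS';

-- Turn the trace off again.
ALTER SESSION SET EVENTS '10053 trace name context off';
```

The resulting trace file lists the optimizer parameters in effect, followed by the statistics and the cost calculations for the plans that were considered.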
CBO will be part of your life if you keep working with Oracle.
The cost-based query optimizer (CBO):
  • Uses data from a variety of sources
  • Estimates the costs of several execution plans
  • Chooses the plan it estimates to be the least expensive
Characteristics:
  • Adapts to changing circumstances
  • Frustrating if you don’t know what it considers as input
  • Works great if you know how to use it
  • But produces very poor results if you lie to it
  • The only query optimizer supported by Oracle Corporation from release 10 onward
The cost-based query optimizer chooses the plan that it computes as having the lowest estimated cost.
Don’t assume the following are identical:
  • CBO’s estimated cost of an execution plan
  • The actual cost of an execution plan
CBO’s cost estimate can be imperfect:
  • Are your CBO inputs perfect?
  • CBO isn’t perfect, but by 9.2 it’s almost always good enough
Without properly collected statistics, the CBO will:
  • Use RBO if no statistics exist on any object in the statement
  • Use default statistics if statistics exist for a single object in the statement but not others
  • Use dynamic sampling to generate statistics (based on parameter setting and Oracle version)
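The gap between estimated and actual cost can be inspected directly. The sketch below assumes Oracle 10g or later, where the `gather_plan_statistics` hint and `DBMS_XPLAN.DISPLAY_CURSOR` are available; the query against `dba_objects` is only an illustration.

```sql
-- Tables in the current schema that have never had statistics gathered.
SELECT table_name
FROM   user_tables
WHERE  last_analyzed IS NULL;

-- The dynamic sampling level currently in effect.
SELECT value
FROM   v$parameter
WHERE  name = 'optimizer_dynamic_sampling';

-- Run a query while collecting actual row-source statistics.
SELECT /*+ gather_plan_statistics */ COUNT(*)
FROM   dba_objects
WHERE  owner = 'SYS';

-- Show the plan of the last statement run in this session, with the
-- optimizer's estimate (E-Rows) next to the actual row count (A-Rows).
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

In SQL*Plus, run `SET SERVEROUTPUT OFF` first; otherwise the "last statement" may be an internal DBMS_OUTPUT call rather than the query of interest. Large differences between E-Rows and A-Rows usually point at an imperfect CBO input.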
Cost Based Optimizer
Execution plan changes can result in profoundly different application performance.
  • Plot A: table size change
  • Plot B: device latency change
  • Plot C: execution plan change
Type C performance changes are the most profound.
Recap
  • The CBO is a complex piece of software.
  • It uses several data points to calculate the cost of each execution plan and chooses the plan with the lowest cost.
  • It is dynamic and adapts to changing data better than the Rule Based Optimizer.
  • A good understanding of the Cost Based Optimizer is imperative for understanding the rationale behind some of its choices.
Skewed Data
Skewed Data
  • Skewed data is data whose distribution is not uniform.
  • A good example is the owner column of dba_objects.
  • The column is highly skewed:
      SELECT owner, COUNT(*) FROM dba_objects GROUP BY owner;
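To make the skew easier to read, the same query can report each owner's share of the total. This is a small sketch; the column aliases are illustrative.

```sql
-- Count objects per owner and show each owner's percentage of all objects.
-- RATIO_TO_REPORT is an analytic function evaluated after the GROUP BY.
SELECT owner,
       COUNT(*)                                          AS object_count,
       ROUND(100 * RATIO_TO_REPORT(COUNT(*)) OVER (), 2) AS pct_of_total
FROM   dba_objects
GROUP  BY owner
ORDER  BY object_count DESC;
```

On a typical database a few owners account for most of the rows, while many owners have only a handful, which is the non-uniform distribution the slide describes.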
Some kinds of data skew naturally; some don’t.
  • Guaranteed to be skewed
      - E.g., status attribute (open | closed) of a sales order table
  • Possibly not skewed
      - E.g., sale date attribute of a sales order table
Histograms
What are the costs and benefits of histograms?
  • Benefits of histograms
      - The CBO sometimes needs the information to make good decisions
  • Costs of histograms
      - Computing histograms consumes extra computing capacity during statistics collection
      - Some CPU time and extra latching are required during plan determination for the optimizer to consider histograms
Histograms provide the optimizer with better information from which to derive an execution plan for a query.
  • A histogram is a graphic representation of a frequency distribution by means of rectangles whose widths represent class intervals and whose heights represent the corresponding frequencies.
  • Oracle implements histograms in two ways, depending on the column's number of distinct values (NDV) and the requested number of buckets (SIZE):
      - Height-balanced: created if column NDV > SIZE
      - Frequency: created if column NDV <= SIZE
Types of Histograms
  • Frequency histograms
      - Every distinct value in the column has its own entry, with a count of how many times that value occurs.
  • Height-balanced histograms
      - Every bucket holds approximately the same number of rows; each bucket covers a range of column values, and only the endpoint value of each range is recorded.
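The two shapes can be approximated with plain SQL to build intuition. This is only a sketch of the idea, not Oracle's internal algorithm; the bucket count of 10 and the column aliases are assumptions chosen for illustration.

```sql
-- Frequency-style view: one entry per distinct value, with its row count.
SELECT owner, COUNT(*) AS occurrences
FROM   dba_objects
GROUP  BY owner
ORDER  BY owner;

-- Height-balanced-style view: sort the rows, cut them into 10 buckets of
-- (nearly) equal row count, and keep only the largest value in each bucket.
SELECT bucket,
       MAX(owner) AS endpoint_value,
       COUNT(*)   AS rows_in_bucket
FROM  (SELECT owner,
              NTILE(10) OVER (ORDER BY owner) AS bucket
       FROM   dba_objects)
GROUP  BY bucket
ORDER  BY bucket;
```

In the second result, a value that appears as the endpoint of several consecutive buckets is a popular value, which is how a height-balanced histogram reveals skew.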
Frequency Histogram
Height Balanced Histogram
Histograms can be gathered by setting the METHOD_OPT parameter.
  • For a specific column: FOR COLUMNS column_x SIZE <n|REPEAT|AUTO|SKEWONLY>
  • For all the columns in a table: FOR ALL COLUMNS
  • For only the columns that have an index: FOR ALL INDEXED COLUMNS
Example:
    EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'OP', tabname=>'my_table', method_opt=>'FOR COLUMNS column_x SIZE 10')
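The other METHOD_OPT forms follow the same pattern. The sketch below reuses the slide's example names (`OP`, `my_table`, `column_x`), which are placeholders rather than real objects.

```sql
-- Let Oracle decide the bucket count, storing a histogram only if
-- column_x turns out to be skewed.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'OP',
    tabname    => 'MY_TABLE',
    method_opt => 'FOR COLUMNS column_x SIZE SKEWONLY');
END;
/

-- Consider every indexed column, with Oracle choosing which ones
-- get a histogram and how many buckets to use.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'OP',
    tabname    => 'MY_TABLE',
    method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO');
END;
/

-- SIZE 1 means a single bucket, which effectively removes an
-- existing histogram from column_x.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'OP',
    tabname    => 'MY_TABLE',
    method_opt => 'FOR COLUMNS column_x SIZE 1');
END;
/
```

Gathering with SIZE 1 is a convenient way to compare plans with and without a histogram on the same column.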
Histograms are not useful in all cases.
Histograms are not useful for columns with the following characteristics:
  • All (or most) predicates on the column use bind variables
  • The column data is uniformly distributed
  • The column is unique and is used only with equality predicates
  • Data distribution changes frequently and statistics aren't collected to match
Even in the most recent Oracle versions, histogram optimization doesn’t completely work with bind variables.
  • Oracle version 8
      - Use of bind variables prohibits histogram optimization
  • Oracle version 9 and above
      - The query optimizer “peeks” at the bind value to use histogram optimization
      - But only on the initial hard parse of a query
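Bind peeking can be observed from SQL*Plus. The following is a sketch under assumptions: `sales_orders`, its indexed numeric `status` column with a histogram, and its `order_date` column are hypothetical, and `sql_id` in `v$sql` assumes Oracle 10g or later.

```sql
VARIABLE status NUMBER

-- First execution: hard parse. The optimizer peeks at the value 1 and
-- builds the plan for that value.
EXEC :status := 1
SELECT /* peek_demo */ MAX(order_date) FROM sales_orders WHERE status = :status;

-- Second execution: identical SQL text, so the existing cursor is shared
-- and the value 0 is not used to re-optimize the statement.
EXEC :status := 0
SELECT /* peek_demo */ MAX(order_date) FROM sales_orders WHERE status = :status;

-- One child cursor with two executions indicates the plan was reused.
SELECT sql_id, child_number, plan_hash_value, executions
FROM   v$sql
WHERE  sql_text LIKE '%peek_demo%'
AND    sql_text NOT LIKE '%v$sql%';
```

Later releases added adaptive cursor sharing, which can create additional child cursors after repeated executions, so the result may differ from the behavior described on this slide.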
Be prepared for how application developers might have worked around skew problems.
The old-fashioned RBO technique:
  • Create the index
  • Hard-code the selective query with “status=1”
  • Hard-code the un-selective query with “status+0=1”
A CBO technique:
  • Create the index
  • Hard-code the selective query with /*+ index(t) */
  • Hard-code the un-selective query with /*+ full(t) */
Don’t resort to either of these!
Where Histogram Information is Stored
  • DBA_TAB_HISTOGRAMS
  • DBA_TAB_COL_STATISTICS
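A minimal sketch of querying these two views, reusing the placeholder names `OP`, `MY_TABLE`, and `COLUMN_X` from the METHOD_OPT slide (the dictionary stores unquoted identifiers in upper case). The HISTOGRAM column assumes Oracle 10g or later.

```sql
-- Which columns of the table have a histogram, and of what type?
SELECT column_name, num_distinct, num_buckets, histogram
FROM   dba_tab_col_statistics
WHERE  owner      = 'OP'
AND    table_name = 'MY_TABLE'
ORDER  BY column_name;

-- The individual bucket endpoints recorded for one column.
SELECT endpoint_number, endpoint_value, endpoint_actual_value
FROM   dba_tab_histograms
WHERE  owner       = 'OP'
AND    table_name  = 'MY_TABLE'
AND    column_name = 'COLUMN_X'
ORDER  BY endpoint_number;
```

For a frequency histogram, ENDPOINT_NUMBER is a cumulative row count, so the difference between consecutive rows gives the number of occurrences of each value.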
Demo Histogram Data Dictionary Tables
Impact: Performance in Terms of Logical I/Os
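Logical I/O for a single statement can be measured in SQL*Plus with AUTOTRACE. This sketch assumes the session has access to the required V$ views (for example through the PLUSTRACE role) and reuses the placeholder table `op.my_table` with a numeric `column_x`.

```sql
-- Run the statement but suppress its rows; print only run-time statistics.
SET AUTOTRACE TRACEONLY STATISTICS

SELECT * FROM op.my_table WHERE column_x = 1;

SET AUTOTRACE OFF
```

In the output, "consistent gets" plus "db block gets" is the number of logical reads. Running the same query before and after gathering a histogram shows whether a plan change reduced the work done.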
Demo Cardinality
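The effect of a histogram on the cardinality estimate can be sketched as follows. The table name `objects_copy` is hypothetical, and the bucket count of 254 is an assumption; this is not a reproduction of the original demo.

```sql
-- A private copy of dba_objects to experiment on.
CREATE TABLE objects_copy AS SELECT * FROM dba_objects;

-- Statistics without histograms (one bucket per column).
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'OBJECTS_COPY', method_opt => 'FOR ALL COLUMNS SIZE 1')

EXPLAIN PLAN FOR SELECT * FROM objects_copy WHERE owner = 'SYS';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);  -- note the Rows column

-- Add a histogram on the skewed column and look at the estimate again.
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'OBJECTS_COPY', method_opt => 'FOR COLUMNS owner SIZE 254')

EXPLAIN PLAN FOR SELECT * FROM objects_copy WHERE owner = 'SYS';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- The actual cardinality, for comparison with both estimates.
SELECT COUNT(*) FROM objects_copy WHERE owner = 'SYS';
```

Without the histogram the optimizer assumes rows are spread evenly across owners; with it, the Rows estimate should move much closer to the actual count.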
Demo Join Cardinality
Recap
  • Histograms can be really useful when gathered on skewed columns.
  • Histograms are specific to your data and version.
  • Test it out and prove that gathering histograms is beneficial.
  • Be careful of bind variable substitutions, as histograms may not be used.


Editor's Notes

  • #7: Without properly collected statistics, the CBO will do one of two things:
      - If no statistics exist for any object used in the SQL statement, the CBO may use rule-based optimization (prior to v10) or use dynamic sampling.
      - If statistics exist for any single object but not for others in the SQL statement, the CBO may use a set of default statistics for the objects without statistics, or use dynamic sampling.
    CBO default statistics for objects without collected stats (prior to v10; in v10, dynamic sampling is typically used instead of defaults):

    TABLE SETTING                   DEFAULT STATISTICS
    cardinality                     number of blocks * (block size - cache layer) / average row length
    average row length              100 bytes
    number of blocks                100, or actual value based on the extent map
    remote cardinality (distrib)    2000 rows
    remote average row length       100 bytes

    INDEX SETTING                   DEFAULT STATISTICS
    levels                          1
    leaf blocks                     25
    leaf blocks/key                 1
    data blocks/key                 1
    distinct keys                   100
    clustering factor               800
  • #9: Plot A illustrates a situation in which the execution plan does not change, but the query response time varies significantly as the number of rows in the table changes. This occurs when an application chooses a TABLE ACCESS (FULL) execution plan for a growing table. It’s what causes RBO-based applications to appear fast in a small development environment, but then behave poorly in the production environment.
    Plot B illustrates the marginal improvement that’s achievable, for example, by distributing an inefficient application’s workload more uniformly across the disks in a disk array. Notice that the execution plan (or “shape of the performance curve”) isn’t necessarily changed by such an operation (although, if the output of dbms_stats.gather_system_statistics changes as a result of the configuration change, then the plan might change). The performance for a given number of rows might change, however, as the plot indicates.
    Plot C illustrates what is commonly the most profound type of performance change: an execution plan change. This situation can be caused by a change to any of the CBO’s inputs. For example, an accidental deletion of a segment’s statistics can change a plan from a nice fast plan (depicted by the green curve, which is O(log n)) to a horrifically slow plan (depicted by the red curve, which is O(n^2)). The phenomenon illustrated in plot C is what has happened when a query that was fast last week now runs for 14 hours without completing before you finally give up and kill the session.
  • #15: Since the CBO determines the selectivity of predicates that appear in queries, it is important that there be adequate information for the CBO to make its estimates properly. With histogram data gathered, the CBO can make improved selectivity estimates in the presence of data skew, resulting in better execution plans for non-uniform data distributions. The histogram approach provides an efficient and compact way to represent data distributions. Selectivity estimates are used to decide when to use an index and the order in which to join tables. Many table columns are not uniformly distributed, so the normal calculations for selectivity may not be accurate without the use of histograms.
  • #16: Height-balanced histograms put approximately the same number of values into each interval, so that the endpoints of the interval are determined by the number of values in that interval. Only the last (largest) value in each bucket appears as the bucket (endpoint) value. A height-balanced histogram will be created if the number of histogram buckets (SIZE) is smaller than the number of distinct values in the column.
    Frequency histograms (sometimes called value-based histograms) are created when the number of histogram buckets (SIZE) specified is greater than or equal to the number of distinct column values. In frequency histograms, every individual value in the column has a corresponding bucket, and the bucket number reflects the repetition count of each value.
    The type of histogram is stored in the HISTOGRAM column of the *TAB_COL_STATISTICS views. The column can have values of HEIGHT BALANCED, FREQUENCY, or NONE. The SIZE of a histogram can be set by you or automatically by Oracle when the histogram is collected. The default SIZE (when no SIZE is specified) is 75. The maximum SIZE is 254.
  • #20: DBMS_STATS constants for SIZE:
      - SIZE REPEAT: Causes the histogram to be created with the same options as the last time it was created. Oracle reads the data dictionary to figure out what to do.
      - SIZE AUTO: Oracle looks at the data and, using an undocumented and changing algorithm, figures out by itself which columns to gather histograms on and how many buckets to use. It collects histograms in memory only for those columns which are used by your applications (columns appearing in a predicate involving equality, range, or LIKE operators). It knows that a particular column was used by an application because, at parse time, it stores workload information in the SGA. It then stores a histogram in the data dictionary only if the column has skewed data (and is worthy of a histogram).
      - SIZE SKEWONLY: Collects histogram data in memory for all specified columns (if you do not specify any, all columns are used). Once an "in-memory" histogram is computed for a column, it is stored in the data dictionary only if it has "popular" values (multiple endpoints with the same value, which is what is meant by "there is skew in the data").
  • #22: In Oracle version 8, the use of bind variables in a predicate effectively disables the use of histograms. This is because the optimizer needs to know the value (WHERE col = 'x') in order to check the histogram statistics for the selectivity of that value. When a bind variable is used, it is not actually bound into the query until execution time. Since the execution plan is determined in the parse phase, the optimizer won't know the value and thus can't use the histogram to make its decision.
    In Oracle version 9, the optimizer behavior regarding bind variables changed slightly. When a query is initially parsed, the optimizer will "peek" at the value of the bind variable and use the value it finds to make decisions. Does that make the situation better or worse? It depends. Let's say that when the query is initially parsed, it has a bind variable value of 1 in the predicate. If the column has a histogram and the histogram indicates that selectivity is low for that value (few values match), then the optimizer will likely choose to use an index on that column, if one is available. Everything works well, performance is sub-second, and everyone is happy.
    Now, what happens if the query is executed a second time but passes the value 0 in the bind variable, and the selectivity for the value 0 is high (lots of values match)? The original plan is still used, and the query will attempt to use the same index. If there are thousands of records in the row source, it is likely that the index scan will perform significantly worse than a full table scan. In this case, everything works, but performance stinks and complaints arise.
    So, what do you do? For some, the best solution is to avoid bind variables when a column has a limited number of values and the values are skewed, and to just hard-code the value you need. The best way to know what to do is to test different approaches to find what works best for your environment.
  • #23: The RBO workaround is forgivable because it’s all the RBO environment could offer as an option. The CBO technique shown here is particularly bad because it makes the application less flexible and therefore less able to respond appropriately to system changes. Ideally, if you (the developer) already know that data for certain columns tends to skew, you can write code to account for it. A good guideline to follow is to look at the number of distinct values in the column. If the column has only a few distinct values, then hard-coding the value will allow the optimizer to correctly choose the plan based on histogram data. If there are a lot of distinct values, but you know in advance the actual skewed values, you could write conditional code to use a bind variable in all cases except when the known skewed values are requested. In that case, the conditional code would branch to a SQL statement version which hard-codes the skewed value under those circumstances.
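The conditional-code guideline in this note can be sketched in PL/SQL. The table `sales_orders`, its numeric `status` column, the function name `count_orders`, and the choice of 0 as the known skewed value are all hypothetical, chosen to match the example in the earlier bind-variable note.

```sql
CREATE OR REPLACE FUNCTION count_orders(p_status IN NUMBER)
  RETURN NUMBER
IS
  l_count NUMBER;
BEGIN
  IF p_status = 0 THEN
    -- Known skewed value: the literal is part of the SQL text, so this
    -- statement gets its own cursor and the optimizer can consult the
    -- histogram for exactly this value.
    SELECT COUNT(*) INTO l_count
    FROM   sales_orders
    WHERE  status = 0;
  ELSE
    -- All other values: the PL/SQL variable is passed as a bind variable,
    -- so one shared cursor serves every non-skewed value.
    SELECT COUNT(*) INTO l_count
    FROM   sales_orders
    WHERE  status = p_status;
  END IF;
  RETURN l_count;
END count_orders;
/
```

Because the two branches have different SQL text, they are parsed and optimized independently, which keeps most executions on a shared cursor while letting the skewed value have a plan of its own.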