Histogram Support in MySQL 8.0
Copyright © 2018, Oracle and/or its affiliates. All rights reserved.
Øystein Grøvlen
Senior Principal Software Engineer
MySQL Optimizer Team, Oracle
February 2018
Program Agenda
1. Motivating example
2. Quick start guide
3. How are histograms used?
4. Query example
5. Some advice
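Motivating Example
JOIN query
EXPLAIN SELECT *
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;
+----+-------------+----------+--------+---------------------------+---------+---------+-----------------------+----------+----------+-------------+
| id | select_type | table    | type   | possible_keys             | key     | key_len | ref                   | rows     | filtered | Extra       |
+----+-------------+----------+--------+---------------------------+---------+---------+-----------------------+----------+----------+-------------+
| 1  | SIMPLE      | orders   | ALL    | i_o_orderdate,i_o_custkey | NULL    | NULL    | NULL                  | 15000000 | 31.19    | Using where |
| 1  | SIMPLE      | customer | eq_ref | PRIMARY                   | PRIMARY | 4       | dbt3.orders.o_custkey | 1        | 33.33    | Using where |
+----+-------------+----------+--------+---------------------------+---------+---------+-----------------------+----------+----------+-------------+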
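Motivating Example
Reverse join order
EXPLAIN SELECT /*+ JOIN_ORDER(customer, orders) */ *
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+
| id | select_type | table    | type | possible_keys             | key         | key_len | ref                     | rows    | filtered | Extra       |
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+
| 1  | SIMPLE      | customer | ALL  | PRIMARY                   | NULL        | NULL    | NULL                    | 1500000 | 33.33    | Using where |
| 1  | SIMPLE      | orders   | ref  | i_o_orderdate,i_o_custkey | i_o_custkey | 5       | dbt3.customer.c_custkey | 15      | 31.19    | Using where |
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+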
Comparing Join Order
Performance
[Bar chart: query execution time in seconds (axis 0 to 16) for the join orders orders → customer and customer → orders]
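Histograms
Create histogram to get a better plan
ANALYZE TABLE customer UPDATE HISTOGRAM ON c_acctbal WITH 1024 BUCKETS;
EXPLAIN SELECT *
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+
| id | select_type | table    | type | possible_keys             | key         | key_len | ref                     | rows    | filtered | Extra       |
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+
| 1  | SIMPLE      | customer | ALL  | PRIMARY                   | NULL        | NULL    | NULL                    | 1500000 | 0.00     | Using where |
| 1  | SIMPLE      | orders   | ref  | i_o_orderdate,i_o_custkey | i_o_custkey | 5       | dbt3.customer.c_custkey | 15      | 31.19    | Using where |
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+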
Histograms
Column statistics
• Information about value distribution for a column
• Data values are grouped into buckets
– Frequency calculated for each bucket
– Maximum 1024 buckets
• May use sampling to build histogram
– Sample rate depends on available memory
• Automatically chooses between two histogram types:
– Singleton: One value per bucket
– Equi-height: Multiple values per bucket
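The automatic choice between the two histogram types can be sketched as follows. This is a minimal illustration, assuming the server builds a singleton histogram whenever the number of distinct values fits in the requested number of buckets and an equi-height histogram otherwise. The function name and the example inputs are hypothetical.

```python
MAX_BUCKETS = 1024  # upper limit on the number of buckets, as stated on the slide


def expected_histogram_type(distinct_values: int, buckets: int) -> str:
    """Return the histogram type we would expect for a column.

    Assumption: singleton if every distinct value can get its own bucket,
    equi-height otherwise.
    """
    if not 1 <= buckets <= MAX_BUCKETS:
        raise ValueError(f"buckets must be between 1 and {MAX_BUCKETS}")
    return "singleton" if distinct_values <= buckets else "equi-height"


# Hypothetical columns: one with 7 distinct values, one with 1.5 million.
print(expected_histogram_type(7, 1024))
print(expected_histogram_type(1_500_000, 1024))
```

Under this assumption, a low-cardinality column gets a singleton histogram even when far more buckets are requested than it needs.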
Singleton Histogram
[Bar chart: frequency (axis 0 to 0.25) for each of the values 0, 1, 2, 3, 5, 6, 7, 8, 9, 10]
• One value per bucket
• Each bucket stores:
– Value
– Cumulative frequency
• Well suited to estimate both equality and range predicates
Equi-Height Histogram
[Bar chart: frequency (axis 0 to 0.35) for the buckets 0-0, 1-1, 2-3, 5-6, 7-10]
• Multiple values per bucket
• Not quite equi-height
– Values are not split across buckets
⇒ Frequent values in separate buckets
• Each bucket stores:
– Minimum value
– Maximum value
– Cumulative frequency
– Number of distinct values
• Best suited for range predicates
Usage
• Create or refresh histogram(s) for column(s):
ANALYZE TABLE table UPDATE HISTOGRAM ON column [, column] WITH n BUCKETS;
– Note: Will only update histogram, not other statistics
• Drop histogram:
ANALYZE TABLE table DROP HISTOGRAM ON column [, column];
• Based on entire table or sampling:
– Depends on available memory: histogram_generation_max_mem_size (default: 20 MB)
• New storage engine API for sampling
– Default implementation: Full table scan even when sampling
– Storage engines may implement more efficient sampling
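How the memory limit could translate into a sampling rate is sketched below. This is a simplified model, not the server's code: it assumes that memory use is proportional to the number of sampled rows times a fixed per-value cost. The function name, the 16-byte per-value cost, and the row counts are all assumptions made for illustration.

```python
DEFAULT_MAX_MEM_SIZE = 20_000_000  # default of histogram_generation_max_mem_size, in bytes


def estimated_sampling_rate(rows_in_table: int,
                            bytes_per_value: int,
                            max_mem_size: int = DEFAULT_MAX_MEM_SIZE) -> float:
    """Fraction of the table that fits in memory under the simplified model."""
    if rows_in_table <= 0:
        return 1.0  # nothing to sample; treat as a full read
    rows_that_fit = max_mem_size / bytes_per_value
    return min(1.0, rows_that_fit / rows_in_table)


# Hypothetical tables: a small one that fits entirely, and a large one that does not.
print(estimated_sampling_rate(100_000, 16))
print(estimated_sampling_rate(6_001_215, 16))
```

With these assumed inputs the small table is read in full (rate 1.0) and the large one is sampled at a rate of about 0.208. Raising histogram_generation_max_mem_size would raise the rate in this model.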
Storage
• Stored in a JSON column in data dictionary
• Can be inspected in Information Schema table:
SELECT JSON_PRETTY(histogram)
FROM information_schema.column_statistics
WHERE schema_name = 'dbt3_sf1'
AND table_name = 'lineitem'
AND column_name = 'l_linenumber';
Histogram content
{
"buckets": [[1, 0.24994938524948698], [2, 0.46421066400720523],
[3, 0.6427401784471978], [4, 0.7855470933802572],
[5, 0.8927398868395817], [6, 0.96423707532558], [7, 1] ],
"data-type": "int",
"null-values": 0.0,
"collation-id": 8,
"last-updated": "2018-02-03 21:05:21.690872",
"sampling-rate": 0.20829115437457252,
"histogram-type": "singleton",
"number-of-buckets-specified": 1024
}
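The cumulative frequencies above can be turned into selectivity estimates as in this sketch. The bucket values are copied from the histogram content shown; the function names are hypothetical, and NULL handling is left out because "null-values" is 0.0 here.

```python
import json

# Bucket list copied from the histogram content above (singleton, l_linenumber).
HISTOGRAM_JSON = """
{
  "buckets": [[1, 0.24994938524948698], [2, 0.46421066400720523],
              [3, 0.6427401784471978], [4, 0.7855470933802572],
              [5, 0.8927398868395817], [6, 0.96423707532558], [7, 1]],
  "histogram-type": "singleton"
}
"""


def selectivity_le(buckets, value):
    """Estimate P(column <= value): the last cumulative frequency at or below value."""
    result = 0.0
    for bucket_value, cumulative in buckets:
        if bucket_value <= value:
            result = cumulative
    return result


def selectivity_eq(buckets, value):
    """Estimate P(column = value): this bucket's cumulative frequency minus the previous one."""
    previous = 0.0
    for bucket_value, cumulative in buckets:
        if bucket_value == value:
            return cumulative - previous
        previous = cumulative
    return 0.0  # value not present in the histogram


buckets = json.loads(HISTOGRAM_JSON)["buckets"]
print(selectivity_eq(buckets, 3))
print(selectivity_le(buckets, 3))
```

Because each bucket holds exactly one value, both equality and range estimates need nothing more than lookups and one subtraction.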
Strings
• Max. 42 characters considered
• Base64 encoded
-- String bucket values carry a prefix (for example 'base64:type254:'),
-- so skip past the second colon before decoding.
SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq
FROM information_schema.column_statistics,
     JSON_TABLE(histogram->'$.buckets', '$[*]'
                COLUMNS(v VARCHAR(60) PATH '$[0]',
                        c DOUBLE PATH '$[1]')) hist
WHERE column_name = 'o_orderstatus';
+-------+--------------------+
| value | cumulfreq          |
+-------+--------------------+
| F     | 0.4862529264385756 |
| O     | 0.974029654577566  |
| P     | 0.9999999999999999 |
+-------+--------------------+
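The same decoding can be done on the client side, as in this sketch. The raw strings are hypothetical: they assume the 'base64:type254:' prefix and encode the three o_orderstatus values from the result above. The function name is also hypothetical.

```python
import base64


def decode_bucket_value(raw: str) -> str:
    """Decode a string bucket value of the assumed form '<prefix>:<base64 text>'."""
    # Base64 text never contains ':', so everything after the last colon is the payload.
    encoded = raw.rsplit(":", 1)[-1]
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical raw bucket values; the prefix format is an assumption.
for raw in ("base64:type254:Rg==", "base64:type254:Tw==", "base64:type254:UA=="):
    print(decode_bucket_value(raw))
```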
Calculate Bucket Frequency
Use window function
SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq,
       c - LAG(c, 1, 0) OVER () freq
FROM information_schema.column_statistics,
     JSON_TABLE(histogram->'$.buckets', '$[*]'
                COLUMNS(v VARCHAR(60) PATH '$[0]',
                        c DOUBLE PATH '$[1]')) hist
WHERE column_name = 'o_orderstatus';
+-------+--------------------+----------------------+
| value | cumulfreq          | freq                 |
+-------+--------------------+----------------------+
| F     | 0.4862529264385756 | 0.4862529264385756   |
| O     | 0.974029654577566  | 0.48777672813899037  |
| P     | 0.9999999999999999 | 0.025970345422433927 |
+-------+--------------------+----------------------+
When are Histograms useful?
Estimate cost of join
• t_x JOIN t_{x+1}
• records(t_{x+1}) = records(t_x) * condition_filter_effect * records_per_key
– records(t_x): Number of records read from t_x
– condition_filter_effect: Fraction of records passing the table conditions on t_x
– records_per_key: From the cardinality statistics for the index used for ref access to t_{x+1}
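As a worked example of the formula above, the sketch below plugs in the rows and filtered values from the EXPLAIN output of the motivating example. The function name is hypothetical, and the filtered value of 0.00 is the rounded figure shown by EXPLAIN, so the real estimate with the histogram is small but not necessarily zero.

```python
def estimated_records(records_read, condition_filter_effect, records_per_key):
    """records(t_{x+1}) = records(t_x) * condition_filter_effect * records_per_key"""
    return records_read * condition_filter_effect * records_per_key


# (rows read from first table, filtered / 100, rows per lookup in second table)
plans = {
    "orders -> customer": (15_000_000, 0.3119, 1),
    "customer -> orders, without histogram": (1_500_000, 0.3333, 15),
    "customer -> orders, with histogram": (1_500_000, 0.0, 15),  # 0.00 as displayed
}

for name, args in plans.items():
    print(f"{name}: {estimated_records(*args):,.0f}")
```

Without the histogram, starting with orders looks cheaper (about 4.7 million versus 7.5 million records), which is why that order was chosen. Once the histogram shows that almost no customer rows have c_acctbal < -1000, starting with customer has the lowest estimate.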
How to Calculate Condition Filter Effect, MySQL 5.7
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
employee.name = 'John' AND age > 21 AND
hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Filter estimate based on what is available:
1. Range estimate
2. Index statistics
3. Guesstimate
+--------------+--------------------------------------+
| Operator     | Guesstimate                          |
+--------------+--------------------------------------+
| =            | 0.1                                  |
| <=, <, >, >= | 1/3                                  |
| BETWEEN      | 1/9                                  |
| NOT <op>     | 1 - SEL(<op>)                        |
| AND          | P(A and B) = P(A) * P(B)             |
| OR           | P(A or B) = P(A) + P(B) - P(A and B) |
| …            | …                                    |
+--------------+--------------------------------------+
How to Calculate Condition Filter Effect, MySQL 8.0
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
employee.name = 'John' AND age > 21 AND
hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Filter estimate based on what is available:
1. Range estimate
2. Index statistics
3. Histograms
4. Guesstimate
+--------------+--------------------------------------+
| Operator     | Guesstimate                          |
+--------------+--------------------------------------+
| =            | 0.1                                  |
| <=, <, >, >= | 1/3                                  |
| BETWEEN      | 1/9                                  |
| NOT <op>     | 1 - SEL(<op>)                        |
| AND          | P(A and B) = P(A) * P(B)             |
| OR           | P(A or B) = P(A) + P(B) - P(A and B) |
| …            | …                                    |
+--------------+--------------------------------------+
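The guesstimate rules can be written out as a few small functions. This is a sketch of the arithmetic in the table only, with hypothetical names; it is not the optimizer's code, and like the table it assumes the predicates are independent.

```python
# Default selectivities used when no better estimate is available (from the table).
GUESSTIMATES = {"=": 0.1, "<=": 1 / 3, "<": 1 / 3, ">": 1 / 3, ">=": 1 / 3, "BETWEEN": 1 / 9}


def sel_not(sel):
    """NOT <op>: 1 - SEL(<op>)"""
    return 1.0 - sel


def sel_and(sel_a, sel_b):
    """AND: P(A and B) = P(A) * P(B), assuming independence"""
    return sel_a * sel_b


def sel_or(sel_a, sel_b):
    """OR: P(A or B) = P(A) + P(B) - P(A and B)"""
    return sel_a + sel_b - sel_and(sel_a, sel_b)


name_eq = GUESSTIMATES["="]  # employee.name = 'John'
age_gt = GUESSTIMATES[">"]   # age > 21

print(sel_and(name_eq, age_gt))
print(sel_or(name_eq, age_gt))
print(sel_not(name_eq))
```

These defaults ignore the actual data, which is why an estimate from a range scan, index statistics, or a histogram is preferred whenever one exists.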
Calculating Condition Filter Effect for Tables
Example without histograms
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
employee.name = 'John' AND age > 21 AND
hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Estimate per predicate:
– office_name = 'San Francisco': 0.03 (index)
– employee.name = 'John': 0.1 (guesstimate)
– age > 21: 0.33 (guesstimate)
– hire_date BETWEEN '2014-01-01' AND '2014-06-01': 0.29 (range)
Condition filter effect for tables:
– office: 0.03
– employee: 0.29 * 0.1 * 0.33 ≈ 0.01
Calculating Condition Filter Effect for Tables
Example with histogram
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
employee.name = 'John' AND age > 21 AND
hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Estimate per predicate:
– office_name = 'San Francisco': 0.03 (index)
– employee.name = 'John': 0.1 (guesstimate)
– age > 21: 0.95 (histogram)
– hire_date BETWEEN '2014-01-01' AND '2014-06-01': 0.29 (range)
Condition filter effect for tables:
– office: 0.03
– employee: 0.29 * 0.1 * 0.95 ≈ 0.03
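The employee calculation can be reproduced with a short sketch that picks the best available estimate per predicate and multiplies the results. The priority order and the numbers come from the slides; the data layout and function names are hypothetical.

```python
from math import prod

# Preferred order of estimate sources, best first.
PRIORITY = ("range", "index", "histogram", "guesstimate")


def pick_estimate(available):
    """Return the selectivity from the most preferred source that is available."""
    for source in PRIORITY:
        if source in available:
            return available[source]
    raise ValueError("no estimate available")


def employee_filter_effect(age_histogram_selectivity=None):
    """Condition filter effect for the employee table in the example query."""
    predicates = {
        "hire_date BETWEEN ...": {"range": 0.29, "guesstimate": 1 / 9},
        "employee.name = 'John'": {"guesstimate": 0.1},
        "age > 21": {"guesstimate": 0.33},
    }
    if age_histogram_selectivity is not None:
        predicates["age > 21"]["histogram"] = age_histogram_selectivity
    return prod(pick_estimate(sources) for sources in predicates.values())


print(employee_filter_effect())      # without a histogram on age
print(employee_filter_effect(0.95))  # with a histogram on age
```

The two results are roughly 0.01 and 0.03, matching the slides: the histogram replaces a guess of 0.33 for age > 21 with 0.95, which nearly triples the estimated number of matching employee rows.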
Computing Selectivity From Histogram
Example
[Chart: frequency and cumulative frequency per age bucket; buckets 0-7, 8-16, 17-24, 25-31, 32-38, 39-46, 47-53, 54-61, 62-70, 71-104; the cumulative frequency is 0.203 at the end of bucket 8-16 and 0.306 at the end of bucket 17-24]
age <= 21: Selectivity = 0.203 + (0.306 - 0.203) * 5/8 = 0.267
(21 falls in bucket 17-24, and the values 17 through 21 cover 5 of the 8 values in that bucket)
age > 21: Selectivity = 1 - 0.267 = 0.733
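The interpolation in this example can be sketched as a small function. It mirrors the arithmetic on the slide for integer values and is not necessarily how the server computes it. Only the two cumulative frequencies quoted on the slide are known, so the bucket list below is partial and the names are hypothetical.

```python
def selectivity_le(buckets, value):
    """Estimate P(column <= value) from equi-height buckets of integer values.

    buckets: list of (min_value, max_value, cumulative_frequency), ordered by value.
    """
    previous_cumulative = 0.0
    for low, high, cumulative in buckets:
        if value >= high:
            # The whole bucket is covered.
            previous_cumulative = cumulative
            continue
        if value < low:
            # The value lies in a gap before this bucket.
            break
        # Partially covered bucket: assume values are spread evenly inside it.
        covered = (value - low + 1) / (high - low + 1)
        return previous_cumulative + (cumulative - previous_cumulative) * covered
    return previous_cumulative


# Partial bucket list: starts at 8-16, so it is not valid for ages below 16.
AGE_BUCKETS = [(8, 16, 0.203), (17, 24, 0.306)]

le_21 = selectivity_le(AGE_BUCKETS, 21)
print(f"age <= 21: {le_21:.3f}")      # 0.267
print(f"age >  21: {1 - le_21:.3f}")  # 0.733
```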
DBT-3 Query 7
Volume Shipping Query
SELECT supp_nation, cust_nation, l_year, SUM(volume) AS revenue
FROM (SELECT n1.n_name AS supp_nation, n2.n_name AS cust_nation,
             EXTRACT(YEAR FROM l_shipdate) AS l_year,
             l_extendedprice * (1 - l_discount) AS volume
      FROM supplier, lineitem, orders, customer, nation n1, nation n2
      WHERE s_suppkey = l_suppkey AND o_orderkey = l_orderkey
        AND c_custkey = o_custkey AND s_nationkey = n1.n_nationkey
        AND c_nationkey = n2.n_nationkey
        AND ((n1.n_name = 'RUSSIA' AND n2.n_name = 'FRANCE')
          OR (n1.n_name = 'FRANCE' AND n2.n_name = 'RUSSIA'))
        AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31') AS shipping
GROUP BY supp_nation, cust_nation, l_year
ORDER BY supp_nation, cust_nation, l_year;
DBT-3 Query 7
Query plan without histogram
[Query plan figure not reproduced]
DBT-3 Query 7
Query plan with histogram
[Query plan figure not reproduced]
DBT-3 Query 7
Performance
[Bar chart: query execution time in seconds (axis 0.0 to 1.8), without histogram vs. with histogram]
Some advice
Which columns to create histograms for?
• Histograms are useful for columns that are
– not the first column of any index, and
– used in WHERE conditions of
  • JOIN queries
  • Queries with IN-subqueries
  • ORDER BY ... LIMIT queries
• Best fit
– Low cardinality columns (e.g., gender, orderStatus, dayOfWeek, enums)
– Columns with uneven distribution (skew)
– Columns with a stable distribution (does not change much over time)
Some more advice
• When not to create histograms:
– First column of an index
– Never used in WHERE clause
– Monotonically increasing column values (e.g., date columns)
  • Histogram will need frequent updates to be accurate
  • Consider creating an index
• How many buckets?
– If possible, enough to get a singleton histogram
– For equi-height, 100 buckets should be enough
More information
• MySQL Server Team blog
– http://mysqlserverteam.com/
– https://mysqlserverteam.com/histogram-statistics-in-mysql/ (Erik Frøseth)
• My blog:
– http://oysteing.blogspot.com/
• MySQL forums:
– Optimizer & Parser: http://forums.mysql.com/list.php?115
– Performance: http://forums.mysql.com/list.php?24
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Histogram Support in MySQL 8.0

  • 2. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Histogram Support in MySQL 8.0 Øystein Grøvlen Senior Principal Software Engineer MySQL Optimizer Team, Oracle February 2018
  • 3. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How are histograms used? Query example Some advice 1 2 3 4 5 3
  • 4. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How are histograms used? Query example Some advice 1 2 3 4 5 4
  • 5. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Motivating Example EXPLAIN SELECT * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000; 5 JOIN Query id select type table type possible keys key key len ref rows filtered extra 1 SIMPLE orders ALL i_o_orderdate, i_o_custkey NULL NULL NULL 15000000 31.19 Using where 1 SIMPLE customer eq_ ref PRIMARY PRIMARY 4 dbt3.orders. o_custkey 1 33.33 Using where
  • 6. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Motivating Example EXPLAIN SELECT /*+ JOIN_ORDER(customer, orders) */ * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000; 6 Reverse join order id select type table type possible keys key key len ref rows filtered extra 1 SIMPLE customer ALL PRIMARY NULL NULL NULL 1500000 33.33 Using where 1 SIMPLE orders ref i_o_orderdate, i_o_custkey i_o_custkey 5 dbt3. customer. c_custkey 15 31.19 Using where
  • 7. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Comparing Join Order 0 2 4 6 8 10 12 14 16 QueryExecutionTime(seconds) orders → customer customer → orders Performance
  • 8. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Histograms ANALYZE TABLE customer UPDATE HISTOGRAM ON c_acctbal WITH 1024 BUCKETS; EXPLAIN SELECT * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000; 8 Create histogram to get a better plan id select type table type possible keys key key len ref rows filtered extra 1 SIMPLE customer ALL PRIMARY NULL NULL NULL 1500000 0.00 Using where 1 SIMPLE orders ref i_o_orderdate, i_o_custkey i_o_custkey 5 dbt3. customer. c_custkey 15 31.19 Using where
  • 9. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How are histograms used? Query example Some advice 1 2 3 4 5 9
  • 10. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Histograms • Information about value distribution for a column • Data values group in buckets – Frequency calculated for each bucket – Maximum 1024 buckets • May use sampling to build histogram – Sample rate depends on available memory • Automatically chooses between two histogram types: – Singleton: One value per bucket – Equi-height: Multiple values per bucket 10 Column statistics
  • 11. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Singleton Histogram 0 0,05 0,1 0,15 0,2 0,25 0 1 2 3 5 6 7 8 9 10 Frequency • One value per bucket • Each bucket stores: – Value – Cumulative frequency • Well suited to estimate both equality and range predicates
  • 12. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Equi-Height Histogram 0 0,05 0,1 0,15 0,2 0,25 0,3 0,35 0 - 0 1 - 1 2 - 3 5 - 6 7 - 10 Frequency • Multiple values per bucket • Not quite equi-height – Values are not split across buckets ⇒Frequent values in separate buckets • Each bucket stores: – Minimum value – Maximum value – Cumulative frequency – Number of distinct values • Best suited for range predicates
  • 13. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Usage • Create or refresh histogram(s) for column(s): ANALYZE TABLE table UPDATE HISTOGRAM ON column [, column] WITH n BUCKETS; – Note: Will only update histogram, not other statistics • Drop histogram: ANALYZE TABLE table DROP HISTOGRAM ON column [, column]; • Based on entire table or sampling: – Depends on avail. memory: histogram_generation_max_mem_size (default: 20 MB) • New storage engine API for sampling – Default implementation: Full table scan even when sampling – Storage engines may implement more efficient sampling 13
  • 14. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Storage • Stored in a JSON column in data dictionary • Can be inspected in Information Schema table: SELECT JSON_PRETTY(histogram) FROM information_schema.column_statistics WHERE schema_name = 'dbt3_sf1' AND table_name ='lineitem' AND column_name = 'l_linenumber'; 14
  • 15. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Histogram content { "buckets": [[1, 0.24994938524948698], [2, 0.46421066400720523], [3, 0.6427401784471978], [4, 0.7855470933802572], [5, 0.8927398868395817], [6, 0.96423707532558], [7, 1] ], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2018-02-03 21:05:21.690872", "sampling-rate": 0.20829115437457252, "histogram-type": "singleton", "number-of-buckets-specified": 1024 } 15
  • 16. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Strings • Max. 42 characters considered • Base64 encoded SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist WHERE column_name = 'o_orderstatus'; +-------+--------------------+ | value | cumulfreq | +-------+--------------------+ | F | 0.4862529264385756 | | O | 0.974029654577566 | | P | 0.9999999999999999 | +-------+--------------------+ 16
  • 17. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Calculate Bucket Frequency SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq, c - LAG(c, 1, 0) over () freq FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist WHERE column_name = 'o_orderstatus'; +-------+--------------------+----------------------+ | value | cumulfreq | freq | +-------+--------------------+----------------------+ | F | 0.4862529264385756 | 0.4862529264385756 | | O | 0.974029654577566 | 0.48777672813899037 | | P | 0.9999999999999999 | 0.025970345422433927 | +-------+--------------------+----------------------+ Use window function 17
  • 18. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How are histograms used? Query example Some advice 1 2 3 4 5 18
  • 19. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | • tx JOIN tx+1 • records(tx+1) = records(tx) * condition_filter_effect * records_per_key When are Histograms useful? Estimate cost of join tx tx+1 Ref access Number of records read from tx Conditionfilter effect Records passing the table conditions on tx Cardinality statistics for index
  • 20. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Filter estimate based on what is available: 1. Range estimate 2. Index statistics 3. Guesstimate = 0.1 <=,<,>,>= 1/3 BETWEEN 1/9 NOT <op> 1 – SEL(<op>) AND P(A and B) = P(A) * P(B) OR P(A or B) = P(A) + P(B) – P(A and B) … … How to Calculate Condition Filter Effect, MySQL 5.7 SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01';
  • 21. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Filter estimate based on what is available: 1. Range estimate 2. Index statistics 3. Histograms 4. Guesstimate = 0.1 <=,<,>,>= 1/3 BETWEEN 1/9 NOT <op> 1 – SEL(<op>) AND P(A and B) = P(A) * P(B) OR P(A or B) = P(A) + P(B) – P(A and B) … … How to Calculate Condition Filter Effect, MySQL 5.7 SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01';
  • 22. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Calculating Condition Filter Effect for Tables Condition filter effect for tables: – office: 0.03 – employee: 0.29 * 0.1 * 0.33 ≈ 0.01 Example without histograms 0.1 (guesstimate) 0.33 (guesstimate) 0.29 (range) 0.03 (index)
  • 23. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Calculating Condition Filter Effect for Tables Condition filter effect for tables: – office: 0.03 – employee: 0.29 * 0.1 * 0.95 ≈ 0.03 Example with histogram 0.1 (guesstimate) 0.95 (histogram) 0.29 (range) 0.03 (index)
  • 24. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Computing Selectivity From Histogram 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1 0-7 8-16 17-24 25-31 32-38 39-46 47-53 54-61 62-70 71-104 Frequency age Cumulative Frequency Example age <= 21 0.203 Selectivity = 0.203 + 0.306 (0.306 – 0.203) * 5/8 = 0.267 age > 21 Selectivity = 1 - 0.267 = 0.733
  • 25. Program Agenda: 1. Motivating example 2. Quick start guide 3. How are histograms used? 4. Query example 5. Some advice
  • 26. DBT-3 Query 7: Volume Shipping Query. SELECT supp_nation, cust_nation, l_year, SUM(volume) AS revenue FROM (SELECT n1.n_name AS supp_nation, n2.n_name AS cust_nation, EXTRACT(YEAR FROM l_shipdate) AS l_year, l_extendedprice * (1 - l_discount) AS volume FROM supplier, lineitem, orders, customer, nation n1, nation n2 WHERE s_suppkey = l_suppkey AND o_orderkey = l_orderkey AND c_custkey = o_custkey AND s_nationkey = n1.n_nationkey AND c_nationkey = n2.n_nationkey AND ((n1.n_name = 'RUSSIA' AND n2.n_name = 'FRANCE') OR (n1.n_name = 'FRANCE' AND n2.n_name = 'RUSSIA')) AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31') AS shipping GROUP BY supp_nation, cust_nation, l_year ORDER BY supp_nation, cust_nation, l_year;
  • 27. DBT-3 Query 7: Query plan without histogram
  • 28. DBT-3 Query 7: Query plan with histogram
  • 29. DBT-3 Query 7: Performance. [Bar chart: query execution time (seconds), without histogram vs. with histogram.]
  • 30. Program Agenda: 1. Motivating example 2. Quick start guide 3. How are histograms used? 4. Query example 5. Some advice
  • 31. Some advice: Which columns to create histograms for? • Histograms are useful for columns that are – not the first column of any index, and – used in WHERE conditions of • JOIN queries • Queries with IN-subqueries • ORDER BY ... LIMIT queries • Best fit – Low cardinality columns (e.g., gender, orderStatus, dayOfWeek, enums) – Columns with uneven distribution (skew) – Stable distribution (does not change much over time)
  • 32. Some more advice • When not to create histograms: – First column of an index – Never used in WHERE clause – Monotonically increasing column values (e.g. date columns) • Histogram will need frequent updates to be accurate • Consider creating an index instead • How many buckets? – If possible, enough to get a singleton histogram – For equi-height, 100 buckets should be enough
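The bucket advice above rests on a simple rule: when the column has no more distinct values than the requested number of buckets, each value gets its own bucket (singleton); otherwise values are grouped (equi-height). A minimal Python sketch of that rule follows. The function name is hypothetical, and the limit of 1024 buckets and default of 100 are assumptions taken from the MySQL 8.0 documentation rather than from these slides.

```python
MAX_BUCKETS = 1024  # assumed upper limit for WITH N BUCKETS


def histogram_kind(distinct_values, buckets=100):
    """Predict which histogram type a column would get for a bucket count."""
    if not 1 <= buckets <= MAX_BUCKETS:
        raise ValueError(f"buckets must be between 1 and {MAX_BUCKETS}")
    # One bucket per distinct value is possible only if the values fit.
    return "singleton" if distinct_values <= buckets else "equi-height"


print(histogram_kind(7))        # e.g. a dayOfWeek column
print(histogram_kind(150_000))  # e.g. a high-cardinality balance column
```

A low-cardinality column such as dayOfWeek fits into a singleton histogram with the default bucket count, which is why such columns are listed as a best fit.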
  • 33. More information • MySQL Server Team blog – http://mysqlserverteam.com/ – https://mysqlserverteam.com/histogram-statistics-in-mysql/ (Erik Frøseth) • My blog: – http://oysteing.blogspot.com/ • MySQL forums: – Optimizer & Parser: http://forums.mysql.com/list.php?115 – Performance: http://forums.mysql.com/list.php?24
  • 34. Safe Harbor Statement: The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.