Microsoft SQL Server
Filtered Indexes and Sparse Columns:
Together, Separately
Speaker: Don Vilen
Chief Scientist, BuySight




                     February 2011

           Mark Ginnebaugh, User Group Leader
                  www.bayareasql.org
15 Feb 2011




Filtered Indexes and
Sparse Columns:
Together, Separately

Don Vilen, Chief Scientist, Buysight
DVilen@buysight.com
Agenda
 ◦   Filtered Indexes
 ◦   Filtered Statistics
 ◦   Wide Tables
 ◦   Sparse Columns

 ◦ Together …
 ◦ … and Separately

 ◦ Everything is SQL Server 2008 (and later), in
   all editions
The Scenario
 ◦ 100,000 rows in the table
    99,500 rows are historical; the remaining 500 rows are current
    Indicated by NULL EndDate column or IsActive bit, etc.
 ◦ All queries on current data use index
 ◦ But why index all the historical 99.5% of the table?

 ◦ 1,000 columns in a table
 ◦ BikeColor column is relevant only if ItemType is
   ‘Bicycle’
    For 0.5% of the rows; remainder are NULL
 ◦ But why index all the rows regardless of ItemType
   value?
Filtered Indexes
 ◦ Indexes only the rows that match the WHERE clause
    CREATE INDEX xyz ON table(columns, …), followed by one filter such as:
       WHERE EndDate IS NULL
       WHERE IsActive = 1
       WHERE ItemType = 'Bicycle'
 ◦ Uses:
    Ranges of values for smaller portion of large table
       Avoid the common 80-90% of data where the index wouldn’t be helpful
    For categories of row data
       Index on Column120 and Column121 only useful when C1 = 37
    Table partitions, where index is needed only on the ‘current’ partition(s)
       Each partition will have the index structure, but only ‘current’ partitions will have any
        rows in the index
 ◦ Benefits
    Better query performance
    Reduction in storage costs
    Reduction in maintenance cost/time
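
As a minimal sketch of the "mostly historical rows" scenario: the table, column, and index names below are hypothetical and only illustrate the pattern of indexing the current rows.

```sql
-- Hypothetical table: names are illustrative, not from the talk.
CREATE TABLE dbo.Subscription
(
    SubscriptionId int      NOT NULL PRIMARY KEY,
    CustomerId     int      NOT NULL,
    StartDate      datetime NOT NULL,
    EndDate        datetime NULL      -- NULL marks a current row
);

-- Only rows with EndDate IS NULL get an entry in this index;
-- historical rows are left out entirely.
CREATE NONCLUSTERED INDEX IX_Subscription_Current
    ON dbo.Subscription (CustomerId)
    INCLUDE (StartDate)
    WHERE EndDate IS NULL;

-- The query restates the filter predicate, so the optimizer can
-- match it to the filtered index.
SELECT CustomerId, StartDate
FROM dbo.Subscription
WHERE CustomerId = 42
  AND EndDate IS NULL;
```

Whether the optimizer actually picks the index depends on data volume and statistics; checking the execution plan is the way to confirm.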
Filtered Index – Allowed Syntax
◦ WHERE <filter_predicate>   [from BOL: CREATE INDEX]
    <filter_predicate> ::= <conjunct> [ AND <conjunct> ]
    <conjunct> ::= <disjunct> | <comparison>
    <disjunct> ::= column_name IN (constant ,…)
    <comparison> ::= column_name <comparison_op> constant
  <comparison_op> ::= { IS | IS NOT | = | <> | != | > | >= | !> | < | <= | !< }



◦ No BETWEEN, no LIKE, no subquery, no variables

◦ So must be simple and deterministic
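
A short sketch of a predicate that fits this grammar, assuming a hypothetical dbo.PricedItem table. It combines an IN list with two comparisons using AND, and writes the range as a pair of comparisons because BETWEEN is not allowed.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.PricedItem
(
    ItemId    int         NOT NULL PRIMARY KEY,
    ItemType  varchar(20) NOT NULL,
    ListPrice money       NULL
);

-- Allowed: column-to-constant comparisons and IN lists, joined by AND.
CREATE NONCLUSTERED INDEX IX_PricedItem_CyclePrice
    ON dbo.PricedItem (ListPrice)
    WHERE ItemType IN ('Bicycle', 'Tricycle')
      AND ListPrice >= 100
      AND ListPrice <= 500;   -- a range expressed without BETWEEN
```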
Filtered Indexes – Requirements
 ◦ A filter always involves a comparison, so every session
   must agree on how comparisons behave; this requires the
   standard SET options
    ON for ANSI_NULLS, ANSI_PADDING,
     ANSI_WARNINGS, ARITHABORT,
     CONCAT_NULL_YIELDS_NULL,
     QUOTED_IDENTIFIER
    OFF for NUMERIC_ROUNDABORT
 ◦ Else:
    If not set when index is created, won’t create the index
    If not set when INSERT, UPDATE, DELETE, MERGE
     affects the data, gives error and rolls back
    If not set when the index might be used to optimize the
     query, it will not be considered
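
The required settings can be applied and checked per session along these lines. This is a sketch: most client libraries already set several of these by default, so the explicit SET statements matter mainly for older or non-default connections.

```sql
-- Settings required when creating a filtered index, modifying the
-- underlying data, or having the optimizer consider the index.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;

-- Inspect the current session: 1 = ON, 0 = OFF.
SELECT SESSIONPROPERTY('ANSI_NULLS')              AS ansi_nulls,
       SESSIONPROPERTY('ANSI_PADDING')            AS ansi_padding,
       SESSIONPROPERTY('ANSI_WARNINGS')           AS ansi_warnings,
       SESSIONPROPERTY('ARITHABORT')              AS arithabort,
       SESSIONPROPERTY('CONCAT_NULL_YIELDS_NULL') AS concat_null_yields_null,
       SESSIONPROPERTY('QUOTED_IDENTIFIER')       AS quoted_identifier,
       SESSIONPROPERTY('NUMERIC_ROUNDABORT')      AS numeric_roundabort;
```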
Filtered Indexes – Applicability
 ◦ Non-clustered indexes only (rather obviously)
 ◦ For UNIQUE indexes, only the indexed rows
   must have unique index values
    Duplicates in the non-indexed rows are not checked, but
     be careful that an update to a qualifying column doesn’t
     cause a duplicate to occur
      CREATE UNIQUE INDEX ix1 ON xyz (c3)
         WHERE c2 = 10
    So now there is a way to create a unique index on a
     column with multiple NULL values: create the index WHERE
     ColY IS NOT NULL
 ◦ Filtered indexes do not apply to:
    XML indexes
    Full-text indexes
    Spatial indexes
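
The "unique except for NULLs" pattern mentioned above can be sketched as follows. The dbo.Employee table and its columns are hypothetical.

```sql
-- Hypothetical table: BadgeNumber is optional, but must be unique
-- whenever it is present.
CREATE TABLE dbo.Employee
(
    EmployeeId  int         NOT NULL PRIMARY KEY,
    BadgeNumber varchar(20) NULL
);

-- Uniqueness is enforced only over the rows that are in the index,
-- that is, rows where BadgeNumber IS NOT NULL.
CREATE UNIQUE NONCLUSTERED INDEX UX_Employee_BadgeNumber
    ON dbo.Employee (BadgeNumber)
    WHERE BadgeNumber IS NOT NULL;

INSERT INTO dbo.Employee (EmployeeId, BadgeNumber) VALUES (1, NULL);
INSERT INTO dbo.Employee (EmployeeId, BadgeNumber) VALUES (2, NULL);   -- allowed: NULL rows are not indexed
INSERT INTO dbo.Employee (EmployeeId, BadgeNumber) VALUES (3, 'A100');

-- A second row with BadgeNumber = 'A100' would be rejected with a
-- duplicate-key error, as would an UPDATE that sets a NULL badge to 'A100'.
```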
Filtered Indexes – Getting Them Used 1
  ◦ The query optimizer (QO) can only use the index when it knows the
    index will match the conditions in the query’s WHERE clause
  ◦ Assume Column120 and Column121 useful only when
    C1 = 37
     So CREATE INDEX i1 on dbo.t1 (Column120, Column121)
        WHERE C1 = 37
     SELECT Column121
        FROM dbo.t1
        WHERE Column120 = 13
      Cannot use the index even if Column120 and Column121 only
      appear for C1 = 37
       As far as the QO knows, there may be other Column120 or Column121
        values that are not in the index
  ◦ Help the QO by adding more limiting predicates to
    WHERE clause
     Make it WHERE Column120 = 13 AND C1 = 37
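
Put together as a script, the case above might look like this. The column types and the Id key of dbo.t1 are assumptions; the slide gives only the column names.

```sql
-- Assumed shape of the slide's dbo.t1; types and the Id key are guesses.
CREATE TABLE dbo.t1
(
    Id        int NOT NULL PRIMARY KEY,
    C1        int NULL,
    Column120 int NULL,
    Column121 int NULL
);

CREATE NONCLUSTERED INDEX i1
    ON dbo.t1 (Column120, Column121)
    WHERE C1 = 37;

-- Filter predicate missing: i1 cannot be used, because rows with
-- C1 <> 37 could also have Column120 = 13.
SELECT Column121
FROM dbo.t1
WHERE Column120 = 13;

-- Filter predicate restated: i1 becomes a candidate for the plan.
SELECT Column121
FROM dbo.t1
WHERE Column120 = 13
  AND C1 = 37;
```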
Filtered Indexes – Getting Them Used 2
  ◦ WHERE with a variable rather than a literal
  ◦ Assume index is on WHERE IsActive > 0
      DECLARE @IsActive int; SET @IsActive = 1;
      SELECT xyz FROM table WHERE IsActive = @IsActive
  ◦ QO doesn’t know value of variable, so doesn’t
    know if index fits
    So shouldn’t use variables as if they were constants
  ◦ Again, help the QO by adding more limiting
    predicates to WHERE clause
    Make it WHERE IsActive = @IsActive AND IsActive > 0
     But perhaps that doesn’t really make sense here
Filtered Indexes – Getting Them Used 3
    ◦ WHERE with a function or conversion on the filter
      predicate
      Obvious: WHERE ABS(C1) = 37
         Cannot use index on WHERE C1 = 37
         Could change it to WHERE C1 = ABS(37) if same meaning .. but not in
          this case
      Implicit conversions:
         Assume index is WHERE c3 > 100
         DECLARE @varR real; SET @varR = 1000.5;
         SELECT * FROM tv2 WHERE c3 = @varR
           Requires conversion of c3 to real before comparison, so can’t use
             index
         SELECT * FROM tv2 WHERE c3 = cast(@varR as int)
           At least it requires no conversion of c3, but is unknown value at
             optimization time, so can’t use index
         So add a limiting predicate … assuming you know it will always be
          right
         SELECT * FROM tv2 WHERE c3 = cast(@varR as int) AND c3 > 100
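
A self-contained variant of the implicit-conversion case, assuming tv2 has an int column c3 (the slide does not show the table definition, and the id column is hypothetical). The conversion is done once into an int variable so that c3 itself is never converted.

```sql
-- Assumed definition of tv2; only c3 appears on the slide.
CREATE TABLE dbo.tv2
(
    id int NOT NULL PRIMARY KEY,
    c3 int NOT NULL
);

CREATE NONCLUSTERED INDEX IX_tv2_c3
    ON dbo.tv2 (c3)
    WHERE c3 > 100;

DECLARE @varR real;
SET @varR = 1000.5;

-- Convert the variable, not the column, so c3 stays an int comparison.
DECLARE @target int;
SET @target = CAST(@varR AS int);

-- The literal predicate c3 > 100 restates the index filter. The value
-- of @target is unknown at optimization time, so without it the
-- optimizer cannot prove the index covers the requested rows.
SELECT id, c3
FROM dbo.tv2
WHERE c3 = @target
  AND c3 > 100;
```

Note that the extra predicate changes the result when @target is 100 or less: the query then returns no rows, which is only acceptable if that is what the caller intends.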
A Mis-Application of Filtered Indexes
   ◦ Create a filtered index on c and b with
     WHERE on c

   ◦ Attempt to use the index as a validation table

   ◦ In code, use the index in a hint and expect to
     get no row back for a b where c is a match,
     but it gets an error instead, because the hint
     prevents a plan from being created
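
One reading of this pitfall, sketched with hypothetical names (dbo.ValidPair, columns a, b, c): when an index hint forces a filtered index that cannot cover the rows the query asks for, the statement fails instead of returning an empty result.

```sql
-- Hypothetical table and index; this is an interpretation of the
-- slide, not the speaker's actual code.
CREATE TABLE dbo.ValidPair
(
    a int NOT NULL PRIMARY KEY,
    b int NOT NULL,
    c int NOT NULL
);

CREATE NONCLUSTERED INDEX IX_ValidPair_c_b
    ON dbo.ValidPair (c, b)
    WHERE c = 1;

-- Compatible with the filter: the hint can be honoured, and the
-- query returns zero or more rows.
SELECT b
FROM dbo.ValidPair WITH (INDEX (IX_ValidPair_c_b))
WHERE c = 1 AND b = 5;

-- Not compatible with the filter: the forced index holds no rows
-- with c = 2, so no valid plan exists. SQL Server raises error 8622
-- (could not produce a query plan because of the hints) rather than
-- returning an empty result.
SELECT b
FROM dbo.ValidPair WITH (INDEX (IX_ValidPair_c_b))
WHERE c = 2 AND b = 5;
```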
Filtered Indexes – And Views
 ◦ Cannot create a filtered index on a view, not
   even a non-clustered index on an indexed view
    But a filtered index can be chosen by the QO for the
     query formed from a view .. or function
Filtered Indexes – Considerations 1
 ◦ Storage size differences
    Fewer index rows take less space
    Less IO, more information fits in memory
    4,000 pages vs. 1 page
 ◦ Limits auto-parameterization
    QO will not auto-parameterize if predicate is used in a
     filtered index (“in most cases”, per BOL)
    Otherwise would inhibit use of filtered index
    So can affect plan reuse
 ◦ Index maintenance – same rebuild and reorganize
   as regular index
    But hopefully much less work to do
Filtered Indexes – Considerations 2
 ◦ Covering index
    Consider INCLUDEing other columns so more
     likely to be selected by QO
 ◦ Database Engine Tuning Advisor (DTA) can suggest a filtered index
    ColX IS NOT NULL – only of this form
    But the missing-indexes functionality does not flag
     them as missing
 ◦ When not to use:
    When non-filtered index already exists, or another
     access path is likely better or adequate
      Avoid the extra index maintenance
Filtered Statistics
  ◦ CREATE STATISTICS stats1 ON table (cols)
       WHERE <condition>
  ◦ Uses:
     Can create filtered statistics on skewed data to assist QO
     Filtered Statistics will likely be more precise because they cover only the
      data in the filtered subset (or filtered index)
     Table partitions, where statistics are needed only on ‘current’ partition(s)
  ◦ Cannot reference a computed column, a UDT column, a spatial
    data type column, or a hierarchyID data type column

  ◦ AutoCreateStats will create statistics on Filtered Index key
    columns
  ◦ AutoCreateStats will not create filtered statistics on other
    columns
     You have to create them yourself
  ◦ AutoUpdateStats will keep them updated once they are created
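
A sketch of creating and inspecting filtered statistics, reusing the BikeColor / ItemType scenario from earlier. The dbo.Item table definition and the statistics name are assumptions.

```sql
-- Hypothetical table based on the BikeColor scenario.
CREATE TABLE dbo.Item
(
    ItemId    int         NOT NULL PRIMARY KEY,
    ItemType  varchar(20) NOT NULL,
    BikeColor varchar(20) NULL
);

-- Statistics built only over the bicycle rows, so the histogram
-- describes the subset that queries on BikeColor care about.
CREATE STATISTICS st_Item_BikeColor
    ON dbo.Item (BikeColor)
    WHERE ItemType = 'Bicycle';

-- Inspect the header, density vector, and histogram.
DBCC SHOW_STATISTICS ('dbo.Item', st_Item_BikeColor);

-- Refresh by hand if automatic updates lag behind changes
-- in the filtered subset.
UPDATE STATISTICS dbo.Item st_Item_BikeColor WITH FULLSCAN;
```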
Metadata for Indexes, Statistics
 ◦ sys.indexes
    has_filter, filter_definition
 ◦ sys.stats
    has_filter, filter_definition


 ◦ SSMS
    Indexes and Statistics Properties have a Filter tab
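
The two catalog views can be queried like this to list every filtered index and filtered statistics object in the current database (the column aliases are arbitrary):

```sql
-- Filtered indexes and their predicates.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.filter_definition
FROM sys.indexes AS i
WHERE i.has_filter = 1;

-- Filtered statistics and their predicates.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       s.filter_definition
FROM sys.stats AS s
WHERE s.has_filter = 1;
```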
Questions on Filtered Indexes,
Statistics
   Any questions?

   Now we’ll move on to Wide Tables,
    Sparse Columns
Wide Tables
 ◦ Up to 30,000 Columns
   Great for SharePoint-like “a row is an object, some
    attributes depend on other attributes”
 ◦ Some limits:
     Columns per non-wide table: 1,024
     Columns per wide table: 30,000
     Columns per SELECT statement: 4,096
     Columns per INSERT statement: 4,096
     Indexes per table: 1,000
     Statistics per table: 30,000
       BOL: Maximum Capacity Specifications for SQL Server
Wide Table
◦ A wide table has defined a column set, using sparse
  columns
  New row structure for sparse columns
    {column, value}, {column, value} …
  Can create flexible schemas within an application
  Can add or drop columns whenever you want without
   having to touch each row
◦ The maximum size of a wide table row is 8,018
  bytes, so most of the data in a row has to be NULL
  Or has to be varchar-type columns so it can overflow to
   another page
◦ Limit is still 1,024 for number of non-sparse
  columns plus computed columns, even in a wide
  table
Wide Tables – Performance Impact
 ◦ Performance considerations:
    Increased run-time and compile-time memory
     requirements
    Wide tables can have up to 30,000 columns defined;
     this can increase compile time
    There can be up to 1,000 indexes on a wide table,
     which increases the index maintenance time
      Nonclustered indexes should be filtered indexes to
       minimize their impact

      For more information, see BOL: Performance Considerations
       for Wide Tables
Sparse Columns
◦ CREATE TABLE … (…, c1 int SPARSE NULL,
  …)
◦ New row format for sparse columns

◦ Column:
    Must be NULLable
    Cannot be part of a clustered index
    Cannot be part of a primary key index
    Cannot have a DEFAULT
    Cannot be a computed column
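
A minimal sketch of a table with sparse columns that respects the restrictions above. The dbo.Product table and its columns are hypothetical.

```sql
-- Hypothetical table: the two attribute columns are NULL for most rows.
CREATE TABLE dbo.Product
(
    ProductId int           NOT NULL PRIMARY KEY,  -- key columns cannot be sparse
    ItemType  varchar(20)   NOT NULL,
    BikeColor varchar(20)   SPARSE NULL,           -- must be nullable, no DEFAULT
    FrameSize decimal(5, 1) SPARSE NULL
);

-- Sparse columns are read and written like any other column.
INSERT INTO dbo.Product (ProductId, ItemType, BikeColor, FrameSize)
VALUES (1, 'Bicycle', 'Red', 54.0);

INSERT INTO dbo.Product (ProductId, ItemType)
VALUES (2, 'Helmet');   -- both sparse columns stay NULL and take no space

SELECT ProductId, ItemType, BikeColor, FrameSize
FROM dbo.Product;
```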
Sparse Columns – Some More Cannots
  ◦ Some types cannot be sparse:
    • geography   • ntext       • User-defined data types
    • geometry    • text
    • image       • timestamp

  ◦ Some attributes cannot be on sparse columns
    No Filestream
    Not Identity
    Not RowGuidCol
Sparse Columns – Types and Size
 ◦ Size impact
   An important consideration but not the only one


 ◦ At what percentage of NULLs does a sparse
   column take less space than a non-sparse
   column?
   BOL’s estimates are the NULL percentage needed for a net
    space saving of 40 percent

            Non-Sparse      Sparse           Null Estimate
   BIT      1/8th byte      4 1/8th bytes    –> 98%
   BIGINT   8 bytes         12 bytes         –> 52%


     See BOL: Using Sparse Columns for a complete table of types
Column Sets
◦ How do you know which columns ‘exist’ for a row?
◦ You could just SELECT them; those that don’t exist are NULL
◦ Can define a “Column set”
   Optional, only one per table
◦ Include a column:
   MyColSet      XML      COLUMN_SET FOR ALL_SPARSE_COLUMNS
◦ Selecting from MyColSet returns an XML description of the sparse
  columns in that row
   <c25>ABC</c25><c34>599</c34>
◦ Can INSERT / UPDATE sparse columns by
   Referring to them by name as usual, or
   Specifying the XML for the Column_Set column

     See BOL: Using Column Sets for more details
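
The column-set behaviour can be sketched as below. The table name dbo.ProductAttr and the key column are hypothetical; c25, c34, and MyColSet follow the names used on the slide.

```sql
-- Hypothetical table with two sparse columns and a column set.
CREATE TABLE dbo.ProductAttr
(
    ProductId int         NOT NULL PRIMARY KEY,
    c25       varchar(10) SPARSE NULL,
    c34       int         SPARSE NULL,
    MyColSet  xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Write the sparse columns by name ...
INSERT INTO dbo.ProductAttr (ProductId, c25, c34)
VALUES (1, 'ABC', 599);

-- ... or through the column set, as XML. A single statement cannot
-- target both the column set and individual sparse columns.
INSERT INTO dbo.ProductAttr (ProductId, MyColSet)
VALUES (2, '<c25>XYZ</c25>');

-- The column set returns one XML fragment per row, listing only the
-- sparse columns that have a value.
SELECT ProductId, MyColSet
FROM dbo.ProductAttr;

-- Individual sparse columns remain addressable by name.
SELECT ProductId, c25, c34
FROM dbo.ProductAttr;
```

Once a column set exists, SELECT * returns the column set instead of the individual sparse columns, so code that needs specific columns should name them.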
Feature / Technology Support
 ◦ Sparse columns and column sets are not fully
   supported by some SQL Server technologies

 ◦ Sparse Columns not supported by:
   Merge Replication

 ◦ Column Sets not supported by:
   Replication, Distributed Query, Change Data
    Capture

     See BOL: Using Column Sets for more details
Meta Data for Sparse Columns
 ◦ sys.columns – is_sparse, is_column_set
   And in:
       sys.system_columns
       sys.all_columns
       sys.computed_columns
       sys.identity_columns


 ◦ Do not confuse with sparse files as used for
   Database Snapshots
   The is_sparse in sys.database_files, sys.master_files
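
For example, the sparse columns and column sets in the current database can be listed from sys.columns (the aliases are arbitrary):

```sql
-- One row per sparse column or column set in the current database.
SELECT OBJECT_NAME(c.object_id) AS table_name,
       c.name                   AS column_name,
       c.is_sparse,
       c.is_column_set
FROM sys.columns AS c
WHERE c.is_sparse = 1
   OR c.is_column_set = 1
ORDER BY table_name, c.column_id;
```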
Together
 ◦ Sparse Columns together with Filtered Index
 ◦ On Sparse column, filtered index with
           xx IS NOT NULL
   avoids indexing all the rows with no value

 ◦ Makes a lot of sense, and likely the driving
   force behind filtered indexes
 ◦ But not needed on every sparse column
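
The combination might look like this in practice. The dbo.CatalogItem table is a hypothetical stand-in for the BikeColor scenario.

```sql
-- Hypothetical table: BikeColor is NULL for all non-bicycle rows.
CREATE TABLE dbo.CatalogItem
(
    ItemId    int         NOT NULL PRIMARY KEY,
    ItemType  varchar(20) NOT NULL,
    BikeColor varchar(20) SPARSE NULL
);

-- The index holds entries only for rows that have a colour,
-- so the NULL majority costs nothing in the index.
CREATE NONCLUSTERED INDEX IX_CatalogItem_BikeColor
    ON dbo.CatalogItem (BikeColor)
    WHERE BikeColor IS NOT NULL;

-- An equality test can only match non-NULL values, so this query is
-- expected to be eligible for the filtered index; the execution plan
-- is the place to confirm it.
SELECT ItemId, BikeColor
FROM dbo.CatalogItem
WHERE BikeColor = 'Red';
```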
Separately
 ◦ Filtered Index without Sparse Column
   Filtered indexes on skewed data
   Filtered statistics on skewed data


 ◦ Sparse Column without Filtered Index
   Sparse columns on sparse data, perhaps no index to
    go with it
Summary
 ◦   Filtered Indexes
 ◦   Filtered Statistics
 ◦   Wide Tables
 ◦   Sparse Columns

 ◦ Together …
 ◦ … and Separately

 ◦ Don Vilen
      Chief Scientist, Buysight
      DVilen@buysight.com
To learn more or inquire about speaking opportunities, please contact:

Mark Ginnebaugh, User Group Leader mark@designmind.com
