Total Pageviews

Saturday, April 20, 2013

Missing Index DMVs – misleading suggestions

Creating missing indexes is one of DBA’s core responsibilities in the production environment and makes the query execution much faster. A lot of enhancement has been done and new features have been added starting from SQL 2005 that make our day to day task more efficient and easier.

1.       Execution plan with tooltips and details properties.
2.       DMVs for missing indexes - index operational status and usages.

Where do you want to go from here?
While DMVs are great to begin with, there is a glitch with the output which we need to be aware of. When missing index information was added to the catalog views, SQL Server Query Optimizer does not consider existing indexes for the underlying table at all as suggests. It adds column name when a query is compiled or recompiled.

So if we consider these recommendations blindly, then we will end up creating a large number of indexes which is certainly not useful. Rather than improving the performance it will degrade the overall performance.


Missing Columns DMO:
Dynamic management object
Information returned
Returns summary information about missing index groups.
Returns information about a specific group of missing indexes
Returns detailed information about a missing index
Returns information about the database table columns that are missing an index.


Things to keep in mind about missing indexes DMV:
The missing indexes feature is a lightweight tool for finding missing indexes that might significantly improve query performance. It does not provide adequate information to fine tune indexing configuration. Below are some limitations:

1.       By default, the missing indexes feature is turned on.
2.       No controls are provided to turn the feature on or off, or to reset any of the tables returned when the dynamic management objects are queried.
3.       When SQL Server is restarted, all of the missing index information is dropped.
4.       This feature can only be disabled if an instance of SQL Server is started by using the -x argument with the “sqlservr.exe” command-prompt utility.
5.       Missing index DMVs are updated when a query is optimized by the query optimizer.
6.       Missing index information from “Estimate Execution Plan” will differ from “Actual Execution Plan”.
7.       It does not provide partitioning.
8.       It suggests to include a clustered index key with the INCLUDE option.
9.       It is not intended to fine tune an indexing configuration.
10.   It cannot gather statistics for more than 500 missing index groups.
11.   It does not specify an order for columns to be used in an index.
12.   For queries involving only inequality predicates, it returns less accurate cost information.
13.   It reports only include columns for some queries, so index key columns must be manually selected.
14.   It returns only raw information about columns on which indexes might be missing.
15.   It does not suggest filtered indexes.
16.   It can return different costs for the same missing index group that appears multiple times in XML Showplans.
17.   It does not consider trivial query plans.
18.   If a transaction creates or drops a table, the rows containing missing index information about the dropped objects are removed from the missing index management object.
19.   When the metadata for a table changes, all missing index information about that table is deleted from these dynamic management views.

Good practices:
Consider the following when using missing index DMVs to create or adjust indexes:

1.       Try to consolidate all the recommendations into a fewer number of indexes.
2.       Never include a clustered key in the index key columns or with the INCLUDE column.
3.       If an existing index needs to be adjusted, always use DROP_EXISING=ON with the CREATE INDEX statement.
4.       unique_compile” in “sys.dm_db_missing_index_group_stats” will tell us how many times a query has been complied or recompiled because of this missing index group.
5.       Review related queries and query execution plans along with missing index suggestions.

General Guidelines while using missing DMVs:
To convert the information returned by “sys.dm_db_missing_index_details” into a CREATE INDEX statement, consider the following the guidelines:

1.       Equality columns should be put before the inequality columns, and together they should make the key of the index. For example

EQUALITY = col1, col5
INEQUALITY = col3, col7
Index key will be: col1 + col5 + col3 + col7

2.       To determine an effective order for the equality columns, order them based on their selectivity: put the most selective columns (most unique value column) first (leftmost in the column list). For example, say col1, col2 and col3 has following distinct records

DISTINCT col1 = 100
DISTINCT col2 = 10
DISTINCT col3 = 60

Index Key will be: Col1 + Col3 + Col2

3.       Included columns should be added to the CREATE INDEX statement using the INCLUDE clause minus the clustered key. For example,

xID = Clustered key
nCustID = Key column
sName1= Non key column
dDOB= Non key column

Good: CREATE INDEX IX_nCustID ON Table1 (nCustID) INCLUDE (sName1, dDOB)
Bad: CREATE INDEX IX_nCustID ON Table1 (nCustID) INCLUDE (xID, sName1, dDOB)
Ugly: CREATE INDEX IX_nCustID ON Table1 (nCustID, xID) INCLUDE (sName1, dDOB)

4.       Column orders in INCLUDE option has no effect, for example INCLUDE (ColA, ColB, ColC) is the same as INCLUDE (ColB, ColA, ColC).

A practical way to create missing indexes:
In this exercise, we will be reviewing how to consolidate all recommendations about missing indexes and how to create the most beneficial one. To perform this test, Server 2012 is used however any SQL Server edition can be used.

DMV Query to find missing indexes and relevant information:

SELECT sys.schemas.schema_id
     , sys.schemas.name AS schema_name
     , sys.objects.object_id
     , sys.objects.name AS object_name
     , sys.objects.type
     , partitions.rows_count
     , partitions.size_mb
     , migs.unique_compiles
     , mid.equality_columns
     , mid.inequality_columns
     , mid.included_columns
     , migs.user_seeks
     , migs.user_scans
     , migs.avg_total_user_cost
     , migs.avg_user_impact
     , migs.last_user_seek
     , migs.last_user_scan
     , migs.system_seeks
     , migs.system_scans
     , migs.avg_total_system_cost
     , migs.avg_system_impact
     , migs.last_system_seek
     , migs.last_system_scan
     , (convert(NUMERIC(19, 2), migs.user_seeks) + convert(NUMERIC(19, 2), migs.unique_compiles))
       * convert(NUMERIC(19, 2), migs.avg_total_user_cost)
       * convert(NUMERIC(19, 2), migs.avg_user_impact / 100.0) AS score
FROM
  sys.objects
  JOIN (SELECT object_id
             , sum(CASE
                 WHEN index_id BETWEEN 0 AND 1 THEN
                   row_count
                 ELSE
                   0
               END) AS rows_count
             , convert(NUMERIC(19, 2), convert(NUMERIC(19, 3), sum(in_row_reserved_page_count + lob_reserved_page_count
               + row_overflow_reserved_page_count)) / convert(NUMERIC(19, 2), 128)) AS size_mb
        FROM
          sys.dm_db_partition_stats
        WHERE
          sys.dm_db_partition_stats.index_id BETWEEN 0 AND 1
        GROUP BY
          object_id
       ) AS partitions
    ON sys.objects.object_id = partitions.object_id
  JOIN sys.schemas
    ON sys.objects.schema_id = sys.schemas.schema_id
  JOIN sys.dm_db_missing_index_details mid
    ON sys.objects.object_id = mid.object_id
  JOIN sys.dm_db_missing_index_groups mig
    ON mid.index_handle = mig.index_handle
  JOIN sys.dm_db_missing_index_group_stats migs
    ON mig.index_group_handle = migs.group_handle
WHERE
  mid.database_id = db_id()

Let’s do a simple test:

1.       Create the database and table:

CREATE DATABASE TestDB
GO
USE [TestDB]
GO

CREATE TABLE [tblLarge](
       [xID] [int] IDENTITY(1,1) NOT NULL,
       [sName1] [varchar](10) NULL,
       [sName2] [varchar](13) NULL,
       [sName3] [varchar](36) NULL,
       [nCustID] INT NULL,
       [sIdentifier] [char](2) NULL,
       [dDOB] [date] NULL,
       [nWage] [numeric](12, 2) NULL,
       [sLicense] [char](7) NULL,
       [bGender] [bit] NULL
) ON [PRIMARY]
GO

2.       Run the following INSERT statement in a loop to insert 1,300,000 records.

SET NOCOUNT ON
INSERT  INTO tblLarge
        ( sName1 ,
          sName2 ,
          sName3 ,
          nCustID ,
          sIdentifier ,
          dDOB ,
          nWage ,
          sLicense ,
          bGender
                   
        )
VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 10) ,
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 11) ,
          CAST(NEWID() AS VARCHAR(36)) ,
          FLOOR(RAND() * 10000) ,
          LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,
          DATEADD(dd, -RAND() * 20000, GETDATE()) ,
          ( RAND() * 1000 ) ,
          SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) ,
          COALESCE(ISNUMERIC(LEFT(CAST(NEWID() AS VARCHAR(36)), 1)), 0)
        )
      
GO 1300000

3.       Create a clustered index on the xID Column.

ALTER TABLE tblLarge ADD CONSTRAINT PK_tblLarge
PRIMARY KEY CLUSTERED (xID) WITH (FILLFACTOR=90)

4.       Execute the following SELECT statements several times to get missing index suggestions:

-- First query
SELECT  nCustID ,
        sName1 ,
        sName2 ,
        dDOB
FROM    tblLarge
WHERE   dDOB BETWEEN '2010-05-29' AND '2010-10-25'
        AND nCustID BETWEEN 10 AND 1000
ORDER BY nCustID

-- Query Two
SELECT  nCustID ,
        sName1 ,
        sIdentifier
FROM    tblLarge
WHERE   sIdentifier = 'AA'
        AND dDOB BETWEEN '2000-01-01' AND '2010-10-10'
ORDER BY nCustID

-- Query Three
SELECT  nCustID ,
        COUNT(nCustID) nCount
FROM    tblLarge
WHERE   nCustID BETWEEN 50 AND 150
        AND dDOB BETWEEN '2005-01-01' AND '2010-10-10'
GROUP BY nCustID

-- Query Four
SELECT  nCustID ,
        sName1 ,
        sName2 ,
        sIdentifier
FROM    tblLarge
WHERE   nCustID >= 100
        AND nCustID <= 500
        AND sIdentifier LIKE 'B%'

5.        After executing all the given queries, we can get the following missing index recommendations if we run the DMV query listed above.

6.       Index creation: Based on the missing index recommendations, we may create the following two indexes which will help all queries.

CREATE NONCLUSTERED INDEX IX_nCustID
ON [dbo].[tblLarge] ([nCustID],[dDOB])
INCLUDE ([sName1],[sName2])
GO

CREATE NONCLUSTERED INDEX IX_sIdentifier
ON [dbo].[tblLarge] ([sIdentifier], [dDOB])
INCLUDE ([sName1],[sName2], [nCustID])
GO

7.       Our second query needs a little optimization. When a key column is index, then the Explicit ORDER BY on that column is not necessary. If we remove the “ORDER BY nCustID” the Sort operator in the execution plan will disappear.


Figure # 1A: Query Execution Plan with missing Indexes

Figure # 1B: Missing Index suggestions
Figure #2: Query Execution Plan after Missing Index creation

Summary:
Missing index DMVs are a great tool to use to understand the index requirements if it is being used cautiously. There are tons of robotic index creation scripts that are available to create missing indexes by utilizing DMVs and none of them consider the limitations of those DMV’s.  Mainly, the beginners get trapped and end up creating a number of inefficient indexes.

If we review the Query, Execution Plan and missing index DMVs while creating indexes, then this combination will be a great mechanism to create the most efficient indexes. However, solely using the missing DMV’s will not evaluate the situation effectively nor correctly.

4 comments:

  1. DISTINCT col1 = 100
    DISTINCT col2 = 10
    DISTINCT col3 = 60

    Index Key will be: Col2 + Col3 + Col1

    Shouldn't the order be col1, col3, col2 (most to least specific)?

    ReplyDelete
  2. Yes, it should be
    Col1 col3 col2.
    .
    My mistake, I'll correct it. Thanks!

    ReplyDelete
  3. Great article.

    For the sake of completeness, additional concerns:
    1. Missing indexes DMVs do not specify which statements triggered / will be affected by the missing index.
    Thus, hard to review their business importance.

    2. The total amount of seeks and scans by these statements (reported in sys.dm_db_missing_index_group_stats) does not indicate the total CPU consumed due too lack of that index.

    3. Missing indexes better be investigated together with existing indexes.
    Existing indexes usage is reported by sys.dm_db_index_usage_stats DMV (allows telling the productive usage out of the index overhead and system usage)

    4. Specifying clustered index's key column as part of non-clustering index INCLUDE columns is not wise indeed.
    However, specifying clustered index's key column as part of non-clustering index key may reduce the amount of non clustered index's leaf pages read during range scan. E.g. Clustered index on (C1, C2), non clustered index on (C3, C1, C4), and query WHERE C3 = 25 AND C1 = 67 AND C4 < 100.

    Tools that are focused on performance tuning, such as Precise SQLyzer (disclosure: I am an employee of Precise), provide missing indexes information for the most important SQL statements only.
    It shows the SQL statements that triggered the missing indexes recommendation, their true resource consumption as well as their execution plan.
    Thus, allows a more educated evaluation of the recommended indexes.
    In addition, it provides the necessary operational stats and usage stats for existing indexes allowing to identify unused indexes with high index overhead.

    Concentrating all relevant object and indexes information, forms a true Object Tuning solution.

    ReplyDelete