SQL Server – performance and other stories: Missing Index DMVs

Creating missing indexes is one of DBA’s core responsibilities in the production environment and makes the query execution much faster. A lot of enhancement has been done and new features have been added starting from SQL 2005 that make our day to day task more efficient and easier.

1. Execution plan with tooltips and details properties.

2. DMVs for missing indexes - index operational status and usages.

Where do you want to go from here?

While DMVs are great to begin with, there is a glitch with the output which we need to be aware of. When missing index information was added to the catalog views, SQL Server Query Optimizer does not consider existing indexes for the underlying table at all as suggests. It adds column name when a query is compiled or recompiled.

So if we consider these recommendations blindly, then we will end up creating a large number of indexes which is certainly not useful. Rather than improving the performance it will degrade the overall performance.

Missing Columns DMO:

Dynamic management object	Information returned
sys.dm_db_missing_index_group_stats (Transact-SQL)	Returns summary information about missing index groups.
sys.dm_db_missing_index_groups (Transact-SQL)	Returns information about a specific group of missing indexes
sys.dm_db_missing_index_details (Transact-SQL)	Returns detailed information about a missing index
sys.dm_db_missing_index_columns (Transact-SQL)	Returns information about the database table columns that are missing an index.

Things to keep in mind about missing indexes DMV:

The missing indexes feature is a lightweight tool for finding missing indexes that might significantly improve query performance. It does not provide adequate information to fine tune indexing configuration. Below are some limitations:

1. By default, the missing indexes feature is turned on.

2. No controls are provided to turn the feature on or off, or to reset any of the tables returned when the dynamic management objects are queried.

3. When SQL Server is restarted, all of the missing index information is dropped.

4. This feature can only be disabled if an instance of SQL Server is started by using the -x argument with the “sqlservr.exe” command-prompt utility.

5. Missing index DMVs are updated when a query is optimized by the query optimizer.

6. Missing index information from “Estimate Execution Plan” will differ from “Actual Execution Plan”.

7. It does not provide partitioning.

8. It suggests to include a clustered index key with the INCLUDE option.

9. It is not intended to fine tune an indexing configuration.

10. It cannot gather statistics for more than 500 missing index groups.

11. It does not specify an order for columns to be used in an index.

12. For queries involving only inequality predicates, it returns less accurate cost information.

13. It reports only include columns for some queries, so index key columns must be manually selected.

14. It returns only raw information about columns on which indexes might be missing.

15. It does not suggest filtered indexes.

16. It can return different costs for the same missing index group that appears multiple times in XML Showplans.

17. It does not consider trivial query plans.

18. If a transaction creates or drops a table, the rows containing missing index information about the dropped objects are removed from the missing index management object.

19. When the metadata for a table changes, all missing index information about that table is deleted from these dynamic management views.

Good practices:

Consider the following when using missing index DMVs to create or adjust indexes:

1. Try to consolidate all the recommendations into a fewer number of indexes.

2. Never include a clustered key in the index key columns or with the INCLUDE column.

3. If an existing index needs to be adjusted, always use DROP_EXISING=ON with the CREATE INDEX statement.

4. “unique_compile” in “sys.dm_db_missing_index_group_stats” will tell us how many times a query has been complied or recompiled because of this missing index group.

5. Review related queries and query execution plans along with missing index suggestions.

General Guidelines while using missing DMVs:

To convert the information returned by “sys.dm_db_missing_index_details” into a CREATE INDEX statement, consider the following the guidelines:

1. Equality columns should be put before the inequality columns, and together they should make the key of the index. For example

EQUALITY = col1, col5

INEQUALITY = col3, col7

Index key will be: col1 + col5 + col3 + col7

2. To determine an effective order for the equality columns, order them based on their selectivity: put the most selective columns (most unique value column) first (leftmost in the column list). For example, say col1, col2 and col3 has following distinct records

DISTINCT col1 = 100

DISTINCT col2 = 10

DISTINCT col3 = 60

Index Key will be: Col1 + Col3 + Col2

3. Included columns should be added to the CREATE INDEX statement using the INCLUDE clause minus the clustered key. For example,

xID = Clustered key

nCustID = Key column

sName1= Non key column

dDOB= Non key column

Good: CREATE INDEX IX_nCustID ON Table1 (nCustID) INCLUDE (sName1, dDOB)

Bad: CREATE INDEX IX_nCustID ON Table1 (nCustID) INCLUDE (xID, sName1, dDOB)

Ugly: CREATE INDEX IX_nCustID ON Table1 (nCustID, xID) INCLUDE (sName1, dDOB)

4. Column orders in INCLUDE option has no effect, for example INCLUDE (ColA, ColB, ColC) is the same as INCLUDE (ColB, ColA, ColC).

A practical way to create missing indexes:

In this exercise, we will be reviewing how to consolidate all recommendations about missing indexes and how to create the most beneficial one. To perform this test, Server 2012 is used however any SQL Server edition can be used.

DMV Query to find missing indexes and relevant information:

SELECT sys.schemas.schema_id

, sys.schemas.name AS schema_name

, sys.objects.object_id

, sys.objects.name AS object_name

, sys.objects.type

, partitions.rows_count

, partitions.size_mb

, migs.unique_compiles

, mid.equality_columns

, mid.inequality_columns

, mid.included_columns

, migs.user_seeks

, migs.user_scans

, migs.avg_total_user_cost

, migs.avg_user_impact

, migs.last_user_seek

, migs.last_user_scan

, migs.system_seeks

, migs.system_scans

, migs.avg_total_system_cost

, migs.avg_system_impact

, migs.last_system_seek

, migs.last_system_scan

, (convert(NUMERIC(19, 2), migs.user_seeks) + convert(NUMERIC(19, 2), migs.unique_compiles))

* convert(NUMERIC(19, 2), migs.avg_total_user_cost)

* convert(NUMERIC(19, 2), migs.avg_user_impact / 100.0) AS score

FROM

sys.objects

JOIN (SELECT object_id

, sum(CASE

WHEN index_id BETWEEN 0 AND 1 THEN

row_count

ELSE

END) AS rows_count

, convert(NUMERIC(19, 2), convert(NUMERIC(19, 3), sum(in_row_reserved_page_count + lob_reserved_page_count

+ row_overflow_reserved_page_count)) / convert(NUMERIC(19, 2), 128)) AS size_mb

FROM

sys.dm_db_partition_stats

WHERE

sys.dm_db_partition_stats.index_id BETWEEN 0 AND 1

GROUP BY

object_id

) AS partitions

ON sys.objects.object_id = partitions.object_id

JOIN sys.schemas

ON sys.objects.schema_id = sys.schemas.schema_id

JOIN sys.dm_db_missing_index_details mid

ON sys.objects.object_id = mid.object_id

JOIN sys.dm_db_missing_index_groups mig

ON mid.index_handle = mig.index_handle

JOIN sys.dm_db_missing_index_group_stats migs

ON mig.index_group_handle = migs.group_handle

WHERE

mid.database_id = db_id()

Let’s do a simple test:

1. Create the database and table:

CREATE DATABASE TestDB

USE [TestDB]

CREATE TABLE [tblLarge](

[xID] [int] IDENTITY(1,1) NOT NULL,

[sName1] [varchar](10) NULL,

[sName2] [varchar](13) NULL,

[sName3] [varchar](36) NULL,

[nCustID] INT NULL,

[sIdentifier] [char](2) NULL,

[dDOB] [date] NULL,

[nWage] [numeric](12, 2) NULL,

[sLicense] [char](7) NULL,

[bGender] [bit] NULL

) ON [PRIMARY]

2. Run the following INSERT statement in a loop to insert 1,300,000 records.

SET NOCOUNT ON

INSERT INTO tblLarge

( sName1 ,

sName2 ,

sName3 ,

nCustID ,

sIdentifier ,

dDOB ,

nWage ,

sLicense ,

bGender

)

VALUES ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 10) ,

LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 11) ,

CAST(NEWID() AS VARCHAR(36)) ,

FLOOR(RAND() * 10000) ,

LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,

DATEADD(dd, -RAND() * 20000, GETDATE()) ,

( RAND() * 1000 ) ,

SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) ,

COALESCE(ISNUMERIC(LEFT(CAST(NEWID() AS VARCHAR(36)), 1)), 0)

)

GO 1300000

3. Create a clustered index on the xID Column.

ALTER TABLE tblLarge ADD CONSTRAINT PK_tblLarge

PRIMARY KEY CLUSTERED (xID) WITH (FILLFACTOR=90)

4. Execute the following SELECT statements several times to get missing index suggestions:

-- First query

SELECT nCustID ,

sName1 ,

sName2 ,

dDOB

FROM tblLarge

WHERE dDOB BETWEEN '2010-05-29' AND '2010-10-25'

AND nCustID BETWEEN 10 AND 1000

ORDER BY nCustID

-- Query Two

SELECT nCustID ,

sName1 ,

sIdentifier

FROM tblLarge

WHERE sIdentifier = 'AA'

AND dDOB BETWEEN '2000-01-01' AND '2010-10-10'

ORDER BY nCustID

-- Query Three

SELECT nCustID ,

COUNT(nCustID) nCount

FROM tblLarge

WHERE nCustID BETWEEN 50 AND 150

AND dDOB BETWEEN '2005-01-01' AND '2010-10-10'

GROUP BY nCustID

-- Query Four

SELECT nCustID ,

sName1 ,

sName2 ,

sIdentifier

FROM tblLarge

WHERE nCustID >= 100

AND nCustID <= 500

AND sIdentifier LIKE 'B%'

5. After executing all the given queries, we can get the following missing index recommendations if we run the DMV query listed above.

6. Index creation: Based on the missing index recommendations, we may create the following two indexes which will help all queries.

CREATE NONCLUSTERED INDEX IX_nCustID

ON [dbo].[tblLarge] ([nCustID],[dDOB])

INCLUDE ([sName1],[sName2])

CREATE NONCLUSTERED INDEX IX_sIdentifier

ON [dbo].[tblLarge] ([sIdentifier], [dDOB])

INCLUDE ([sName1],[sName2], [nCustID])

7. Our second query needs a little optimization. When a key column is index, then the Explicit ORDER BY on that column is not necessary. If we remove the “ORDER BY nCustID” the Sort operator in the execution plan will disappear.

Figure # 1A: Query Execution Plan with missing Indexes

Figure # 1B: Missing Index suggestions

Figure #2: Query Execution Plan after Missing Index creation

Summary:

Missing index DMVs are a great tool to use to understand the index requirements if it is being used cautiously. There are tons of robotic index creation scripts that are available to create missing indexes by utilizing DMVs and none of them consider the limitations of those DMV’s. Mainly, the beginners get trapped and end up creating a number of inefficient indexes.

If we review the Query, Execution Plan and missing index DMVs while creating indexes, then this combination will be a great mechanism to create the most efficient indexes. However, solely using the missing DMV’s will not evaluate the situation effectively nor correctly.

4 comments:

TheSQLGuruJuly 2, 2013 at 9:19 PM
DISTINCT col1 = 100
DISTINCT col2 = 10
DISTINCT col3 = 60

Index Key will be: Col2 + Col3 + Col1

Shouldn't the order be col1, col3, col2 (most to least specific)?
Sarjen HaqueJuly 2, 2013 at 9:27 PM
Yes, it should be
Col1 col3 col2.
.
My mistake, I'll correct it. Thanks!
AnonymousJuly 4, 2013 at 7:22 AM
Great article.

For the sake of completeness, additional concerns:
1. Missing indexes DMVs do not specify which statements triggered / will be affected by the missing index.
Thus, hard to review their business importance.

2. The total amount of seeks and scans by these statements (reported in sys.dm_db_missing_index_group_stats) does not indicate the total CPU consumed due too lack of that index.

3. Missing indexes better be investigated together with existing indexes.
Existing indexes usage is reported by sys.dm_db_index_usage_stats DMV (allows telling the productive usage out of the index overhead and system usage)

4. Specifying clustered index's key column as part of non-clustering index INCLUDE columns is not wise indeed.
However, specifying clustered index's key column as part of non-clustering index key may reduce the amount of non clustered index's leaf pages read during range scan. E.g. Clustered index on (C1, C2), non clustered index on (C3, C1, C4), and query WHERE C3 = 25 AND C1 = 67 AND C4 < 100.

Tools that are focused on performance tuning, such as Precise SQLyzer (disclosure: I am an employee of Precise), provide missing indexes information for the most important SQL statements only.
It shows the SQL statements that triggered the missing indexes recommendation, their true resource consumption as well as their execution plan.
Thus, allows a more educated evaluation of the recommended indexes.
In addition, it provides the necessary operational stats and usage stats for existing indexes allowing to identify unused indexes with high index overhead.

Concentrating all relevant object and indexes information, forms a true Object Tuning solution.

SQL Server – performance and other stories

Total Pageviews

Saturday, April 20, 2013

Missing Index DMVs – misleading suggestions

4 comments: