Tuesday, September 3, 2013

Choosing an efficient clustered key

Choosing an efficient clustered key is a crucial factor in good database performance, yet it is often neglected during database schema design, leading to poor performance. The resulting issues also become difficult to resolve once the database grows to multiple terabytes, when even huge hardware will not reach a satisfactory performance goal.

What is a Clustered Key?
“Clustered indexes sort and store the data rows in the table based on their key values. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order.” To improve performance and reduce IO overhead, it is necessary to have a clustered key on almost every table in a highly transactional database. Although numerous guidelines and good practices are available that explain the need for an appropriate clustered key, the question is how often we actually consider those practices while choosing one.
 
It is generally advisable that a clustered key be narrow and based on a column with unique values (such as the primary key). If the column is not unique, the Database Engine adds a 4-byte uniqueifier value to each duplicate row to make the key unique. This added value is internal, cannot be seen or accessed by the user, and carries some overhead. The greater inefficiency, however, occurs when the clustered key is wider than it needs to be.
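
The uniqueifier overhead can be observed directly. Below is a minimal sketch (the table tblUniqueifierDemo and its columns are hypothetical, not part of the test used later): a non-unique clustered index is created, a duplicate key value is inserted, and sys.dm_db_index_physical_stats in DETAILED mode shows the larger record size of the duplicated row.

-- Hypothetical demo table: nOrderID is deliberately non-unique
CREATE TABLE dbo.tblUniqueifierDemo (
       nOrderID INT NOT NULL,
       sNote CHAR(10) NOT NULL
)
GO

-- Non-unique clustered index: duplicate key values get a hidden 4-byte uniqueifier
CREATE CLUSTERED INDEX CIX_tblUniqueifierDemo ON dbo.tblUniqueifierDemo (nOrderID)
GO

INSERT INTO dbo.tblUniqueifierDemo (nOrderID, sNote)
VALUES (1, 'first'), (1, 'duplicate'), (2, 'unique')
GO

-- max_record_size_in_bytes exceeds min_record_size_in_bytes because the duplicate row carries the uniqueifier
SELECT index_level, min_record_size_in_bytes, max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.tblUniqueifierDemo'), NULL, NULL, 'DETAILED')
GO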

Pitfalls of an inefficient clustered key:
1.      Fragmentation: fragmentation builds up rapidly.
2.      Page splits: a huge number of page allocations and de-allocations occur.
3.      Space: more disk and memory are required, and IO cost is higher.
4.      CPU usage: CPU utilization rises due to excessive page splits.
5.      Slowness: query response time degrades.
6.      Optimization: index optimization takes more time.

Good clustered key candidates:
1.      A unique key column is the best candidate for a clustered key.
2.      An IDENTITY column is a good choice because its values are sequential.
3.      A column that is used in JOIN clauses.
4.      A column used to retrieve data sequentially.
5.      A column frequently used in SORT (GROUP BY or ORDER BY) operations.
6.      A column frequently used in range scans (such as BETWEEN, >=, <=).
7.      A static column, such as EmployeeID or SSN.

Inefficient choices for a clustered key:
1.      Wide keys: multi-column keys such as LastName + FirstName + MiddleName or Address1 + Address2 + Phone, and so on. “The key values from the clustered index are used by all non-clustered indexes as lookup keys. Any non-clustered indexes defined on the same table will be significantly larger because the non-clustered index entries contain the clustering key and also the key columns defined for that non-clustered index.”
2.      GUIDs: randomly generated unique values lead to the highest possible fragmentation. NEWSEQUENTIALID() can be used instead of NEWID() when generating GUIDs to reduce fragmentation (see the sketch after this list).
3.      Data changes: a column whose value changes frequently is not a good choice for a clustered key.
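
A minimal sketch of the NEWSEQUENTIALID() approach mentioned in item 2 (the table tblGuidDemo and its columns are hypothetical): the default constraint generates sequential GUIDs, which fragment far less than NEWID() values when used as the clustered key.

CREATE TABLE dbo.tblGuidDemo (
       uRowID UNIQUEIDENTIFIER NOT NULL
              CONSTRAINT DF_tblGuidDemo_uRowID DEFAULT NEWSEQUENTIALID(),
       sPayload VARCHAR(100) NULL,
       CONSTRAINT PK_tblGuidDemo PRIMARY KEY CLUSTERED (uRowID)
)
GO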

Narrow vs. Wide Clustered Key Test:
Here we will observe how a wide clustered key introduces performance issues. In our example,
(a)   “xID” is the Clustered Key which is a Primary Key and an Identity column.
(b)   Later we will create a multi-column clustered key by using “sName1”, “sName2” and “sName3” which are varchar columns.
(c)    We will insert 100,000 rows for this test
(d)   We will review fragmentation and page splits for each type of index.

DMV Query:
--To check table and index level changes:
SELECT  OBJECT_NAME(ios.object_id, ios.database_id) as table_name,
              ios.index_id ,
        si.name AS index_name,
        ios.leaf_insert_count +
        ios.leaf_update_count +
        ios.leaf_delete_count AS leaf_changes,
        ios.leaf_allocation_count AS leaf_page_splits,
        ios.nonleaf_insert_count +
        ios.nonleaf_update_count +
        ios.nonleaf_delete_count AS nonleaf_changes,
        ios.nonleaf_allocation_count AS nonleaf_page_splits,
        (ios.range_scan_count + ios.leaf_insert_count
            + ios.leaf_delete_count + ios.leaf_update_count
            + ios.leaf_page_merge_count + ios.singleton_lookup_count
           ) total_changes
FROM    sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) ios
        JOIN sys.objects so ON so.object_id = ios.object_id
        JOIN sys.indexes si ON si.object_id = ios.object_id
                               AND si.index_id = ios.index_id
        JOIN sys.schemas ss ON so.schema_id = ss.schema_id
WHERE   OBJECTPROPERTY(ios.object_id, 'IsUserTable') = 1
ORDER BY leaf_changes DESC

--To check index fragmentation:
SELECT  a.index_id ,
        b.name AS index_name,
        CONVERT(NUMERIC(5,2),a.avg_fragmentation_in_percent) pct_avg_fragmentation
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) AS a
        JOIN sys.indexes AS b ON a.object_id = b.object_id
                                 AND a.index_id = b.index_id;


Script to test:
CREATE DATABASE TestDB
GO
USE [TestDB]
GO

CREATE TABLE [tblLarge](
       [xID] [int] IDENTITY(1,1) NOT NULL,
       [sName1] [varchar](10) DEFAULT 'ABC' NOT NULL,
       [sName2] [varchar](13) DEFAULT 'ABC' NOT NULL,
       [sName3] [varchar](36) DEFAULT 'ABC' NOT NULL,
       [sIdentifier] [char](2) NULL,
       [dDOB] [date] NULL,
       [nWage] [numeric](12, 2) NULL,
       [sLicense] [char](7) NULL,
       [bGender] [bit] NULL
) ON [PRIMARY]
GO


-- Clustered key on xID
ALTER TABLE tblLarge ADD CONSTRAINT PK_tblLarge
PRIMARY KEY CLUSTERED (xID) WITH (FILLFACTOR=90)
GO

-- DROP constraint
ALTER TABLE [dbo].[tblLarge] DROP CONSTRAINT [PK_tblLarge]
GO

-- Multi-column clustered key
ALTER TABLE tblLarge ADD CONSTRAINT PK_tblLarge
PRIMARY KEY CLUSTERED (sName1, sName2, sName3) WITH (FILLFACTOR=90)
GO

-- Insert 100,000 records
INSERT  INTO tblLarge
        ( sName1 ,
          sName2 ,
          sName3 ,
          sIdentifier ,
          dDOB ,
          nWage ,
          sLicense ,
          bGender
        )
VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), 8) ,
          LEFT(CAST(NEWID() AS VARCHAR(36)), 13) ,
          CAST(NEWID() AS VARCHAR(36)) ,
          LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,
          DATEADD(dd, -RAND() * 20000, GETDATE()) ,
          ( RAND() * 1000 ) ,
          SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) ,
          COALESCE(ISNUMERIC(LEFT(CAST(NEWID() AS VARCHAR(36)), 1)), 0)
        )
GO 100000


Fragmentation Comparison:
As you can see from the pictures below, fragmentation and page splits increase dramatically when the wide key is used.

Figure#1: Narrow clustered key


Figure#2: Multi-column clustered key

Conclusion:
While using a wide or multi-column key for the clustered index is supported by SQL Server, we should not overlook the performance consequences that occur silently.


Reference:
Clustered Index Design Guidelines

Saturday, April 20, 2013

Missing Index DMVs – misleading suggestions

Creating missing indexes is one of a DBA’s core responsibilities in a production environment and makes query execution much faster. Many enhancements and new features have been added starting from SQL Server 2005 that make our day-to-day tasks more efficient and easier:

1.       Execution plans with tooltips and detailed properties.
2.       DMVs for missing indexes, index operational status, and index usage.

Where do you want to go from here?
While these DMVs are great to begin with, there is a glitch in their output that we need to be aware of. When the query optimizer records missing index information in these views, it does not consider the existing indexes on the underlying table at all; it simply adds the suggested columns whenever a query is compiled or recompiled.

So if we follow these recommendations blindly, we will end up creating a large number of indexes, which is certainly not useful. Rather than improving performance, it will degrade overall performance.
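
Before acting on any suggestion, it helps to check what already exists. The following is a minimal sketch (dbo.tblLarge is the test table used later in this post) that lists existing indexes with their key and included columns so overlapping suggestions can be consolidated rather than created blindly:

SELECT  i.name AS index_name ,
        c.name AS column_name ,
        ic.key_ordinal ,
        ic.is_included_column
FROM    sys.indexes i
        JOIN sys.index_columns ic ON ic.object_id = i.object_id
                                     AND ic.index_id = i.index_id
        JOIN sys.columns c ON c.object_id = ic.object_id
                              AND c.column_id = ic.column_id
WHERE   i.object_id = OBJECT_ID('dbo.tblLarge')
ORDER BY i.name ,
        ic.is_included_column ,
        ic.key_ordinal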


Missing index DMOs:

Dynamic management object                Information returned
sys.dm_db_missing_index_group_stats      Returns summary information about missing index groups.
sys.dm_db_missing_index_groups           Returns information about a specific group of missing indexes.
sys.dm_db_missing_index_details          Returns detailed information about a missing index.
sys.dm_db_missing_index_columns          Returns information about the database table columns that are missing an index.
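
These objects can be joined on index_handle. A minimal sketch (run in the current database) that maps each missing index suggestion to its column-level detail:

SELECT  mid.statement AS table_name ,
        mic.column_name ,
        mic.column_usage   -- EQUALITY, INEQUALITY, or INCLUDE
FROM    sys.dm_db_missing_index_details mid
        CROSS APPLY sys.dm_db_missing_index_columns(mid.index_handle) mic
WHERE   mid.database_id = DB_ID()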


Things to keep in mind about missing indexes DMV:
The missing indexes feature is a lightweight tool for finding missing indexes that might significantly improve query performance. It does not provide adequate information to fine tune indexing configuration. Below are some limitations:

1.       By default, the missing indexes feature is turned on.
2.       No controls are provided to turn the feature on or off, or to reset any of the tables returned when the dynamic management objects are queried.
3.       When SQL Server is restarted, all of the missing index information is dropped.
4.       This feature can only be disabled if an instance of SQL Server is started by using the -x argument with the “sqlservr.exe” command-prompt utility.
5.       Missing index DMVs are updated when a query is optimized by the query optimizer.
6.       Missing index information from “Estimate Execution Plan” will differ from “Actual Execution Plan”.
7.       It does not provide partitioning recommendations.
8.       It sometimes suggests including the clustered index key via the INCLUDE option, which is redundant.
9.       It is not intended to fine tune an indexing configuration.
10.   It cannot gather statistics for more than 500 missing index groups.
11.   It does not specify an order for columns to be used in an index.
12.   For queries involving only inequality predicates, it returns less accurate cost information.
13.   It reports only include columns for some queries, so index key columns must be manually selected.
14.   It returns only raw information about columns on which indexes might be missing.
15.   It does not suggest filtered indexes.
16.   It can return different costs for the same missing index group that appears multiple times in XML Showplans.
17.   It does not consider trivial query plans.
18.   If a transaction creates or drops a table, the rows containing missing index information about the dropped objects are removed from the missing index management object.
19.   When the metadata for a table changes, all missing index information about that table is deleted from these dynamic management views.

Good practices:
Consider the following when using missing index DMVs to create or adjust indexes:

1.       Try to consolidate all the recommendations into a fewer number of indexes.
2.       Never include a clustered key in the index key columns or with the INCLUDE column.
3.       If an existing index needs to be adjusted, always use DROP_EXISTING=ON with the CREATE INDEX statement (see the sketch after this list).
4.       “unique_compiles” in “sys.dm_db_missing_index_group_stats” tells us how many times a query has been compiled or recompiled because of this missing index group.
5.       Review related queries and query execution plans along with missing index suggestions.
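
A minimal sketch of practice #3 (the index name and column list are illustrative only): adjusting an existing index in place with DROP_EXISTING instead of dropping and recreating it.

CREATE NONCLUSTERED INDEX IX_nCustID
ON dbo.tblLarge (nCustID, dDOB)
INCLUDE (sName1, sName2, sIdentifier)
WITH (DROP_EXISTING = ON)
GO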

General Guidelines while using missing DMVs:
To convert the information returned by “sys.dm_db_missing_index_details” into a CREATE INDEX statement, consider the following guidelines:

1.       Equality columns should be put before the inequality columns, and together they should make up the key of the index. For example:

EQUALITY = col1, col5
INEQUALITY = col3, col7
Index key will be: col1 + col5 + col3 + col7
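
A minimal sketch of the resulting index (Table1 and the column names are the hypothetical ones from the example above):

CREATE NONCLUSTERED INDEX IX_Table1_Eq_Ineq
ON dbo.Table1 (col1, col5, col3, col7)
GO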

2.       To determine an effective order for the equality columns, order them by selectivity: put the most selective column (the one with the most unique values) first (leftmost in the column list). For example, say col1, col2, and col3 have the following distinct value counts:

DISTINCT col1 = 100
DISTINCT col2 = 10
DISTINCT col3 = 60

Index Key will be: Col1 + Col3 + Col2
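
A minimal sketch for measuring that selectivity (Table1 and the column names are the hypothetical ones from the example):

SELECT  COUNT(DISTINCT col1) AS col1_distinct ,
        COUNT(DISTINCT col2) AS col2_distinct ,
        COUNT(DISTINCT col3) AS col3_distinct
FROM    dbo.Table1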

3.       Included columns should be added to the CREATE INDEX statement using the INCLUDE clause minus the clustered key. For example,

xID = Clustered key
nCustID = Key column
sName1= Non key column
dDOB= Non key column

Good: CREATE INDEX IX_nCustID ON Table1 (nCustID) INCLUDE (sName1, dDOB)
Bad: CREATE INDEX IX_nCustID ON Table1 (nCustID) INCLUDE (xID, sName1, dDOB)
Ugly: CREATE INDEX IX_nCustID ON Table1 (nCustID, xID) INCLUDE (sName1, dDOB)

4.       Column order in the INCLUDE clause has no effect; for example, INCLUDE (ColA, ColB, ColC) is the same as INCLUDE (ColB, ColA, ColC).

A practical way to create missing indexes:
In this exercise, we will review how to consolidate all of the missing index recommendations and create the most beneficial indexes. SQL Server 2012 is used for this test; however, any SQL Server edition can be used.

DMV Query to find missing indexes and relevant information:

SELECT sys.schemas.schema_id
     , sys.schemas.name AS schema_name
     , sys.objects.object_id
     , sys.objects.name AS object_name
     , sys.objects.type
     , partitions.rows_count
     , partitions.size_mb
     , migs.unique_compiles
     , mid.equality_columns
     , mid.inequality_columns
     , mid.included_columns
     , migs.user_seeks
     , migs.user_scans
     , migs.avg_total_user_cost
     , migs.avg_user_impact
     , migs.last_user_seek
     , migs.last_user_scan
     , migs.system_seeks
     , migs.system_scans
     , migs.avg_total_system_cost
     , migs.avg_system_impact
     , migs.last_system_seek
     , migs.last_system_scan
     , (convert(NUMERIC(19, 2), migs.user_seeks) + convert(NUMERIC(19, 2), migs.unique_compiles))
       * convert(NUMERIC(19, 2), migs.avg_total_user_cost)
       * convert(NUMERIC(19, 2), migs.avg_user_impact / 100.0) AS score
FROM
  sys.objects
  JOIN (SELECT object_id
             , sum(CASE
                 WHEN index_id BETWEEN 0 AND 1 THEN
                   row_count
                 ELSE
                   0
               END) AS rows_count
             , convert(NUMERIC(19, 2), convert(NUMERIC(19, 3), sum(in_row_reserved_page_count + lob_reserved_page_count
               + row_overflow_reserved_page_count)) / convert(NUMERIC(19, 2), 128)) AS size_mb
        FROM
          sys.dm_db_partition_stats
        WHERE
          sys.dm_db_partition_stats.index_id BETWEEN 0 AND 1
        GROUP BY
          object_id
       ) AS partitions
    ON sys.objects.object_id = partitions.object_id
  JOIN sys.schemas
    ON sys.objects.schema_id = sys.schemas.schema_id
  JOIN sys.dm_db_missing_index_details mid
    ON sys.objects.object_id = mid.object_id
  JOIN sys.dm_db_missing_index_groups mig
    ON mid.index_handle = mig.index_handle
  JOIN sys.dm_db_missing_index_group_stats migs
    ON mig.index_group_handle = migs.group_handle
WHERE
  mid.database_id = db_id()

Let’s do a simple test:

1.       Create the database and table:

CREATE DATABASE TestDB
GO
USE [TestDB]
GO

CREATE TABLE [tblLarge](
       [xID] [int] IDENTITY(1,1) NOT NULL,
       [sName1] [varchar](10) NULL,
       [sName2] [varchar](13) NULL,
       [sName3] [varchar](36) NULL,
       [nCustID] INT NULL,
       [sIdentifier] [char](2) NULL,
       [dDOB] [date] NULL,
       [nWage] [numeric](12, 2) NULL,
       [sLicense] [char](7) NULL,
       [bGender] [bit] NULL
) ON [PRIMARY]
GO

2.       Run the following INSERT statement in a loop to insert 1,300,000 records.

SET NOCOUNT ON
INSERT  INTO tblLarge
        ( sName1 ,
          sName2 ,
          sName3 ,
          nCustID ,
          sIdentifier ,
          dDOB ,
          nWage ,
          sLicense ,
          bGender
                   
        )
VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 10) ,
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 11) ,
          CAST(NEWID() AS VARCHAR(36)) ,
          FLOOR(RAND() * 10000) ,
          LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,
          DATEADD(dd, -RAND() * 20000, GETDATE()) ,
          ( RAND() * 1000 ) ,
          SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) ,
          COALESCE(ISNUMERIC(LEFT(CAST(NEWID() AS VARCHAR(36)), 1)), 0)
        )
      
GO 1300000

3.       Create a clustered index on the xID Column.

ALTER TABLE tblLarge ADD CONSTRAINT PK_tblLarge
PRIMARY KEY CLUSTERED (xID) WITH (FILLFACTOR=90)

4.       Execute the following SELECT statements several times to get missing index suggestions:

-- First query
SELECT  nCustID ,
        sName1 ,
        sName2 ,
        dDOB
FROM    tblLarge
WHERE   dDOB BETWEEN '2010-05-29' AND '2010-10-25'
        AND nCustID BETWEEN 10 AND 1000
ORDER BY nCustID

-- Query Two
SELECT  nCustID ,
        sName1 ,
        sIdentifier
FROM    tblLarge
WHERE   sIdentifier = 'AA'
        AND dDOB BETWEEN '2000-01-01' AND '2010-10-10'
ORDER BY nCustID

-- Query Three
SELECT  nCustID ,
        COUNT(nCustID) nCount
FROM    tblLarge
WHERE   nCustID BETWEEN 50 AND 150
        AND dDOB BETWEEN '2005-01-01' AND '2010-10-10'
GROUP BY nCustID

-- Query Four
SELECT  nCustID ,
        sName1 ,
        sName2 ,
        sIdentifier
FROM    tblLarge
WHERE   nCustID >= 100
        AND nCustID <= 500
        AND sIdentifier LIKE 'B%'

5.        After executing all of the queries above, running the DMV query listed earlier returns the following missing index recommendations.

6.       Index creation: Based on the missing index recommendations, we may create the following two indexes which will help all queries.

CREATE NONCLUSTERED INDEX IX_nCustID
ON [dbo].[tblLarge] ([nCustID],[dDOB])
INCLUDE ([sName1],[sName2])
GO

CREATE NONCLUSTERED INDEX IX_sIdentifier
ON [dbo].[tblLarge] ([sIdentifier], [dDOB])
INCLUDE ([sName1],[sName2], [nCustID])
GO

7.       Our second query needs a little optimization. When a column is already an index key, an explicit ORDER BY on that column is often unnecessary. If we remove the “ORDER BY nCustID”, the Sort operator in the execution plan disappears, as shown in the rewritten query below.
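
Query Two rewritten without the ORDER BY:

SELECT  nCustID ,
        sName1 ,
        sIdentifier
FROM    tblLarge
WHERE   sIdentifier = 'AA'
        AND dDOB BETWEEN '2000-01-01' AND '2010-10-10'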


Figure # 1A: Query Execution Plan with missing Indexes

Figure # 1B: Missing Index suggestions
Figure #2: Query Execution Plan after Missing Index creation

Summary:
Missing index DMVs are a great tool for understanding index requirements, provided they are used cautiously. There are tons of automated index creation scripts available that create missing indexes by utilizing these DMVs, and none of them consider the limitations of those DMVs. Beginners in particular get trapped and end up creating a number of inefficient indexes.

If we review the query, the execution plan, and the missing index DMVs together while creating indexes, this combination becomes a great mechanism for creating the most efficient indexes. However, relying solely on the missing index DMVs will not evaluate the situation effectively or correctly.