Total Pageviews

Monday, May 6, 2013

“ANSI_NULLS”, “JOIN/WHERE” and “Query Optimizer” - story of the three stooges


A friend of mine who lives in Nova Scotia is managing a couple of large busy OLTP database servers. A few days ago, he asked me curiously, what’s going on with “ANSI_NULLS” here. Although, the “ANSI_NULLS” setting is simple but the misconfiguration or misuse of this setting can cause unbelievable performance impact. From my experience, I have seen many experts getting stressed to resolve this mysterious performance issue or to find out the root cause.

















There is a very close relation among “ANSI_NULLS”, “JOIN/WHERE” and “Query Optimizer” when a query and its search column contains NULL values. Microsoft SQL Server suggests that when a Stored Procedure (P), Trigger (T), and others get created the following two settings must be executed right before creating or modifying any P, V, IV, IF, FN, T and TF from the same connection. For example,

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE/ALTER PROCEDURE usp_PROC
AS
    BEGIN
       -- T-SQL Code goes here
    END
GO

The “ANSI_NULLS” setting is basically for SQL Server Query optimizer. That said, when the Query Optimizer creates an Execution Plan it checks the Index and table properties to incorporate any object level settings specified when a stored procedure is created. So if the “ANSI_NULLS” is tuned off then SQL Server Query Optimizer must first scan the table or index for “NULL” and then filter the result set with a “SORT” operator. This query execution behavior is by design.

MSDN explains “ANSI_NULLS” as follows:

1.      When SET ANSI_NULLS is ON, a SELECT statement that uses WHERE column_name = NULL returns zero rows even if there are null values in column_name. A SELECT statement that uses WHERE column_name <> NULL returns zero rows even if there are nonnull values in column_name.

When SET ANSI_NULLS is ON, all comparisons against a null value evaluate to UNKNOWN. When SET ANSI_NULLS is OFF, comparisons of all data against a null value evaluate to TRUE if the data value is NULL. If SET ANSI_NULLS is not specified, the setting of the ANSI_NULLS option of the current database applies.

2.      When SET ANSI_NULLS is OFF, the Equals (=) and Not Equal To (<>) comparison operators do not follow the ISO standard. A SELECT statement that uses WHERE column_name = NULL returns the rows that have null values in column_name. A SELECT statement that uses WHERE column_name <> NULL returns the rows that have nonnull values in the column. Also, a SELECT statement that uses WHERE column_name <> XYZ_value returns all rows that are not XYZ_value and that are not NULL.

The “ANSI_NULLS” setting can be overridden explicitly by putting the command in the top of the batch or inside the stored procedure. However, doing so will prevent plan reuse and query will be recompiled in every execution.

Good Practices and Good to Know:
1.      Always use “SET ANSI_NULLS ON” while creating stored procedures or indexes.
2.      Avoid using “SET ANSI_NULLS” inside a stored procedure.
3.      Indexed view and computed column requires “SET ANSI_NULLS ON” otherwise “UPDATE/INSERT” will fail.
4.      “LEFT JOIN” usually a costly operation; try to use “INNER JOIN” if possible.
5.      Use more filters on the “JOIN" or “WHERE” clause.
6.      Use “IS NULL” or “IS NOT NULL” keyword with the “JOIN” or the “WHERE” clause.
7.      Misuse of “ANSI_NULLS” causes parameter sniffing.
8.      “ANSI_NULLS” setting can be changed server-wide, per database basis and in a T-SQL batch/stored procedure.


Caution:
“In a future version of SQL Server, ANSI_NULLS will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.”

Default SET Commands:

SET
Command
Applications using
ADO .Net, ODBC or OLE DB
SSMS,
Query Analyzer
SQLCMD,
OSQL, BCP,
SQL Server Agent
ISQL,
DB-Library
ANSI_NULL_DFLT_ON
ON
ON
ON
OFF
ANSI_NULLS
ON
ON
ON
OFF
ANSI_PADDING
ON
ON
ON
OFF
ANSI_WARNINGS
ON
ON
ON
OFF
CONACT_NULLS_YIELD_NULL
ON
ON
ON
OFF
QUOTED_IDENTIFIER
ON
ON
OFF
OFF
ARITHABORT
OFF
ON
OFF
OFF

Observing the Query Execution Behavior:
We will examine the query execution behavior with or without the “ANSI_NULLS” setting. Later we will try to optimize the Query so that it will run in any setting conditions.
Testing the Behavior:
1.      Create a sample database and two tables.
2.      Add records with and without “NULL” values.
3.      Create clustered key and index.
4.      Create Stored Procedure with ON and OFF settings.
5.      Rewrite the query to deal with any situations.

Script to Create database and tables:

SET NOCOUNT ON
IF OBJECT_ID('tblLarge') IS NOT NULL
   DROP TABLE tblLarge
GO

CREATE TABLE tblLarge
       (
         xID INT IDENTITY(1, 1) ,
         sName1 VARCHAR(100) ,
         sName2 VARCHAR(1000) ,
         sName3 VARCHAR(400) ,
         sIdentifier CHAR(100) ,
         dDOB DATETIME NULL,
         nWage NUMERIC(20, 2) ,
         sLicense VARCHAR(25)
       )
GO

/******************************************************
Add 1000 or more/less records with non-blank dDOB
******************************************************/
SET NOCOUNT ON
INSERT  INTO tblLarge
        ( sName1 ,
          sName2 ,
          sName3 ,
          sIdentifier ,
          dDOB ,
          nWage ,
          sLicense
        )
VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 50) ,    -- sName1
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 60) ,    -- sName2
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 70) ,    -- sName2
          LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,              -- sIdentifier    
          DATEADD(dd, -RAND() * 20000, GETDATE()) ,            -- dDOB or NULL
          ( RAND() * 1000 ) ,                                  -- nWage
          SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7)        -- sLicense       
        )
GO 1000


/******************************************************
Add 1000 or more/less records with dDOB = NULL
******************************************************/
SET NOCOUNT ON
INSERT INTO tblLarge
( sName1 ,
sName2 ,
sName3 ,
sIdentifier ,
dDOB ,
nWage ,
sLicense
)
VALUES ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 50) , -- sName1
LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 60) , -- sName2
LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 70) , -- sName2
LEFT(CAST(NEWID() AS VARCHAR(36)), 2) , -- sIdentifier
NULL,            -- dDOB is NULL
( RAND() * 1000 ) , -- nWage
SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) -- sLicense
)
GO 1000

Create indexes and another small table:
CREATE TABLE [dbo].[tblSmall](
                [sIdentifier] [char](100) NOT NULL,
                [nCount] [int] NULL)
GO

INSERT INTO tblSmall SELECT sIdentifier, COUNT(*) AS nCount
FROM tblLarge WHERE sIdentifier IS NOT NULL 
GROUP BY sIdentifier
GO

ALTER TABLE [tblLarge]
ADD  CONSTRAINT [PK_tblLarge]
PRIMARY KEY CLUSTERED ([xID] ASC)

CREATE NONCLUSTERED INDEX [IX_dDOB]
ON [tblLarge] ([dDOB] ASC)
INCLUDE([sIdentifier])

ALTER TABLE [tblSmall]
ADD  CONSTRAINT [PK_tblSmall]
PRIMARY KEY CLUSTERED ([sIdentifier] ASC)
GO

Script to create Stored Procedure:
We will create two Stored Procedures; one with “SET ANSI_NULLS” on and another with off.

-- stored procedure with "ANSI_NULLS ON"
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROC [dbo].[usp_ANSI_on]
    @dDOB AS DATETIME ,
    @sIdentifier AS CHAR(100)
AS
    BEGIN
        SELECT  b.xID ,
                b.sIdentifier ,
                b.dDOB ,
                a.nCount
        FROM    tblSmall a
                LEFT JOIN tblLarge b ON a.sIdentifier = b.sIdentifier
        WHERE   a.sIdentifier = @sIdentifier
                AND b.dDOB = @dDOB
    END
GO


-- stored procedure with "ANSI_NULLS OFF"
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROC [dbo].[usp_ANSI_off]
    @dDOB AS DATETIME ,
    @sIdentifier AS CHAR(100)
AS
    BEGIN
        SELECT  b.xID ,
                b.sIdentifier ,
                b.dDOB ,
                a.nCount
        FROM    tblSmall a
                LEFT JOIN tblLarge b ON a.sIdentifier = b.sIdentifier
        WHERE   a.sIdentifier = @sIdentifier
                AND b.dDOB = @dDOB
    END
GO


Executing two Stored Procedures:
From a new query window, run the two stored procedures and capture the actual execution plan. The following is my output from my test.

Figure# 1: Comparing the execution plans.


From the above actual execution plan, it seems that we need to create an index. So let’s create the missing index. Execute both SPs again.

CREATE NONCLUSTERED INDEX [IX_sIdentifier]
ON [tblLarge] (  [sIdentifier] ASC)
INCLUDE(dDOB)


Figure# 2: Comparing the execution plans after creating the index.


Review and Analysis:
SQL Server now stopped asking to create missing indexes, but unfortunately it is still using a simillar execution plan with a "SORT Operator" even after clearing the cache.

1.      So, it makes sense that “ANSI_NULLS” has an effect on the execution plan.
2.      “ANSI_NULLS OFF” cannot use index even though the appropriate index is there.
3.      CPU and overall query cost will be high as it has to scan or seek all “NULL” values for further processes.

Resolving the issue:
To resolve the costing issue and to improve its performance, we can perform the following two options:

1.      Create or modify Stored Procedures with “SET ANSI_NULLS ON”.
2.      Rewrite the query; in our case we can move the “dDOB” after the “JOIN” clause. This will work regardless of the “ANSI_NULLS” setting.

CREATE PROC [dbo].[usp_ANSI_off_optimize]
    @dDOB AS DATETIME ,
    @sIdentifier AS CHAR(100)
AS
    BEGIN
        SELECT  b.xID ,
                b.sIdentifier ,
                b.dDOB ,
                a.nCount
        FROM    tblSmall a
                LEFT JOIN tblLarge b ON a.sIdentifier = b.sIdentifier
                                        AND b.dDOB = @dDOB
        WHERE   a.sIdentifier = @sIdentifier
    END
GO

--Using JOIN
CREATE PROC [dbo].[usp_ANSI_off_optimize_JOIN]
       @dDOB AS DATETIME ,
       @sIdentifier AS CHAR(100)
AS
       BEGIN
              SELECT b.xID ,
                    b.sIdentifier ,
                    b.dDOB ,
                    a.nCount
             FROM   tblSmall a
                    JOIN tblLarge b ON a.sIdentifier = b.sIdentifier
                                       AND b.dDOB = @dDOB
             WHERE  a.sIdentifier = @sIdentifier
        END
 GO


Query to Identify “ANSI_NULLS” setting:
To identify where “ANSI_NULLS” ( as well “QUOTED_IDENTIFIER”) were used when Stored Procedures were created, the following query can be used.

/*******************************************
** Following two settings must be ON
** SET ANSI_NULLS ON
** SET QUOTED_IDENTIFIER ON
*******************************************/
SELECT  SCHEMA_NAME(s.schema_id) + '.' + s.name AS name ,
        s.create_date ,
        s.modify_date ,
        ( CASE WHEN OBJECTPROPERTY(s.object_id, 'ExecIsQuotedIdentOn') = 1 THEN 'ON'
               ELSE 'OFF'
          END ) AS 'Setting of QUOTED_IDENTIFIER at creation time' ,
        ( CASE WHEN OBJECTPROPERTY(s.object_id, 'ExecIsAnsiNullsOn') = 1 THEN 'ON'
               ELSE 'OFF'
          END ) AS 'Setting of ANSI_NULLS at creation time'
FROM    sys.objects s
WHERE   s.type IN ( 'P', 'TR', 'V', 'IF', 'FN', 'TF' )
        AND OBJECTPROPERTY(s.object_id, 'ExecIsQuotedIdentOn') = 0
        AND OBJECTPROPERTY(s.object_id, 'ExecIsAnsiNullsOn') = 0
ORDER BY s.name ASC

References:
SET ANSI_NULLS (Transact-SQL)
http://msdn.microsoft.com/en-CA/library/ms188048.aspx

OBJECTPROPERTY (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms176105.aspx

Slow in the Application, Fast in SSMS? Understanding Performance Mysteries.
http://www.sommarskog.se/query-plan-mysteries.html

2 comments:

  1. Nice Article !

    Really this will help to people of SQL Server Community.
    I have also prepared small note on this, what is quoted_identifier and ansi_null in SQL Server.

    http://www.dbrnd.com/2015/10/sql-server-what-is-quoted_identifier-onoff-and-ansi_null-onoff/

    ReplyDelete