Total Pageviews

Friday, March 8, 2013

OR vs. UNION ALL – have you thought about it?

Often we write a query which requires retrieving data based on two or more logical blocks of condition. Such as after the WHERE clause a search criterion may be as follows:

SELECT a, b, c FROM sometable
WHERE
((a=2 and b=4) AND c=6)
OR
((a=10 and b=20) AND c=30)

Logically, there is no issue with the construction of this query. It does the job just fine. However, is there any other technique or better way to write this query which is more efficient and cost effective? In respect to SQL Server Query Optimizer, yes we have a more efficient method if we use "UNION ALL". Basically, the “OR” is a squeezed and simplified structure of “UNION” .

A union may look like "Last Supper" - a paining by Leonardo Da Vinci.

In this test we will examine the same query in two different ways and compare the efficiency and cost.

Steps to follow:
1.       Create a database “TestDB”.
2.       Create a table “tblLarge” and populate it with 1 million records.
3.       Create clustered and a non-clustered covering index.
4.       Write two queries; one with using “OR” and another with “UNION ALL”.
5.       We will clear buffer and plan cache before every execution.

Let’s see it:

1.       Create the “TestDB”, create a Table “TblLarge” and populate it with One Million records
CREATE DATABASE TestDB
GO

USE TestDB
GO

CREATE TABLE tblLarge
    (
      xID INT IDENTITY(1, 1) ,
      sName1 VARCHAR(100) ,
      sName2 VARCHAR(1000) ,
      sName3 VARCHAR(400) ,
      sIdentifier CHAR(10) ,
      dDOB DATETIME ,
      nWage NUMERIC(20, 2) ,
      sLicense VARCHAR(25)
     )
GO

SET NOCOUNT ON
INSERT  INTO tblLarge
        ( sName1 ,
          sName2 ,
          sName3 ,
          sIdentifier ,
          dDOB ,
          nWage ,
          sLicense
        )
VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND()*50) ,  -- sName1
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND()*60) ,  -- sName2
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND()*70) ,  -- sName2
          LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,          -- sIdentifier    
          DATEADD(dd, -RAND() * 20000, GETDATE()) ,        -- dDOB
          ( RAND() * 1000 ) ,                              -- nWage
          SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7)    -- sLicense
         )
GO 1000000

2.       Create clustered and a non-clustered covering index:
ALTER TABLE dbo.tblLarge ADD CONSTRAINT PK_tblLarge PRIMARY KEY CLUSTERED (xID)
GO

CREATE NONCLUSTERED INDEX IX_tblLarge_sIdentifier ON dbo.tblLarge
       (sIdentifier)
INCLUDE (sName1, sName2, sName3) 
       WITH( STATISTICS_NORECOMPUTE = OFF)
GO

3.       Update statistics with FULL SCAN
UPDATE STATISTICS tblLarge WITH FULLSCAN

4.       The query with the OR logic. Use any search argument as your choice.
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SELECT  xID ,
        sName1 ,
        sName2 ,
        sName3
FROM    tblLarge
WHERE   ( ( xID BETWEEN 200000 AND 200090 )
          AND sIdentifier = 'BC'
        )
        OR ( ( xID BETWEEN 500000 AND 500050 )
             AND sIdentifier = 'CD'
           )
GO

5.       The query with the UNION ALL. Use any search argument as your choice.
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SELECT  xID ,
        sName1 ,
        sName2 ,
        sName3
FROM    tblLarge
WHERE   ( ( xID BETWEEN 200000 AND 200090 )
          AND sIdentifier = 'BC'
        )
UNION ALL
SELECT  xID ,
        sName1 ,
        sName2 ,
        sName3
FROM    tblLarge
WHERE   ( ( xID BETWEEN 500000 AND 500050 )
          AND sIdentifier = 'CD'
        )

Comparing – “OR” vs. “UNION ALL”
Following are the differences we can find easily.

1.       The “OR” version query has two operators (steam aggregation and merge join) whereas “UNION ALL” version has concatenation operator only.
2.       Cardinality estimate is inaccurate for “OR” version query.
3.       The sub-tree cost for “UNION ALL” query is way less than the “OR” version.
4.       Overall, the operational cost for “UNION ALL” query is 30% less than the other one.

Query Result:


Execution Plan:
 

"OR" Query Cost:


“UNION ALL” Query Cost:









Conclusion:
A query can be written in many different ways and it is a good idea to try different methods. There is a saying, “Query writing is an Art” and you are the Artist.

No comments:

Post a Comment