SQL Server – performance and other stories: Performance issues from ORDER BY/GROUP BY

It is very common and expected to see a query containing ORDER BY or GROUP BY clause for displaying or grouping purposes. It is also common that developers use ORDER BY clause from a habit without considering its necessity. As a result, queries become slower overtime as the number of records increases.

The Last Supper - by Leonardo da Vinci
Grouping example in Art

When a sort operation unable to acquire sufficient memory grant and it cannot be done in the memory and must happen in the tempdb. The heavier processing load inside the tempdb degrades overall SQL Server performance significantly. This situation is usually known as “spill to tempdb” or “spills in tempdb”. It is crucial to identify those sort warnings and avoid them whenever possible.

In my experience, I have seen ORDER BY/GROUP BY being used on a VARCHAR (8000) column while retrieving data; even unwisely used on a JOIN clause! Tweaking these queries is a bit tricky and most of the time it is impossible since the front-end application or business logic has already been built-in on this criterion. Creating an index on this column is not possible due to the 900 bytes restriction on an index key column. So, other than crossing fingers, there is nothing much to do to resolve the performance issue immediately.

Common Issues:

Following are some common issues that occur due to the misuse of ORDER BY/GROUP BY clause:

1. Rapid tempdb data file growth.

2. Increases disk I/O activities on tempdb and tempdb drive.

3. Introduces lock contention and escalation.

4. Increases memory grant for sort/hash operation.

5. Introduces parallel query plan.

Detecting the Issue:

Detecting performance issues that arise from sort operation is quite simple and straight forward. Following are some tips to identify issues:

1. Review the query and identify columns that are used on ORDER/GROUP clauses.

2. Review the Execution plan and identify “sort” operators.

3. Identify parallelism operators that perform the “distribute streams”, ”gather streams” and “repartition streams” in parallel execution plan.

4. Use SQL Profiler Trace event “sort warnings”.

5. Extended Event – “sort_warning”

6. Use PerfMom or sys.dm_os_performance to track “worktables created/sec” and “workfiles created/sec”

To resolve the performance Issue:

To resolve performance issues that occur from a sort operation, a couple of actions can be taken as follows:

1. Review the necessity of a sort operation in the query.

2. Try to perform a sort operation in the front-end.

3. Normalize the database schema.

4. Create single or multi-column indexes.

5. Apply filters on indexes.

6. Use TOP (n) when there is an “ORDER BY”, if possible.

7. Put more filters in the query to touch less data.

8. Update distribution statistics.

Observing the behavior:

To observe the common issues with ORDER BY/GROUP BY operations, let’s create a database, table and a simple select statement against 500,000 records.

CREATE DATABASE testDB

USE testDB

SET NOCOUNT ON

IF OBJECT_ID('tblLarge') IS NOT NULL

DROP TABLE tblLarge

CREATE TABLE tblLarge

(

xID INT IDENTITY(1, 1) ,

sName1 VARCHAR(100) ,

sName2 VARCHAR(1000) ,

sName3 VARCHAR(400) ,

sIdentifier CHAR(100) ,

dDOB DATETIME NULL ,

nWage NUMERIC(20, 2) ,

sLicense VARCHAR(25)

)

/*********************************

Add 500000 records

**********************************/

SET NOCOUNT ON

INSERT INTO tblLarge

( sName1 ,

sName2 ,

sName3 ,

sIdentifier ,

dDOB ,

nWage ,

sLicense

)

VALUES ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 50) , -- sName1

LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 60) , -- sName2

LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 70) , -- sName2

LEFT(CAST(NEWID() AS VARCHAR(36)), 2) , -- sIdentifier

DATEADD(dd, -RAND() * 20000, GETDATE()) , -- dDOB

( RAND() * 1000 ) , -- nWage

SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) -- sLicense

)

GO 500000

/******************************************************

** Create a clustered index

******************************************************/

ALTER TABLE [tblLarge]

ADD CONSTRAINT [PK_tblLarge]

PRIMARY KEY CLUSTERED ([xID] ASC)

/***************************************************************

** To resolve the sort warning, create a non-clustered index

***************************************************************/

CREATE NONCLUSTERED INDEX [IX_sName1]

ON [tblLarge] ([sName1] ASC)

Simple SELECT Statement:

Following are some simple select statements to reproduce the behavior.

/******************************************************

** Simple select statement

******************************************************/

--First query

SELECT xID ,

sName1

FROM tblLarge

-- Second query with - ORDER BY

SELECT xID ,

sName1

FROM tblLarge

ORDER BY sName1

-- Third query - GROUP BY/ORDER BY

SELECT sName1 ,

COUNT(sName1) AS nCount

FROM tblLarge a

GROUP BY sName1

ORDER BY sName1

Using Extended Events (SQL 2012), SQL Profiler Trace and Execution Plan, “sort warning” are easily detectable and following are some outputs.

Figure#1: sort warning using Extended Events in SQL 2012

Figure#2A: sort warning detection using Execution Plan

Figure#2B: sort warning detection using Execution Plan

Figure#2: sort warning detection using SQL Profiler Trace

Learn More:

Understanding data vs. log usage for spills in tempdb

http://www.sqlskills.com/blogs/paul/understanding-data-vs-log-usage-for-spills-in-tempdb/

Query Memory Spills

http://www.sqlpassion.at/archive/2011/10/19/query-memory-spills/

Identifying and Solving Sort Warnings Problems in SQL Server

http://www.solidq.com/sqj/Pages/Relational/Identifying-and-Solving-Sort-Warnings-Problems-in-SQL-Server.aspx

SQL Server – performance and other stories

Total Pageviews

Thursday, June 13, 2013

Performance issues from ORDER BY/GROUP BY - spills in tempdb

3 comments: