SQL Server – performance and other stories: I/O

Showing posts with label I/O. Show all posts

Wednesday, April 13, 2016

Various sources of I/O issues

To achieve a reasonably good performance from SQL Server implementations, careful planning of the I/O subsystem and applying all good practices is crucial. At the beginning of a database application life cycle, I/O issues are undetectable as the application runs against a smaller database. As databases grow, database server performance issue starts being noticed. Tackling these issues later down the road is definitely unproductive and a cumbersome process.

Latency Measurement:

If an application is designed optimally and the I/O subsystem is configured correctly, then Microsoft has a very good recommendation about the I/O latency. These recommendations are well accepted by all the industries experts and can be evaluated against OLTP or DSS implementation (Microsoft TechNet source). Please note that the acceptance level of I/O Latency would slightly vary based on some factors such as random, sequential, and I/O size (8K, 64K, 128K, etc.).

PerfMon Counter	Threshold Value	Data or Log	Type of workload
Average Disk/sec Read & Average Disk/sec Write	1ms - 5ms	Log
	4ms - 20ms	Data	OLTP
	30ms or less	Data	DSS

I/O measurement and using PerfMon:

There are several performance counters and different technique existing that can be used to measure I/O performance, latency and IOPS. Following are some widely used Windows PerfMon counters that are trustworthy.

Measuring of disk latency:

· Average Disk sec/Read

· Average Disk sec/Write

· Average Disk sec/Transfer

Measuring disk throughputs:

· Disk Read Bytes/sec

· Disk Write Bytes/sec

· Disk Bytes/sec

Measuring IOPS:

· Disk Reads/sec

· Disk Writes/sec

· Disk Transfers/sec

Measuring a I/O requests if it splits into multiple requests:

· Split IO/Sec

When is a disk overwhelmed? Usually when the disk throughput increases, latency also increases more or less. However, when the disk throughput remains almost the same but the latency increases as time passes, it results in disk saturation or I/O bottleneck.

Source of I/O Issues:

There are numerous reasons why a disk experiences bottleneck on a SQL Server. Following are some handful factors:

· Inadequate memory for the buffer pool.

· Index fragmentation.

· Outdated statistics.

· Improper or non-optimal fill factor.

· Not using data compression (enterprise edition only).

· No isolation of Index, Data and Log files.

· Structure of database schema such as indexes, row width, data types.

· Not using T-SQL performance based Set-Base technique.

· Using nested views.

· Excessive sort operation such as ORDER BY and GROUP BY.

· Using Implicit Transaction.

· Using lengthy Transaction.

· Excessive using of NOLOCK hints.

· Using CURSOR method.

· Lack of covering indexes.

· Using wider key for clustered index.

· Workload nature - READ oriented vs. WRITE oriented.

· Non optimal RAID configuration.

· Volume alignment (also known as sector alignment).

· NTFS Block size (also known as cluster size or Allocation Unit Size).

· Suboptimal drivers or firmware used on the host HBAs or storage array.

· Improper queue depth settings on HBAs.

· Incorrectly configured multipath software and/or fiber switches.

Using and detecting I/O Issues using my tool:

In my tool, “I/O Response (ms)” represents the overall “I/O Latency” on a SQL Server Instance. The calculation method includes all the drives where data files are placed. The T-SQL code which has been used to calculate the “I/O Response (ms)” in my tool has been extracted from the SSMS “Activity Monitor”. There may be a fraction of a millisecond calculation variation but it will provide you the most critical current I/O and workload state.

Figure#1: Overall I/O Response (ms)

Under the “Database” tab, there is a “Database I/O” tab which calculates I/O operations which has been derived from “sys.dm_io_virtual_file_stats”. This will provide you with a far more granular and drill-down information about I/O which are occurring on various data and log files. To use this feature, multiple clicks on the lightning bolt button are required to activate and to view the current I/O performance data.

Figure#2: I/O Operation on Data and Log files

Read Gurus Articles:

Are I/O latencies killing your performance?

http://www.sqlskills.com/blogs/paul/are-io-latencies-killing-your-performance/

SQL Server Best Practices Article

https://technet.microsoft.com/library/Cc966412

Wednesday, July 3, 2013

Self-join incurs more I/O activities and increases locking overhead!

The Self-Join technique is commonly used to create a hierarchical tree set or finding duplicate records. It is also used to find previous or next values from a table by utilizing the inline table technique. Generally a self-join is a join in which a table is joined with itself. For example, when we need details about an employee and his manager where both employee and manager ID resides in the same table.

When we use self-join, it usually indicates that the table is not normalized. This may not be a problem for a small table but for a large and busy OLTP table with higher concurrency, this may lead to poor performance and degrade query response time.

The key point is that each self-joined table will be treated as a separate table while retrieving data from the disk, thus self-join incurs more I/O activities and increases locking overhead as it touches the same table twice or more. By adding appropriate indexes on JOIN and WHERE clause columns, it will reduce I/O activities and improve performance. It will be a good approach to avoid self-join whenever possible for heavy duty OLTP tables and queries.

Let’s perform a simple self-join test to see how the execution plan looks like. We can observe that as the retrieval number of records increases, each query behaves differently, such as requiring specific index and introducing parallel query.

Queries:

-- self-join one

select top (10) a.xid, a.sIdentifier

from tbllarge a

inner join tbllarge b on a.xid = b.xid

where a.sIdentifier ='A1'

-- self-join two

select top (100) a.xid, a.sIdentifier

from tbllarge a

inner join tbllarge b on a.xid = b.xid

where a.sIdentifier ='A1'

-- self-join three

select top (1000) a.xid, a.sIdentifier

from tbllarge a

inner join tbllarge b on a.xid = b.xid

where a.sIdentifier ='A1'

Execution Plan:

Thursday, May 16, 2013

WRITELOG waittype - Implicit vs. Explicit Transaction: disk or coding issue?

In a OLTP system where concurrency is high observing WRITELOG waittype is normal for a short period of time during peak usage. Generally, a high value of this waittype indicates slow response of the application. So, observing this waittype for a longer period of time and repeatedly for DML operations such as MERGE, INSERT, UPDATE and DELETE, obviously this indicates two specific issues:

1. Disk I/O sub-system

2. Transaction handing

If the underlying disk sub-system is the best in the industry and configured correctly, investigation shows nothing suspicious around I/O and if we still see the WRITELOG waittype then it is obviously the result of poor or improper handling of Transactions in a SQL batch.

Figure #1: WRITELOG waittype in a live system

Let’s explain what WRITELOG waittype is. This is a log management system waiting for dirty data pages to write on the disk. This waittype indicates that the SPID is waiting for a transaction log I/O request to be completed. Thus if the number of WRITELOG waittype is high that means there are a number of requests in the queue which are processing slowly. Although, many experts and even MS KB article (KB822101) indicates it is a disk bottleneck, in reality this may not be the case unless your disk system is IDE or SATA based or configured incorrectly. We need to investigate one more area before confirming that the disk is an issue.

Implicit and explicit transaction:

As the WRITELOG waittype is related to the transaction log management system, let’s try to understand the difference between the two.

SQL Server handles two types of transaction mode:

(a) Implicit transaction

(b) Explicit transaction

Implicit transaction: In Implicit transaction mode, the instance of the SQL Server Database Engine automatically starts a new transaction after the current transaction is committed or rolled back for each DML statement. We do not need to define the start of a transaction; each statement level transaction will be committed or rolled back automatically.

Implicit transaction can be turned on or off per batch basis. We can use the “SET IMPLICIT_TRANSACTIONS ON/OFF” statement to turn implicit transaction mode off. We can use the “COMMIT TRANSACTION” or “ROLLBACK TRANSACTION” statements to end each transaction.

Explicit transaction: An explicit transaction is one in which we explicitly define both the starting and ending of the transaction. To start an explicit transaction, we can use “BEGIN TRANSACTION” and then complete the transaction by using “COMMIT TRANSACTION” or “ROLLBACK TRANSACTION”.

The main difference between Implicit and Explicit transaction is that implicit transaction is auto, controlled by SQL Server Engine and statement level transaction, whereas explicit transaction is user defined and batch-scoped transaction. Some key comparisons are follows:

Indicator	Implicit Transaction	Explicit Transaction
Scope	Statement level	Batch level
Log generation	continuous chain of transactions	One transaction log entry
Log I/O	For each statement gets one I/O	Entire batch needs one I/O
WRITELOG Waittype	Sustained period of time	N/A or unless there is real disk issue
Response/duration	Longer due to several I/O activities	Fast, less I/O activities
T-Log Restore	Long, Redo/undo phase	Short redo/undo phase
Impact on T-Log	Extremely high	Less and insignificant
Write Duration	Longer and slow	Shorter and fast

The application which is utilizing the Implicit Transaction mechanism actually makes the I/O system busy and degrades application response time dramatically. As a result, various disk related performance counters show significantly high values and we start believing that the disk is the bottleneck.

Observing the WRITLOG waittype:

In the following test we will review how implicit and explicit transaction incurs WRITELOG waittype. The given script can be executed in any version of SQL Server and with any kind of disk system to understand the behavior.

Complete T-SQL Code to produce the behavior:

/******************************************************

** Implicit vs Explicit Transaction

** Author: Sarjen Haque

** Apply to: SQL 2005 to 2012

******************************************************/

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

IF OBJECT_ID('tempdb..#wait') IS NOT NULL

DROP TABLE #wait

/*************************************

** Create a test database

*************************************/

USE [master]

IF EXISTS ( SELECT name

FROM sys.databases

WHERE name = N'TestDB' )

BEGIN

ALTER DATABASE [TestDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE

DROP DATABASE [TestDB]

END

CREATE DATABASE TestDB

USE TestDB

/*************************************

** Create a a table

*************************************/

SET NOCOUNT ON

IF OBJECT_ID('tblLarge') IS NOT NULL

DROP TABLE tblLarge

CREATE TABLE tblLarge

(

xID INT IDENTITY(1, 1) ,

sName1 VARCHAR(100) ,

sName2 VARCHAR(1000) ,

sName3 VARCHAR(400) ,

sIdentifier CHAR(100) ,

dDOB DATETIME NULL ,

nWage NUMERIC(20, 2) ,

sLicense VARCHAR(25)

)

/*************************************

** Perform a checkpoint

*************************************/

CHECKPOINT

/*************************************

** Collect initial wait stats

*************************************/

SELECT 'w1' WaitRun ,

wait_type ,

wait_time_ms / 1000.0 AS WaitSec ,

( wait_time_ms - signal_wait_time_ms ) / 1000.0 AS ResourceSec ,

signal_wait_time_ms / 1000.0 AS SignalSec ,

waiting_tasks_count AS WaitCount

INTO #Wait

FROM sys.dm_os_wait_stats

WHERE wait_type LIKE 'WRITELOG%'

/*************************************

** Using explicit Transaction

*************************************/

SET NOCOUNT ON

USE TestDB

DECLARE @n INT

SET @n=1

BEGIN TRANSACTION

WHILE @n <= 10000

BEGIN

INSERT INTO tblLarge

( sName1 ,

sName2 ,

sName3 ,

sIdentifier ,

dDOB ,

nWage ,

sLicense

)

VALUES ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 50) , -- sName1

LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 60) , -- sName2

LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 70) , -- sName2

LEFT(CAST(NEWID() AS VARCHAR(36)), 2) , -- sIdentifier

DATEADD(dd, -RAND() * 20000, GETDATE()) , -- dDOB

( RAND() * 1000 ) , -- nWage

SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) -- sLicense

)

SET @n = @n + 1

END

COMMIT TRANSACTION

/***************************************

** Collect wait stats for Explicit TRN

***************************************/

INSERT INTO #wait

SELECT 'w2' WaitRun ,

wait_type ,

wait_time_ms / 1000.0 AS WaitSec ,

( wait_time_ms - signal_wait_time_ms ) / 1000.0 AS ResourceSec ,

signal_wait_time_ms / 1000.0 AS SignalSec ,

waiting_tasks_count AS WaitCount

FROM sys.dm_os_wait_stats

WHERE wait_type LIKE 'WRITELOG%'

/*****************************************

** Check for Log flush for Explicit TRN

*****************************************/

SELECT 'Explicit' AS Transaction_Type ,

SPID ,

Operation ,

COUNT(Operation) [OperationCount] ,

[Transaction Name] ,

COUNT([Transaction Name]) AS [TransactionNameCount]

FROM ::

fn_dblog(NULL, NULL)

WHERE operation LIKE '%LOP_BEGIN_XACT%'

AND [SPID] = @@spid

AND [SPID] IS NOT NULL

GROUP BY SPID ,

Operation ,

[Transaction Name]

/************************************

** Using implicit Transaction

************************************/

SET NOCOUNT ON

USE TestDB

DECLARE @n INT

SET @n=1

WHILE @n <= 10000

BEGIN

INSERT INTO tblLarge

( sName1 ,

sName2 ,

sName3 ,

sIdentifier ,

dDOB ,

nWage ,

sLicense

)

VALUES ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 50) , -- sName1

LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 60) , -- sName2

LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 70) , -- sName2

LEFT(CAST(NEWID() AS VARCHAR(36)), 2) , -- sIdentifier

DATEADD(dd, -RAND() * 20000, GETDATE()) , -- dDOB

( RAND() * 1000 ) , -- nWage

SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) -- sLicense

)

SET @n = @n + 1

END

/***************************************

** Collect wait stats for Implicit TRN

***************************************/

INSERT INTO #wait

SELECT 'w3' WaitRun ,

wait_type ,

wait_time_ms / 1000.0 AS WaitSec ,

( wait_time_ms - signal_wait_time_ms ) / 1000.0 AS ResourceSec ,

signal_wait_time_ms / 1000.0 AS SignalSec ,

waiting_tasks_count AS WaitCount

FROM sys.dm_os_wait_stats

WHERE wait_type LIKE 'WRITELOG%'

/************************************

** Check for Log flush

************************************/

SELECT 'Implicit' AS Transaction_Type ,

SPID ,

Operation ,

COUNT(Operation) [OperationCount] ,

[Transaction Name] ,

COUNT([Transaction Name]) AS [TransactionNameCount]

FROM ::

fn_dblog(NULL, NULL)

WHERE operation LIKE '%LOP_BEGIN_XACT%'

AND [SPID] = @@spid

AND [SPID] IS NOT NULL

GROUP BY SPID ,

Operation ,

[Transaction Name]

/********************************************************

** Compare the waittype collection

********************************************************/

SELECT a.wait_type ,

'Explicit Transaction' AS Transaction_Type ,

( b.WaitSec - a.WaitSec ) AS TotalwaitSec ,

( b.ResourceSec - a.ResourceSec ) AS TotalResourceSec ,

( b.SignalSec - a.SignalSec ) AS TotalSignalSec ,

( b.WaitCount - a.WaitCount ) AS TotalWaitCount

FROM ( SELECT *

FROM #wait

WHERE waitrun = 'w1'

) a

JOIN ( SELECT *

FROM #wait

WHERE waitrun = 'w2'

) b ON a.wait_type = b.wait_type

UNION ALL

SELECT a.wait_type ,

'Implicit Transaction' AS Transaction_Type ,

( b.WaitSec - a.WaitSec ) AS TotalwaitSec ,

( b.ResourceSec - a.ResourceSec ) AS TotalResourceSec ,

( b.SignalSec - a.SignalSec ) AS TotalSignalSec ,

( b.WaitCount - a.WaitCount ) AS TotalWaitCount

FROM ( SELECT *

FROM #wait

WHERE waitrun = 'w2'

) a

JOIN ( SELECT *

FROM #wait

WHERE waitrun = 'w3'

) b ON a.wait_type = b.wait_type

Figure #2: implicit vs. explicit

Figure #3: implicit vs. explicit

Analysis:

The analysis is simple and easy to understand from the output. From the above pictures we can see that the implicit transaction is always slow, signal wait is high and requires more I/O activities to complete a task.

Summary:

We saw that WRITELOG waittype does not necessarily indicate a disk throughput problem. It is the SQL Server Transaction Log management system who is doing more work to write each transaction into the log file. So before concluding, I would recommend to review your application code and see if you can use explicit transactions. I am sure that even though you use an Enterprise Class SSD system, implicit transaction may not improve the application performance dramatically.

Total Pageviews

Wednesday, April 13, 2016

Various sources of I/O issues

Wednesday, July 3, 2013

Self-join incurs more I/O activities and increases locking overhead!

Thursday, May 16, 2013

WRITELOG waittype - Implicit vs. Explicit Transaction: disk or coding issue?