Total Pageviews

Showing posts with label I/O. Show all posts
Showing posts with label I/O. Show all posts

Wednesday, April 13, 2016

Various sources of I/O issues

To achieve a reasonably good performance from SQL Server implementations, careful planning of the I/O subsystem and applying all good practices is crucial. At the beginning of a database application life cycle, I/O issues are undetectable as the application runs against a smaller database. As databases grow, database server performance issue starts being noticed. Tackling these issues later down the road is definitely unproductive and a cumbersome process.

Latency Measurement:
If an application is designed optimally and the I/O subsystem is configured correctly, then Microsoft has a very good recommendation about the I/O latency. These recommendations are well accepted by all the industries experts and can be evaluated against OLTP or DSS implementation (Microsoft TechNet source). Please note that the acceptance level of I/O Latency would slightly vary based on some factors such as random, sequential, and I/O size (8K, 64K, 128K, etc.).

PerfMon Counter
Threshold Value
Data or Log
Type of workload
Average Disk/sec Read & Average Disk/sec Write
1ms - 5ms
Log

4ms - 20ms
Data
OLTP
30ms or less
Data
DSS

I/O measurement and using PerfMon:
There are several performance counters and different technique existing that can be used to measure I/O performance, latency and IOPS. Following are some widely used Windows PerfMon counters that are trustworthy.

Measuring of disk latency:
·         Average Disk sec/Read
·         Average Disk sec/Write
·         Average Disk sec/Transfer

Measuring disk throughputs:
·         Disk Read Bytes/sec
·         Disk Write Bytes/sec
·         Disk Bytes/sec

Measuring IOPS:
·         Disk Reads/sec
·         Disk Writes/sec
·         Disk Transfers/sec

Measuring a I/O requests if it splits into multiple requests:
·         Split IO/Sec

When is a disk overwhelmed? Usually when the disk throughput increases, latency also increases more or less. However, when the disk throughput remains almost the same but the latency increases as time passes, it results in disk saturation or I/O bottleneck.

Source of I/O Issues:
There are numerous reasons why a disk experiences bottleneck on a SQL Server. Following are some handful factors:

·         Inadequate memory for the buffer pool.
·         Index fragmentation.
·         Outdated statistics.
·         Improper or non-optimal fill factor.
·         Not using data compression (enterprise edition only).
·         No isolation of Index, Data and Log files.
·         Structure of database schema such as indexes, row width, data types.
·         Not using T-SQL performance based Set-Base technique.
·         Using nested views.
·         Excessive sort operation such as ORDER BY and GROUP BY.
·         Using Implicit Transaction.
·         Using lengthy Transaction.
·         Excessive using of NOLOCK hints.
·         Using CURSOR method.
·         Lack of covering indexes.
·         Using wider key for clustered index.
·         Workload nature - READ oriented vs. WRITE oriented.
·         Non optimal RAID configuration.
·         Volume alignment (also known as sector alignment).
·         NTFS Block size (also known as cluster size or Allocation Unit Size).
·         Suboptimal drivers or firmware used on the host HBAs or storage array.
·         Improper queue depth settings on HBAs.
·         Incorrectly configured multipath software and/or fiber switches.

Using and detecting I/O Issues using my tool:
In my tool, “I/O Response (ms)” represents the overall “I/O Latency” on a SQL Server Instance. The calculation method includes all the drives where data files are placed. The T-SQL code which has been used to calculate the “I/O Response (ms)” in my tool has been extracted from the SSMS “Activity Monitor”. There may be a fraction of a millisecond calculation variation but it will provide you the most critical current I/O and workload state.

Figure#1: Overall I/O Response (ms)

Under the “Database” tab, there is a “Database I/O” tab which calculates I/O operations which has been derived from “sys.dm_io_virtual_file_stats”. This will provide you with a far more granular and drill-down information about I/O which are occurring on various data and log files. To use this feature, multiple clicks on the lightning bolt button are required to activate and to view the current I/O performance data.

Figure#2: I/O Operation on Data and Log files

Read Gurus Articles:
Are I/O latencies killing your performance?

SQL Server Best Practices Article

Wednesday, July 3, 2013

Self-join incurs more I/O activities and increases locking overhead!

The Self-Join technique is commonly used to create a hierarchical tree set or finding duplicate records. It is also used to find previous or next values from a table by utilizing the inline table technique. Generally a self-join is a join in which a table is joined with itself. For example, when we need details about an employee and his manager where both employee and manager ID resides in the same table.

When we use self-join, it usually indicates that the table is not normalized. This may not be a problem for a small table but for a large and busy OLTP table with higher concurrency, this may lead to poor performance and degrade query response time.

The key point is that each self-joined table will be treated as a separate table while retrieving data from the disk, thus self-join incurs more I/O activities and increases locking overhead as it touches the same table twice or more. By adding appropriate indexes on JOIN and WHERE clause columns, it will reduce I/O activities and improve performance. It will be a good approach to avoid self-join whenever possible for heavy duty OLTP tables and queries.

Let’s perform a simple self-join test to see how the execution plan looks like. We can observe that as the retrieval number of records increases, each query behaves differently, such as requiring specific index and introducing parallel query.

Queries:
-- self-join one
select top (10)  a.xid, a.sIdentifier
from    tbllarge a
        inner join tbllarge b on a.xid = b.xid
where   a.sIdentifier ='A1'

-- self-join two
select top (100)  a.xid, a.sIdentifier
from    tbllarge a
        inner join tbllarge b on a.xid = b.xid
where   a.sIdentifier ='A1'

-- self-join three
select top (1000)  a.xid, a.sIdentifier
from    tbllarge a
        inner join tbllarge b on a.xid = b.xid
where   a.sIdentifier ='A1'

Execution Plan:

Thursday, May 16, 2013

WRITELOG waittype - Implicit vs. Explicit Transaction: disk or coding issue?

In a OLTP system where concurrency is high observing WRITELOG waittype is normal for a short period of time during peak usage. Generally, a high value of this waittype indicates slow response of the application. So, observing this waittype for a longer period of time and repeatedly for DML operations such as MERGE, INSERT, UPDATE and DELETE, obviously this indicates two specific issues:

1.      Disk I/O sub-system
2.      Transaction handing

If the underlying disk sub-system is the best in the industry and configured correctly, investigation shows nothing suspicious around I/O and if we still see the WRITELOG waittype then it is obviously the result of poor or improper handling of Transactions in a SQL batch.

Figure #1: WRITELOG waittype in a live system
Let’s explain what WRITELOG waittype is. This is a log management system waiting for dirty data pages to write on the disk. This waittype indicates that the SPID is waiting for a transaction log I/O request to be completed.  Thus if the number of WRITELOG waittype is high that means there are a number of requests in the queue which are processing slowly. Although, many experts and even MS KB article (KB822101) indicates it is a disk bottleneck, in reality this may not be the case unless your disk system is IDE or SATA based or configured incorrectly. We need to investigate one more area before confirming that the disk is an issue.

Implicit and explicit transaction:
As the WRITELOG waittype is related to the transaction log management system, let’s try to understand the difference between the two.

SQL Server handles two types of transaction mode:
(a)   Implicit transaction
(b)   Explicit transaction

Implicit transaction: In Implicit transaction mode, the instance of the SQL Server Database Engine automatically starts a new transaction after the current transaction is committed or rolled back for each DML statement. We do not need to define the start of a transaction; each statement level transaction will be committed or rolled back automatically.

Implicit transaction can be turned on or off per batch basis. We can use the “SET IMPLICIT_TRANSACTIONS ON/OFF” statement to turn implicit transaction mode off. We can use the “COMMIT TRANSACTION” or “ROLLBACK TRANSACTION” statements to end each transaction.

Explicit transaction: An explicit transaction is one in which we explicitly define both the starting and ending of the transaction. To start an explicit transaction, we can use “BEGIN TRANSACTION” and then complete the transaction by using “COMMIT TRANSACTION” or “ROLLBACK TRANSACTION”.

The main difference between Implicit and Explicit transaction is that implicit transaction is auto, controlled by SQL Server Engine and statement level transaction, whereas explicit transaction is user defined and batch-scoped transaction. Some key comparisons are follows:

Indicator
Implicit Transaction
Explicit Transaction
Scope
Statement level
Batch level
Log generation
continuous chain of transactions
One transaction log entry
Log I/O
For each statement gets one I/O
Entire batch needs one I/O
WRITELOG Waittype
Sustained period of time
N/A or unless there is real disk issue
Response/duration
Longer due to several I/O activities
Fast, less I/O activities
T-Log Restore
Long, Redo/undo phase
Short redo/undo phase
Impact on T-Log
Extremely high
Less and insignificant
Write Duration
Longer and slow
Shorter and fast


The application which is utilizing the Implicit Transaction mechanism actually makes the I/O system busy and degrades application response time dramatically. As a result, various disk related performance counters show significantly high values and we start believing that the disk is the bottleneck.

Observing the WRITLOG waittype:
In the following test we will review how implicit and explicit transaction incurs WRITELOG waittype. The given script can be executed in any version of SQL Server and with any kind of disk system to understand the behavior.

Complete T-SQL Code to produce the behavior:
/******************************************************
** Implicit vs Explicit Transaction
** Author: Sarjen Haque
** Apply to: SQL 2005 to 2012
******************************************************/
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

IF OBJECT_ID('tempdb..#wait') IS NOT NULL
   DROP TABLE #wait
GO

/*************************************
** Create a test database
*************************************/
USE [master]
GO
IF EXISTS ( SELECT  name
            FROM    sys.databases
            WHERE   name = N'TestDB' )
   BEGIN
         ALTER DATABASE [TestDB] SET  SINGLE_USER WITH ROLLBACK IMMEDIATE
         DROP DATABASE [TestDB]
   END
GO

CREATE DATABASE TestDB
GO

USE TestDB
GO
/*************************************
** Create a a table
*************************************/
SET NOCOUNT ON
IF OBJECT_ID('tblLarge') IS NOT NULL
   DROP TABLE tblLarge
GO

CREATE TABLE tblLarge
       (
         xID INT IDENTITY(1, 1) ,
         sName1 VARCHAR(100) ,
         sName2 VARCHAR(1000) ,
         sName3 VARCHAR(400) ,
         sIdentifier CHAR(100) ,
         dDOB DATETIME NULL ,
         nWage NUMERIC(20, 2) ,
         sLicense VARCHAR(25)
       )
GO
/*************************************
** Perform a checkpoint
*************************************/
CHECKPOINT


/*************************************
** Collect initial wait stats
*************************************/
SELECT  'w1' WaitRun ,
        wait_type ,
        wait_time_ms / 1000.0 AS WaitSec ,
        ( wait_time_ms - signal_wait_time_ms ) / 1000.0 AS ResourceSec ,
        signal_wait_time_ms / 1000.0 AS SignalSec ,
        waiting_tasks_count AS WaitCount
INTO    #Wait
FROM    sys.dm_os_wait_stats
WHERE   wait_type LIKE 'WRITELOG%'

GO
/*************************************
** Using explicit Transaction
*************************************/
SET NOCOUNT ON
USE TestDB
GO

DECLARE @n INT
SET @n=1

BEGIN TRANSACTION
WHILE @n <= 10000
      BEGIN
            INSERT  INTO tblLarge
                    ( sName1 ,
                      sName2 ,
                      sName3 ,
                      sIdentifier ,
                      dDOB ,
                      nWage ,
                      sLicense
                    )
            VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 50) ,    -- sName1
                      LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 60) ,    -- sName2
                      LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 70) ,    -- sName2
                      LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,              -- sIdentifier    
                      DATEADD(dd, -RAND() * 20000, GETDATE()) ,            -- dDOB
                      ( RAND() * 1000 ) ,                                  -- nWage
                      SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7)        -- sLicense       
                    )
            SET @n = @n + 1
      END
COMMIT TRANSACTION

GO
/***************************************
** Collect wait stats for Explicit TRN
***************************************/
INSERT  INTO #wait
        SELECT  'w2' WaitRun ,
                wait_type ,
                wait_time_ms / 1000.0 AS WaitSec ,
                ( wait_time_ms - signal_wait_time_ms ) / 1000.0 AS ResourceSec ,
                signal_wait_time_ms / 1000.0 AS SignalSec ,
                waiting_tasks_count AS WaitCount
        FROM    sys.dm_os_wait_stats
        WHERE   wait_type LIKE 'WRITELOG%'
GO
/*****************************************
** Check for Log flush for Explicit TRN
*****************************************/
SELECT  'Explicit' AS Transaction_Type ,
        SPID ,
        Operation ,
        COUNT(Operation) [OperationCount] ,
        [Transaction Name] ,
        COUNT([Transaction Name]) AS [TransactionNameCount]
FROM    ::
        fn_dblog(NULL, NULL)
WHERE   operation LIKE '%LOP_BEGIN_XACT%'
        AND [SPID] = @@spid
        AND [SPID] IS NOT NULL
GROUP BY SPID ,
        Operation ,
        [Transaction Name]
GO

/************************************
** Using implicit Transaction
************************************/
SET NOCOUNT ON
USE TestDB
GO

DECLARE @n INT
SET @n=1

WHILE @n <= 10000
      BEGIN
            INSERT  INTO tblLarge
                    ( sName1 ,
                      sName2 ,
                      sName3 ,
                      sIdentifier ,
                      dDOB ,
                      nWage ,
                      sLicense
                    )
            VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 50) ,    -- sName1
                      LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 60) ,    -- sName2
                      LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 70) ,    -- sName2
                      LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,              -- sIdentifier    
                      DATEADD(dd, -RAND() * 20000, GETDATE()) ,            -- dDOB
                      ( RAND() * 1000 ) ,                                  -- nWage
                      SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7)        -- sLicense       
                    )
            SET @n = @n + 1
      END

GO
/***************************************
** Collect wait stats for Implicit TRN
***************************************/
INSERT  INTO #wait
        SELECT  'w3' WaitRun ,
                wait_type ,
                wait_time_ms / 1000.0 AS WaitSec ,
                ( wait_time_ms - signal_wait_time_ms ) / 1000.0 AS ResourceSec ,
                signal_wait_time_ms / 1000.0 AS SignalSec ,
                waiting_tasks_count AS WaitCount
        FROM    sys.dm_os_wait_stats
        WHERE   wait_type LIKE 'WRITELOG%'
GO

/************************************
** Check for Log flush
************************************/
SELECT  'Implicit' AS Transaction_Type ,
        SPID ,
        Operation ,
        COUNT(Operation) [OperationCount] ,
        [Transaction Name] ,
        COUNT([Transaction Name]) AS [TransactionNameCount]
FROM    ::
        fn_dblog(NULL, NULL)
WHERE   operation LIKE '%LOP_BEGIN_XACT%'
        AND [SPID] = @@spid
        AND [SPID] IS NOT NULL
GROUP BY SPID ,
        Operation ,
        [Transaction Name]
     
GO

/********************************************************
** Compare the waittype collection
********************************************************/
SELECT  a.wait_type ,
        'Explicit Transaction' AS Transaction_Type ,
        ( b.WaitSec - a.WaitSec ) AS TotalwaitSec ,
        ( b.ResourceSec - a.ResourceSec ) AS TotalResourceSec ,
        ( b.SignalSec - a.SignalSec ) AS TotalSignalSec ,
        ( b.WaitCount - a.WaitCount ) AS TotalWaitCount
FROM    ( SELECT    *
          FROM      #wait
          WHERE     waitrun = 'w1'
        ) a
        JOIN ( SELECT   *
               FROM     #wait
               WHERE    waitrun = 'w2'
             ) b ON a.wait_type = b.wait_type
UNION ALL
SELECT  a.wait_type ,
        'Implicit Transaction' AS Transaction_Type ,
        ( b.WaitSec - a.WaitSec ) AS TotalwaitSec ,
        ( b.ResourceSec - a.ResourceSec ) AS TotalResourceSec ,
        ( b.SignalSec - a.SignalSec ) AS TotalSignalSec ,
        ( b.WaitCount - a.WaitCount ) AS TotalWaitCount
FROM    ( SELECT    *
          FROM      #wait
          WHERE     waitrun = 'w2'
        ) a
        JOIN ( SELECT   *
               FROM     #wait
               WHERE    waitrun = 'w3'
             ) b ON a.wait_type = b.wait_type

GO

Figure #2: implicit vs. explicit

Figure #2: implicit vs. explicit

Figure #3: implicit vs. explicit 

 
Analysis:
The analysis is simple and easy to understand from the output. From the above pictures we can see that the implicit transaction is always slow, signal wait is high and requires more I/O activities to complete a task.

Summary:
We saw that WRITELOG waittype does not necessarily indicate a disk throughput problem. It is the SQL Server Transaction Log management system who is doing more work to write each transaction into the log file. So before concluding, I would recommend to review your application code and see if you can use explicit transactions. I am sure that even though you use an Enterprise Class SSD system, implicit transaction may not improve the application performance dramatically.