A lot of us often see the following type of
sporadic messages although everything is running as usual. After the
application or user receives the network related messages, there will be some
trouble occurring with the system however it will behave normally again after
each event.
Some logged messages can be found in SQL
Server Error Log and Windows Event logs, some can be popped up within the
application and others can be detected by querying the SQL Server Extended
Event.
Some observed Error Messages:
·
ERROR [08S01]
[Microsoft][SQL NativeClient]Communication link failure
·
System.Data.SqlClient.SqlException:
A transport-level error has occurred when sending the request to the server.
(provider: TCP Provider, error: 0 - An existing connection was forcibly closed
by the remote host.)
·
Network error 0x40
occurred while sending data to the client on process ID XX batch ID 0. A common
cause for this error is if the client disconnected without reading the entire
response from the server. This connection will be terminated.
·
A transport-level
error has occurred when receiving results from the server. (provider: TCP
Provider, error: 0 - The semaphore timeout period has expired.)
·
A fatal error
occurred while reading the input stream from the network. The session will be
terminated (input error: XXX, output error: XXX).
·
Network error code
0x2746 occurred while establishing a connection; the connection has been
closed.
Possible mystery behind Network
Latency:
There could be a number of reasons why this
happens. Following are some of the potential areas where network latency and
overheads can occur sporadically or continuously.
A.
Implicit
Transaction: Implicit transaction mode generates a
continuous chain of transactions which causes a delay to commit or rollback
each transaction. This significantly degrades the performance and reduces
concurrency in a high throughput OLTP system.
B.
Network
bandwidth bound application: Database
Mirroring, Always ON and Transitional Replication are usually Network bound
application processes. Reading and writing large amounts of data or documenting
continuously may increase latency and degrade network performance.
C.
TCP
Chimney offloads: “TCP
Chimney Offload transfers Transmission Control Protocol (TCP) traffic
processing, such as packet segmentation and reassembly processing tasks, from a
computer’s CPU to a network adapter that supports TCP Chimney Offload. Moving
TCP/IP processing from the CPU to the network adapter can free the CPU to
perform more application-level functions. TCP Chimney Offload can offload the
processing for both TCP/IPv4 and TCP/IPv6 connections if supported by the
network adapter”. Thus if this setting is incorrect on the both server OS and
NIC level, then performance issues are guaranteed.
D.
NIC
Teaming: "NIC teaming makes two or more physical
NICs appear as a single virtual one to the application, which isolates
application from failures of an individual NIC. There are two types of teaming:
1. Switch
independent (the switch does not know the NICs are teamed) and
2. Switch
dependent (the network switch participates in the teaming).
NIC teaming can
also result in bandwidth aggregation such that, for example, four 1GB/Sec NICs
can provide an aggregate 4GB/Sec throughput. In the event of problems, the
teaming has to be disabled in order to verify it is not causing the
problem."
E.
Jumbo
Frames: Jumbo frames are only available on gigabit
networks, and all devices in the network path must support them (switches,
routers, NICs, and so on). If all the networking hardware doesn't support
end-to-end Jumbo Frames they should not be used and should be disabled.
"Jumbo
frames" extends Ethernet to 9000 bytes and is large enough to carry an 8
KB application datagram (e.g. NFS) plus packet header overhead. If Jumbo
frames is supported on a Gigabit Network then “Network Packet Size”
configuration in SQL Server can be increased to 8192 in a high throughput
environment.
To check whether
the jumbo frames is supported by the target machine, execute following command:
ping
<IP or server name> -f –l 9000
Figure: Jumbo frames
support check
F.
Maximum
Transmission Unit (MTU): The
Maximum Transmission Unit (MTU) feature is an advanced configuration that
allows determining the largest data size permitted on a network connection.
Generally, if the MTU is too large for the connection, computer will experience
packet loss or dropping connection.
To determine the
correct MTU size for a network, we have to do a specific ping test on the
destination (target server).For Windows environment, use the following command
for the ping test:
Syntax: ping [server
or IP address] –f –l xxxx
Example: ping
192.168.0.50 –f –l 1472
This test can
start from 1472 until we can reach the exact packet size which will not result
to the prompt “Packet needs to be fragmented but DF set.” You may adjust
(decrease/increase) the value by 10.
Implicit Transaction
and TCP Chimney OffLoads:
The DML centric application which mostly
depends on implicit transactions of SQL Server generates a lot of LOG I/O which
initiates further delay on log writing process and introduces network
latency. Turning of TCP Chimney offloads is one of the “first things
to do” that can be considered in order to eliminate any possibilities in this
area.
“TCP Chimney
Offloads” functionality does not work with virtualized Windows OS (vmware and
hyper-v), therefore it can be disabled permanently.
This is one of the potential overlooked areas by many sysadmin, which needs further investigation if the received error message looks similar to the issues which have been mentioned before in the article. Although it is not the best practice, it is still recommended by many Network Experts that the “TCP Chimney Offloads” setting must be disabled in both Windows OS and NIC level to reduce latency in SQL Server implementation whether it is virtual or physical.
Windows OS and TCP Chimney offloads:
In Windows 2003 - TCP Chimney Offload is enabled by default.
In Windows 2008 - TCP Chimney Offload is disabled by default.
In Windows 2008 R2 - TCP Chimney Offload
is automatic by
default.
In Windows 2012 -TCP Chimney Offload is disabled by default.
How to check TCP
Chimney Offloads:
·
In Windows 2003 use the following
command:
netsh
interface ip show offload
·
In Windows 2008 and above use the
following command:
netsh
int tcp show global
Figure: Screenshot
from Windows 2008 R2
Screenshot from
Windows 2012
XE Events
(Extended Events) Query:
/***************************************************************
**
Errors reported along with the error counts
****************************************************************/
SET NOCOUNT ON
--
Store the XML data in a temporary table
SELECT CAST(xet.target_data AS XML) AS XMLDATA
INTO #xeTmpTbl
FROM sys.dm_xe_session_targets xet
JOIN sys.dm_xe_sessions xe ON ( xe.address = xet.event_session_address )
WHERE xe.name = 'system_health'
--
Get statistical information about all the errors reported
;WITH myXEinfo ( EventXML )
AS ( SELECT C.query('.') EventXML
FROM #xeTmpTbl a
CROSS APPLY a.XMLDATA.nodes('/RingBufferTarget/event') AS T ( C )
),
myXEErrorInfo ( EventTime, ErrorNum )
AS ( SELECT EventXML.value('(/event/@timestamp)[1]', 'datetime') AS EventTime ,
EventXML.value('(/event/data/value)[1]', 'int') AS ErrorNum
FROM myXEinfo
WHERE EventXML.value('(/event/@name)[1]', 'varchar(255)') = 'error_reported'
)
SELECT ErrorNum ,
MAX(EventTime) AS LastRecordedEvent ,
MIN(EventTime) AS FirstRecordedEvent ,
COUNT(*) AS Occurrences ,
b.[text] AS ErrDescription
FROM myXEErrorInfo a
INNER JOIN sys.messages b ON a.ErrorNum = b.message_id
WHERE b.language_id = SERVERPROPERTY('LCID')
GROUP BY a.ErrorNum ,
b.[text]
-- Get
information about each of the errors reported
;
WITH myXEinfo ( EventXML )
AS ( SELECT C.query('.') EventXML
FROM #xeTmpTbl a
CROSS APPLY a.XMLDATA.nodes('/RingBufferTarget/event') AS T ( C )
WHERE C.query('.').value('(/event/@name)[1]', 'varchar(255)') = 'error_reported'
)
SELECT EventXML.value('(/event/@timestamp)[1]', 'datetime') AS EventTime ,
EventXML.value('(/event/data/value)[1]', 'int') AS ErrNum ,
EventXML.value('(/event/data/value)[2]', 'int') AS ErrSeverity ,
EventXML.value('(/event/data/value)[3]', 'int') AS ErrState ,
EventXML.value('(/event/data/value)[5]', 'varchar(max)') AS ErrText
--EventXML.value('(/event/action/value)[2]', 'varchar(10)')
AS Session_ID
FROM myXEinfo
ORDER BY EventTime DESC
-- Drop
the temporary table
DROP TABLE #xeTmpTbl
/**************************************************************************
**
Extract Ring Buffer Information for SQL Server 2008 instances and above
**************************************************************************/
SELECT CONVERT (VARCHAR(30), GETDATE(), 121) AS run_time ,
DATEADD(ms, ( a.[record_time] - sys.ms_ticks ), GETDATE()) AS [notification_time] ,
a.* ,
sys.ms_ticks AS [current_time]
FROM ( SELECT x.value('(//Record/Error/ErrorCode)[1]', 'varchar(30)') AS [error_code] ,
x.value('(//Record/Error/CallingAPIName)[1]', 'varchar(255)') AS [calling_API_name] ,
x.value('(//Record/Error/APIName)[1]', 'varchar(255)') AS [API_name] ,
x.value('(//Record/Error/SPID)[1]', 'int') AS [SPID] ,
x.value('(//Record/@id)[1]', 'bigint') AS [record_id] ,
x.value('(//Record/@type)[1]', 'varchar(30)') AS [type] ,
x.value('(//Record/@time)[1]', 'bigint') AS [record_time]
FROM ( SELECT CAST (record AS XML)
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_SECURITY_ERROR'
) AS R ( x )
) a
CROSS JOIN sys.dm_os_sys_info sys
ORDER BY a.[record_time] ASC
Reference:
·
Using
TCP Chimney Offload
·
TCP Offloading again?!
·
Network
Changes Affect Windows Server 2012
·
Information about the TCP
Chimney Offload, Receive Side Scaling, and Network Direct Memory Access
features in Windows Server 2008
·
Error
message when an application connects to SQL Server on a server that is running
Windows Server 2003: "General Network error," "Communication
link failure," or "A transport-level error"
·
How
to Disable TCP Chimney, TCPIP Offload Engine (TOE) or TCP Segmentation Offload
(TSO).
·
TCP Chimney Offload –
Possible Performance and Concurrency Impacts to SQL Server Workloads
This is a terrific bit of information to know about. When sysadmins are not in the same shoes as SQL DBA's, configuration items like this can unnoticed during performance troubleshooting. Many thanks!
ReplyDeleteJeff Bennett
SQL DBA, Missouri, US
very good article, nice work.
ReplyDeleteIt is extremely interesting for me to read the article. Thank you for it. I like such topics and everything connected to them. I would like to read a bit more on that blog soon.
ReplyDeleteUltra-low latency MAC
The two most important protocols are TCP offload (Transmission Control Protocol) and IP (Internet Protocol), which are what give the whole name.
ReplyDeleteTCP offload.
Thanks ...
great information!
ReplyDeleteI had a customer with a application error ID 4014
"A fatal error occurred while reading the input stream from the network. The session will be terminated (input error: XXX, output error: XXX)"
in ring buffer conncetivity I saw a lot of NetworkErrorFound or DisconnectDueToReadError
after disabling NIC teaming and disabling TCP CHimney Offload in network card and operating system (server 2008R2) problem disapper
Users are happy :]
thanks!! great BLOG
Good to know, thanks!
DeleteHi,
ReplyDeleteIs sufficient just to disable TCP chimney offload at the OS level without disabling it at the virtual network card (vmxnet3) level ?
Thanks for sharing such a great blog..Keep posting..Very informative post
ReplyDelete