
Tuesday, August 8, 2023

NUMA and soft-NUMA in SQL Server: Getting additional I/O threads

Performance can improve significantly when the SQL Server engine detects physical NUMA nodes on the Windows system. Along with hardware NUMA, Microsoft also introduced soft-NUMA (software-based NUMA) to create extra virtual NUMA nodes inside the SQL OS. Starting with SQL Server 2016 (13.x), if the Database Engine detects more than eight physical cores per NUMA node or socket at startup, soft-NUMA nodes are created automatically. Creating soft-NUMA nodes enables the database engine to create more I/O-related threads to better serve demanding SQL Server transactional workloads.

The soft-NUMA creation process runs during startup of the SQL Server service. By default, soft-NUMA creation is enabled in SQL Server and can be disabled or re-enabled by using the ALTER SERVER CONFIGURATION (Transact-SQL) statement with the SET SOFTNUMA argument. Changing this setting requires a restart of the database engine to take effect.
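
A minimal example of the two statements (each takes effect only after the SQL Server service is restarted):

-- Disable automatic soft-NUMA
ALTER SERVER CONFIGURATION SET SOFTNUMA OFF;

-- Re-enable automatic soft-NUMA
ALTER SERVER CONFIGURATION SET SOFTNUMA ON;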

Purpose of soft-NUMA: The purpose of soft-NUMA is to create an artificial grouping of CPU cores where each group represents a soft-NUMA node. Creating these nodes within SQL Server allows the database engine (SQL OS) to create extra “LAZY WRITER”, “LOG WRITER” and “RESOURCE MONITOR” threads per NUMA node. The database engine automatically decides whether to create soft-NUMA nodes and these threads, based on the existing NUMA layout and CPU cores.

Please note that the soft-NUMA architecture does not create a separate local memory node for every soft-NUMA node. Instead, all the virtual soft-NUMA nodes use the memory node that their CPU group belongs to, the one originally exposed to SQL Server. This means there is no local memory support for a soft-NUMA node.
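
To see this for yourself, the following variation of the sys.dm_os_nodes query used later in this post maps each soft-NUMA node to its memory node; several node_id values reporting the same memory_node_id confirms that soft-NUMA nodes share a memory node rather than owning one:

-- Map each soft-NUMA (scheduler) node to its underlying memory node
SELECT node_id,
       memory_node_id,
       node_state_desc,
       cpu_count
FROM sys.dm_os_nodes
WHERE node_state_desc = 'ONLINE';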

Benefits of soft-NUMA: Since SQL Server is a fully NUMA-aware application, having extra “LAZY WRITER”, “RESOURCE MONITOR” and “LOG WRITER” threads can provide a significant performance improvement. Additional benefits:

  1. Creates multiple “LAZY WRITER” threads, one per NUMA node.
  2. Creates multiple “RESOURCE MONITOR” threads, one per NUMA node.
  3. May create two or more “LOG WRITER” threads, depending on the NUMA nodes.
  4. Reduces “Non-Yielding Scheduler” errors and increases SQL Server responsiveness.
  5. Improves CHECKPOINT and I/O operations.
  6. Reduces LATCH contention.

Fewer than 9 CPU cores: Whether SQL Server is installed directly on the hardware or running in a virtual machine, the soft-NUMA creation requirements are the same. If we run SQL Server on a VM with 8 CPU cores we cannot get soft-NUMA; however, we can easily manipulate the CPU topology at the hypervisor level to create or expose two vNUMA nodes to Windows Server. SQL Server will treat these as physical NUMA and will create two real NUMA nodes and the associated I/O threads.
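
In my tests I used the vSphere 8 CPU Topology settings in the VM configuration, but purely as an illustration the same effect can be sketched from PowerCLI with the VM's advanced NUMA settings (the VM name win01 and the values below are examples; numa.vcpu.min lowers the vCPU count at which ESXi exposes vNUMA, and numa.vcpu.maxPerVirtualNode caps the vCPUs per virtual NUMA node):

# PowerCLI sketch only - run against a powered-off VM (win01 is an example name)
$vm = Get-VM -Name win01
# Expose vNUMA even at 8 vCPUs (the default threshold is 9 vCPUs)
New-AdvancedSetting -Entity $vm -Name "numa.vcpu.min" -Value 8 -Confirm:$false
# Split the 8 vCPUs into two virtual NUMA nodes of 4 vCPUs each
New-AdvancedSetting -Entity $vm -Name "numa.vcpu.maxPerVirtualNode" -Value 4 -Confirm:$false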

Beware of creating multiple vNUMA nodes with a small amount of memory: it will not improve performance and will instead introduce performance problems due to remote memory access. You can evaluate NUMA node memory usage with the following DMV:

SELECT * FROM sys.dm_os_memory_node_access_stats;

Queries/DMVs used to investigate:

-- Hardware NUMA/Memory Node
SELECT @@servername AS 'sql_instance',
       'Original NUMA/Memory Node' AS 'Memory Node',
       memory_node_id,
       CONVERT(DECIMAL(18,2),(virtual_address_space_reserved_kb / 1024.0)) AS virtual_address_space_reserved_mb,
       CONVERT(DECIMAL(18,2),(virtual_address_space_committed_kb / 1024.0)) AS virtual_address_space_committed_mb,
       CONVERT(DECIMAL(18,2),(locked_page_allocations_kb / 1024.0)) AS locked_page_allocations_mb,
       CONVERT(DECIMAL(18,2),(pages_kb / 1024.0)) AS pages_mb,
       CONVERT(DECIMAL(18,2),(shared_memory_reserved_kb / 1024.0)) AS shared_memory_reserved_mb,
       CONVERT(DECIMAL(18,2),(shared_memory_committed_kb / 1024.0)) AS shared_memory_committed_mb,
       CONVERT(DECIMAL(18,2),(foreign_committed_kb / 1024.0)) AS foreign_committed_mb,
       CONVERT(DECIMAL(18,2),(target_kb / 1024.0)) AS target_mb
FROM sys.dm_os_memory_nodes
WHERE memory_node_id <> 64;

-- Hardware information after applying soft-NUMA
SELECT @@servername AS 'sql_instance',
       virtual_machine_type_desc,
       cpu_count,
       softnuma_configuration_desc,
       socket_count,
       cores_per_socket
FROM sys.dm_os_sys_info;

-- soft-NUMA nodes
SELECT @@servername AS 'sql_instance',
       'Memory Node with soft-NUMA' AS 'Memory Node',
       node_id,
       node_state_desc,
       cpu_count
FROM sys.dm_os_nodes
WHERE node_state_desc = 'ONLINE';
-- LAZY WRITER, LOG WRITER and RESOURCE MONITOR threads
SELECT @@servername AS 'sql_instance',
       spid,
       lastwaittype,
       cmd,
       status
FROM sys.sysprocesses
WHERE cmd IN ( 'LAZY WRITER', 'RESOURCE MONITOR', 'LOG WRITER' );
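
Since sys.sysprocesses is a deprecated compatibility view, roughly the same check can be done against sys.dm_exec_requests (a hedged alternative; these background tasks surface there with the same command names):

-- Same background threads via a current DMV
SELECT session_id,
       command,
       status,
       wait_type,
       scheduler_id
FROM sys.dm_exec_requests
WHERE command IN ( 'LAZY WRITER', 'RESOURCE MONITOR', 'LOG WRITER' );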

Following are a few examples of NUMA and soft-NUMA creation. These tests were conducted in an ESXi 8.0, Windows Server 2022 and SQL Server 2022 environment; regardless of the environment, the final outcome will be the same.

1 vSocket, 8 vCores per vSocket, No CPU Topology applied:

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 1 sockets with 8 cores per socket and 8 logical processors per socket, 8 total logical processors; using 8 logical processors based on SQL Server licensing.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x00000000000000ff:0 Active CPU mask: 0x00000000000000ff:0.
Total Log Writer threads: 2, Node CPUs: 4, Nodes: 1, Log Writer threads per CPU: 1, Log Writer threads per Node: 2


 

1 vSocket, 9 vCores per vSocket, No CPU Topology applied:

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 1 sockets with 9 cores per socket and 9 logical processors per socket, 9 total logical processors; using 9 logical processors based on SQL Server licensing.
Automatic soft-NUMA was enabled because SQL Server has detected hardware NUMA nodes with greater than 8 physical cores.
Node configuration: node 0: CPU mask: 0x000000000000001f:0 Active CPU mask: 0x000000000000001f:0.
Node configuration: node 1: CPU mask: 0x00000000000001e0:0 Active CPU mask: 0x00000000000001e0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2




1 vSocket, 10 vCores per vSocket, No CPU Topology applied:

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 1 sockets with 10 cores per socket and 10 logical processors per socket, 10 total logical processors; using 10 logical processors based on SQL Server licensing.
Automatic soft-NUMA was enabled because SQL Server has detected hardware NUMA nodes with greater than 8 physical cores.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x000000000000001f:0 Active CPU mask: 0x000000000000001f:0.
Node configuration: node 1: CPU mask: 0x00000000000003e0:0 Active CPU mask: 0x00000000000003e0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2



 

2 vSocket, 5 vCores per vSocket, No CPU Topology applied:

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 2 sockets with 5 cores per socket and 5 logical processors per socket, 10 total logical processors; using 10 logical processors based on SQL Server licensing.
Automatic soft-NUMA was enabled because SQL Server has detected hardware NUMA nodes with greater than 8 physical cores.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x000000000000001f:0 Active CPU mask: 0x000000000000001f:0.
Node configuration: node 1: CPU mask: 0x00000000000003e0:0 Active CPU mask: 0x00000000000003e0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2



 

2 vSocket, 5 vCores per vSocket, 2 vNUMA, CPU Topology applied:

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 2 sockets with 5 cores per socket and 5 logical processors per socket, 10 total logical processors; using 10 logical processors based on SQL Server licensing.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x000000000000001f:0 Active CPU mask: 0x000000000000001f:0.
Node configuration: node 1: CPU mask: 0x00000000000003e0:0 Active CPU mask: 0x00000000000003e0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2

2 vSocket, 4 vCores per vSocket, 2 vNUMA, CPU Topology applied:

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 2 sockets with 4 cores per socket and 4 logical processors per socket, 8 total logical processors; using 8 logical processors based on SQL Server licensing.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x000000000000000f:0 Active CPU mask: 0x000000000000000f:0.
Node configuration: node 1: CPU mask: 0x00000000000000f0:0 Active CPU mask: 0x00000000000000f0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2


References:

Soft-NUMA (SQL Server):
https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/soft-numa-sql-server?view=sql-server-ver16#automatic-soft-numa

How It Works: Soft NUMA, I/O Completion Thread, Lazy Writer, Workers and Memory Nodes:
https://techcommunity.microsoft.com/t5/sql-server-support-blog/how-it-works-soft-numa-i-o-completion-thread-lazy-writer-workers/ba-p/316044

Saturday, August 5, 2023

Paravirtualized Network Adaptor: Changing E1000e to VMXNET3

A very common and misleading practice is to accept the default values during installation or configuration. This may be acceptable or even suitable in some scenarios, but it is often not optimal for a targeted workload, since the default values can eventually cause widespread performance issues.

When creating a Virtual Machine in vSphere ESXi, many mandatory settings, such as CPU, memory, network card, socket and I/O controller, come with default values that need to be decided on. VMware sets most of the required hardware resources at the bare minimum values necessary to create a Virtual Machine, regardless of the guest OS. Should we accept these defaults? Probably not. However, many administrators continue to accept these bare-minimum defaults without realizing the performance consequences.

Network Adaptor: There are currently three types of network adaptors available, and E1000e is the default. E1000e is an emulated version of the “Intel 82574 Gigabit Ethernet NIC”, and the guest OS recognizes it as “Intel(R) 82574L Gigabit Network Connection”. If this adaptor is selected for the guest OS (Windows, Linux), the required driver is already built into the OS and does not interact with the VMware Tools driver.

A few disadvantages of E1000e:

1. It is not paravirtualized, so VM performance is not guaranteed.

2. It only supports basic network connectivity.

3. It does not support RSS (Receive Side Scaling); see the quick check after this list.

4. It uses far more CPU on the hypervisor.

5. It may cause memory leaks and high CPU usage on the guest OS.

6. It drops network packets.
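
To see point 3 in practice, you can check from inside a Windows guest whether RSS is available and enabled on the current adaptor (a quick sketch using the built-in NetAdapter module):

# List RSS capability/state for each NIC inside the guest OS
Get-NetAdapterRss | Format-Table Name, InterfaceDescription, Enabled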

VMXNET3: This is a paravirtualized network adaptor developed by VMware, and it is recommended so the Virtual Machine gains a substantial performance benefit. To take advantage of this adaptor, VMware Tools must be installed in the Virtual Machine.

Changing E1000e to VMXNET3: There are several ways to change the network adaptor type from E1000e to VMXNET3. Before doing this, you should record all network configuration from the existing network adaptor. If you would like to keep the MAC address of the E1000e (the existing network card), please write it down beforehand.

I found that using PowerCLI is the easiest and safest way to change the network adaptor type from E1000e to VMXNET3.

Method 1: Using PowerCLI to change NIC type while preserving the original MAC address of E1000e:

·       Note down the network configuration, including values such as IP, subnet, Gateway, DNS, etc. and take a snapshot of the VM.
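
From inside the Windows guest, something like the following captures the current settings to a file (the path is just an example):

# Save the current IP configuration before the adaptor is replaced
Get-NetIPConfiguration -Detailed | Out-File C:\Temp\nic-before-change.txt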

·       Turn off the VM.

·       Connect to the ESXi server; my ESXi server IP is 192.168.0.22:

 Connect-VIServer -Server 192.168.0.22

·       Check the Network Type:

Get-VM win01 | Get-NetworkAdapter

·       Change the NIC type from E1000e to VMXNET3:

Get-VM win01 | Get-NetworkAdapter | Set-NetworkAdapter -Type Vmxnet3 -Confirm:$false

·       Turn on the VM.
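
If you are still in the PowerCLI session, the VM can be powered on from there as well (again using the example VM name win01):

Start-VM -VM win01 -Confirm:$false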

·       In a Windows VM, open the device manager (devmgmt.msc) and enable “Show hidden devices” under the view menu.

·       Uninstall the “Intel(R) 82574L Gigabit Network Connection”.

Using PowerCLI to change Network Adaptor:



Method 2: By editing the vmx file:

·       Similar to above, note down the network configuration details.

·       Take a snapshot of the VM.

·       Turn off the VM.

·       Open the datastore where the VM resides.

·       Right click and download the vmx file to local desktop.

·       Edit the vmx file and change the adaptor type to vmxnet3 as follows:

ethernet0.virtualDev = "vmxnet3"

·       Add the following:

ethernet0.CheckMACAddress = "FALSE"

·       Save the vmx file, then upload the edited version and replace the original version.

·       Turn on the VM.

·       In a Windows VM, open the device manager (devmgmt.msc) and enable “Show hidden devices” under the view menu.

·       Uninstall the “Intel(R) 82574L Gigabit Network Connection”.

Reference:

Choosing a network adapter for your virtual machine (1001805):

https://kb.vmware.com/s/article/1001805

Understanding full virtualization, paravirtualization, and hardware assist:
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/VMware_paravirtualization.pdf

VMXNET3 vs E1000E and E1000:

https://rickardnobel.se/vmxnet3-vs-e1000e-and-e1000-part-1/

https://rickardnobel.se/vmxnet3-vs-e1000e-and-e1000-part-2/