Total Pageviews

Tuesday, August 8, 2023

NUMA and soft-NUMA in SQL Server: To get additional I/O threads

Performance can be improved significantly if the SQL Server engine detects physical NUMA nodes on the Windows system. Along with Hardware NUMA, Microsoft also introduced soft-NUMA (software-based NUMA) architecture to create extra virtual NUMA nodes inside SQL OS. Starting from SQL Server 2016 (13.x), if the Database Engine detects more than eight “physical cores per NUMA node” or more than eight “socket”, soft-NUMA nodes will be created automatically. The creation of soft-NUMA nodes enables the SQL Server database engine to create more I/O threads to enhance the demanding SQL Server transactional workload.

The soft-NUMA creation process starts during the startup of the SQL Server service. By default, soft-NUMA creation is enabled in SQL Server and can be disabled or re-enabled by using the ALTER SERVER CONFIGURATION (Transact-SQL) statement with the SET SOFTNUMA argument. Changing the value of this setting requires a restart of the database engine to take effect.

Purpose of soft-NUMA: The purpose of soft-NUMA is to create an artificial grouping of CPU Cores where each group represents a soft-NUMA node. This creation of NUMA within the SQL Server will allow the database engine (SQL OS) to create extra “LAZY WRITER”, “LOG WRITER” and “RESOURCE MONITOR” threads per NUMA node. The SQL Server database engine will automatically decide and create soft-NUMA and the above threads based on existing NUMA and CPU cores.

Please note that soft-NUMA architecture will not create separate local memory nodes for every NUMA node. Instead, all the virtual soft-NUMA nodes will be using the same memory node where CPU group belongs to and which was originally exposed to the SQL Server. This means that there will be no local memory support for the soft-NUMA node.

Benefits of soft-NUMA:  Since SQL Server is a fully NUMA-aware application, having extra “LAZY WRITER”, “RESOURCE MONITOR” and “LOG WRITER” threads can provide significant performance improvement.  Additional benefits:

  1. Creates multiple “LAZY WRITER” threads, one per each NUMA node.
  2. Creates multiple “RESOURCE MONITOR” threads, one per each NUMA node.
  3. Might be able to create two or more “LOG WRITER” threads based on each NUMA node.
  4. Reduces “Non-Yielding Scheduler” errors and increases SQL Server responsiveness.
  5. Improves CHECKPOINT and I/O operations.
  6. Reduction of LATCH contention.

Less than 9 CPU Cores: Whether the SQL Server is installed directly on the hardware or running on a Virtual Machine, soft-NUMA creation requirements will be the same. If we run SQL Server on a VM with 8 CPU Cores we cannot have soft-NUMA, however, we can easily manipulate CPU Topology within the Hypervisor level to create or expose two vNUMA nodes to the Windows Server. SQL Server will treat this as a physical NUMA and will create two real NUMA nodes and associated I/O Threads.

Beware of creating multiple vNUMA with small amount of memory, it will not improve performance rather will introduce performance problem due to remote memory access. You can evaluate NUMA node memory usage by the following DMV:

SELECT * FROM sys.dm_os_memory_node_access_stats;

Query/DMV used to investigate: 

-- Hardware NUMA/Memory Node
SELECT @@servername AS 'sql_intance',
       'Orginal NUMA/Memory Node' AS 'Memory Node',
       memory_node_id,
       CONVERT(DECIMAL(18,2),(virtual_address_space_reserved_kb / 1024.0)) AS virtual_address_space_reserved_mb,
       CONVERT(DECIMAL(18,2),(virtual_address_space_committed_kb / 1024.0)) AS virtual_address_space_committed_mb,
       CONVERT(DECIMAL(18,2),(locked_page_allocations_kb / 1024.0)) AS locked_page_allocations_mb,
       CONVERT(DECIMAL(18,2),(pages_kb / 1024.0)) AS pages_mb,
       CONVERT(DECIMAL(18,2),(shared_memory_reserved_kb / 1024.0)) AS shared_memory_reserved_mb,
       CONVERT(DECIMAL(18,2),(shared_memory_committed_kb / 1024.0)) AS shared_memory_committed_mb,
       CONVERT(DECIMAL(18,2),(foreign_committed_kb / 1024.0)) AS foreign_committed_mb,
       CONVERT(DECIMAL(18,2),(target_kb / 1024.0)) AS target_mb
FROM sys.dm_os_memory_nodes
WHERE memory_node_id <> 64;

-- Hardware information after applying soft-NUMASELECT @@servername AS 'sql_intance',
       virtual_machine_type_desc,
       cpu_count,
       softnuma_configuration_desc,
       socket_count,
       cores_per_socket
FROM sys.dm_os_sys_info;

-- soft-NUMA nodesSELECT @@servername AS 'sql_intance',
       'Memory Node with soft-NUMA' AS 'Memory Node',
       node_id,
       node_state_desc,
       cpu_count
FROM sys.dm_os_nodes
WHERE node_state_desc = 'ONLINE';
-- SELECT @@servername AS 'sql_intance',
       spid,
       lastwaittype,
       cmd,
       status
FROM sys.sysprocesses
WHERE cmd IN ( 'LAZY WRITER', 'RESOURCE MONITOR', 'LOG WRITER' ) 

Following are few examples of NUMA and soft_NUMA creation: This configuration was conducted in a ESXi 8.0, Windows Server 2022 and SQL Server 2022 environment. Regardless of the environment, the final outcome will be the same.

1 vSocket, 8 vCores per vSocket, No CPU Topology applied 

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 1 sockets with 8 cores per socket and 8 logical processors per socket, 8 total logical processors; using 8 logical processors based on SQL Server licensing.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x00000000000000ff:0 Active CPU mask: 0x00000000000000ff:0.
Total Log Writer threads: 2, Node CPUs: 4, Nodes: 1, Log Writer threads per CPU: 1, Log Writer threads per Node: 2


 

1 vSocket, 9 vCores per vSocket, No CPU Topology applied :

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 1 sockets with 9 cores per socket and 9 logical processors per socket, 9 total logical processors; using 9 logical processors based on SQL Server licensing.
Automatic soft-NUMA was enabled because SQL Server has detected hardware NUMA nodes with greater than 8 physical cores.
Node configuration: node 0: CPU mask: 0x000000000000001f:0 Active CPU mask: 0x000000000000001f:0.
Node configuration: node 1: CPU mask: 0x00000000000001e0:0 Active CPU mask: 0x00000000000001e0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2




1 vSocket, 10 vCores per vSocket, No CPU Topology applied :

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 1 sockets with 10 cores per socket and 10 logical processors per socket, 10 total logical processors; using 10 logical processors based on SQL Server licensing.
Automatic soft-NUMA was enabled because SQL Server has detected hardware NUMA nodes with greater than 8 physical cores.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x000000000000001f:0 Active CPU mask: 0x000000000000001f:0.
Node configuration: node 1: CPU mask: 0x00000000000003e0:0 Active CPU mask: 0x00000000000003e0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2



 

2 vSocket, 5 vCores per vSocket, No CPU Topology applied:

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 2 sockets with 5 cores per socket and 5 logical processors per socket, 10 total logical processors; using 10 logical processors based on SQL Server licensing.
Automatic soft-NUMA was enabled because SQL Server has detected hardware NUMA nodes with greater than 8 physical cores.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x000000000000001f:0 Active CPU mask: 0x000000000000001f:0.
Node configuration: node 1: CPU mask: 0x00000000000003e0:0 Active CPU mask: 0x00000000000003e0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2



 

2 vSocket, 5 vCores per vSocket, 2 vNUMA, CPU Topology applied :

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 2 sockets with 5 cores per socket and 5 logical processors per socket, 10 total logical processors; using 10 logical processors based on SQL Server licensing.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x000000000000001f:0 Active CPU mask: 0x000000000000001f:0.
Node configuration: node 1: CPU mask: 0x00000000000003e0:0 Active CPU mask: 0x00000000000003e0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2

2 vSocket, 4 vCores per vSocket, 2 vNUMA, CPU Topology applied :

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
SQL Server detected 2 sockets with 4 cores per socket and 4 logical processors per socket, 8 total logical processors; using 8 logical processors based on SQL Server licensing.
CPU vectorization level(s) detected:  SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2 POPCNT BMI1 BMI2 AVX512 (F CD BW DQ VL)
Node configuration: node 0: CPU mask: 0x000000000000000f:0 Active CPU mask: 0x000000000000000f:0.
Node configuration: node 1: CPU mask: 0x00000000000000f0:0 Active CPU mask: 0x00000000000000f0:0.
Total Log Writer threads: 2, Node CPUs: 2, Nodes: 2, Log Writer threads per CPU: 1, Log Writer threads per Node: 2


References:

Soft-NUMA (SQL Server):
https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/soft-numa-sql-server?view=sql-server-ver16#automatic-soft-numa

How It Works: Soft NUMA, I/O Completion Thread, Lazy Writer, Workers and Memory Nodes:
https://techcommunity.microsoft.com/t5/sql-server-support-blog/how-it-works-soft-numa-i-o-completion-thread-lazy-writer-workers/ba-p/316044

No comments:

Post a Comment