Query Optimizer vs. HR Manager
Let’s say you’re working in a company where there’s a couple of departments. Each department has a certain number of employees.
feed your dragon every day |
The HR Manager came back from his vacation and someone asked him “how many staff do you have in the Finance department?” What will be the answer? As the list has not been updated, he will not be able to answer correctly. He needs to update the staff list for each department by utilizing the old Chinese abacus method. This will not be much enjoyable if the company is big and the HR Manager has to do a lot of work to correct the staff list.
This analogy is applicable to SQL Server optimizer. If the internal data distribution (statistics) is outdated, SQL Server will correct it. So, instead of SQL Server why don’t we just update it in the first place and on a regular basis?
Why is it so important?
To generate the optimal execution plan, SQL Server needs up-to-date statistics. Due to poor quality statistics, SQL Server creates sub-optimal plans and causes severe performance issues, such as high CPU, Table/Index Scan, and a Parallel plan. Query will start taking longer durations to finish then expected and locking overhead will introduce blocking and even a time-out issue. Not a good experience for the application user.
How statistics are created?
1. When using a column in a query condition such as a WHERE or JOIN clause causes statistics to be created when automatic statistics creation is enabled.
2. The query optimizer creates statistics automatically for indexes as a byproduct on tables or views when the index is created. These statistics (single or multi-column) are created on the key columns of the index.
3. Statistics can be created manually with CREATE STATISTICS command.
4. Using “sp_createstats” for all columns of all tables in a database explicitly. It creates single column statistics.
Query Optimizer of SQL Server always creates single column statistics with prefix “_WA_Sys_”.
Auto statistics update and threshold:
Generally SQL Server determines whether to update statistics based on changes to column modification counters (colmodctr).
There is an algorithm which triggers SQL Server to updates statistics. SQL Server keeps track of the number of changes for each column and triggers the statistics recompilation mechanism and the threshold is known as Recompilation threshold (RT). RT depends on the table type (permanent versus temporary), and the number of rows in the table (cardinality) when a query plan is compiled. The recompilation thresholds for all tables referenced in a batch are stored with the query plans of that batch.
RT is calculated as follows. (Where n is the number of rows in the table when the statistics were gathered)
Permanent table
If n <= 500, RT = 500.
If n > 500, RT = 500 + 0.20 * n.
Temporary table
If n < 6, RT = 6.
If 6 <= n <= 500, RT = 500.
If n > 500, RT = 500 + 0.20 * n.
Table variable
Since table variable has no statistics, RT does not exist.
An important new feature for high-throughput OLTP environments is the ability to asynchronously (AUTO_UPDATE_STATISTICS_ASYNC ON) update statistics automatically.
Manual statistics update and drawback:
1. When an index is created or rebuilt, only associated statistics will be updated and nothing else.
2. When executing ALTER INDEX ALL … on a table, only the statistics associated with indexes are updated. Automatic or manual statistics created on the table will never be updated.
3. “REORGANIZE” never updates any statistics.
Auto update and the consequence:
The statistics which are created automatically by SQL Server Query optimizer or manually, how will those be updated? If statistics auto creation and update mechanism have not been disabled then SQL Server will attempt to update those statistics whenever required. For a large table this might cause performance issues such as blocking if it triggers during a peak hour. And if for some reason it did not get updated, the old execution plan (which is now sub-optimal anyway due to changes of underlying data) will be used to execute the query. As a result query performance suffers badly.
How will you update “user or system created statistics”?
As “Automatic or user defined statistics created on the table will never be updated” by ALTER INDRX (ALL) REBUILD command, then the better way to update those statistics is by using “sp_updatestats” or manually each statistics at a time or we will have to depend on SQL Server auto-update mechanism solely. The last two processes may not be an effective option in many environments. Disabling auto update statistics is not recommend, and we also should not rely on auto mechanism completely and on the other hand we want to update the statistics beforehand.
“sp_updatestats” – how does it work?
MSDN says “sp_updatestats updates only the statistics that require updating based on the rowmodctr information in the sys.sysindexes catalog view, thus avoiding unnecessary updates of statistics on unchanged rows”. Also note that “sp_updatestats updates statistics on disabled non-clustered indexes and does not update statistics on disabled clustered indexes.”
“sp_updatestats” utilizes rowmodctr column and as well it utilizes another undocumented internal function “stats_ver_current” to conditionally check of an existing statistics to decide whether it needs to be updated or not.
A better approach to optimize databases:
Though each environment is unique statistics optimization is mandatory for the sake of the best possible query execution plans regardless of the size of table or the database. “sp_updatestats” will ensure the complete update of all statistics irrespective to how they were created. In my experience I found that updating statistics is much more important and crucial than rebuilding indexes.
A good way to optimize indexes and statistics:
1. First rebuild all indexes based on a predefined fragmentation level; say 30%. Some indexes will not be touched and there will be some outdated statistics leftover.
2. In the second step, run “sp_updatesstats”, which will take care of all the statistics.
A sample Script on how to use “sp_updatesstats” against all databases:
Following both scripts does the same job. You can use whichever you prefer.
/*******************************************************
** Statistics update on all db
*******************************************************/
EXEC sp_MSforeachdb 'if ''?'' NOT IN (''tempdb'') use ? exec sp_updatestats RESAMPLE'
/*******************************************************
** Statistics update on all db
*******************************************************/
declare @name varchar(100)
declare db_cursor cursor
for
select s.[name]
from sys.databases s
where s.[name] not in ( 'tempdb' )
and s.[state_desc] = 'ONLINE'
and s.[user_access_desc] = 'MULTI_USER'
order by s.[name]
open db_cursor
fetch next from db_cursor into @name
while @@FETCH_STATUS = 0
begin
exec ('use ' + @name )
print '-----------------------------------------------'
print 'Updating stats on db: ' + @name
print '-----------------------------------------------'
exec sp_updatestats
fetch next from db_cursor into @name
end
close db_cursor
deallocate db_cursor
Output of “sp_updatestats”
References:
sp_updatestats (Transact-SQL)
Statistics Used by the Query Optimizer in Microsoft SQL Server
Statistics Used by the Query Optimizer in Microsoft SQL Server
Statistical maintenance functionality (autostats) in SQL Server
Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005