update statistics

Description

Updates information about the distribution of key values in specified indexes, for all columns in an index, table, or partition, and resets the data change counters for global nonclustered indexes.

Syntax

update statistics table_name
	[[partition data_partition_name] 
		[ (column1, column2, …) | (column1), (column2), …] |
	index_name [partition index_partition_name]] 
	[using step values | [out_of_range [on | off| default]]]
	[with consumers = consumers][, sampling=N percent]
		[, no_hashing | partial_hashing | hashing]
		[, max_resource_granularity = N [percent]] 
		[, histogram_tuning_factor = int ]
		[, print_progress = int]

Parameters

table_name: when used with update statistics, table_name is the name of the table with which the index is associated. table_name is required, since Transact-SQL does not require index names to be unique in a database.

index_name: is the name of the index to be updated. If an index name is not specified, the distribution statistics for all the indexes in the specified table are updated.

data_partition_name: is the name of the partition to be updated. Statistics for each local index partition on the data partition is updated. Does not update statistics for global indexes.

index_partition_name: is the name of the index partition to be updated.

(column1, column2, ...): creates the same statistics that are created if a local index is created and included on this column tuple. That is, creates a histogram on column1, and create multiattribute distributions on all column prefixes (column1, column2), (column1, column2, column3).

(column1), (column2): comma-separated list of columns on which you are running update statistics. Allows for a single update statistics command to create a histogram on each column in the list, but does not create any multi-attribute distributions.

using step values: specifies the number of histogram steps. The default value is 20, for columns where no statistics exist. If you need to change the default for this, use sp_configure to modify the number of histogram steps parameter. If statistics for a column already exist in sysstatistics, the default value is the current number of steps.

The steps are applied to each partition of a partitioned table—for example, update statistics uses the default value of 20 steps for each data and index partition involved in the scan for updating statistics. If global statistics are generated through an index scan of a global index, then 20 steps are applied by default. If partition statistics are generated, either through a data scan or local index scan, then 20 steps are applied by default for each partition.

If the histogram steps specified through using step values is M, and the histogram_tuning_factor parameter is N, then update statistics uses between 0 and M*N steps, depending on the number of frequency cells that update statistics isolates and whether any range cells exist.

out_of_range [on | off | default]: Column statistics for rapidly growing tables may become out-of-date when update statistics completes, resulting in out-of-range SARGs (search clauses) that select a greater range of values than described by the column’s histogram. Out-of-range SARGS have a selectivity of 0. The out_of_range] histogram adjustment feature adjusts a column’s histogram, and assigns an appropriate selectivity value to such SARGs.

Histogram adjustment for out of range SARGS is enabled server wide by default.

out_of_range [on | off | default] specifies an out-of-range histogram adjustment at the column level. One of:

on – Enables out-of-range histogram adjustment for column_name.

off – Disables out-of-range histogram adjustment for column_name.

default – Affects the out-of-range histogram adjustment depending on the value of trace flag 15355:

Disables out-of-range histogram adjustment when Traceflag 15355 is on.

Enables out-of-range histogram adjustment when Traceflag 15355 is off.

with consumers = consumers: specifies the number of consumer processes to be used for a sort when column_list is provided and parallel query processing is enabled. The consumers option specifies the degree of parallelism applied to a sort performed for statistics update on a single data partition. For example, if update statistics with a column list is applied to a table with three data partitions, data from each of the three partitions is sorted separately and the consumers option is applied during each of the sort. The three sorts themselves are not performed in parallel.

The value for the max parallel degree configuration parameter must be be greater than the value for with consumers. For example, if with consumers is set to 2, then max parallel degree must be at least 3.

with sampling = N percent: specifies the percentage of the column to be randomly sampled in order to gather statistics. The value for N is any number between 1 and 100. Sampling applies to all update statistics types:

update statistics table_name (col_name)

update index statistics

update all statistics

index: specifies that statistics for all columns in an index are to be updated.

[no_hashing | partial_hashing | hashing]: indicates the level of hash-based statistics update statistics gathers. One of:

no_hashing – (the default) update statistics uses the algorithm from Adaptive Server versions earlier than 15.7, gathering sort-based statistics.

partial_hashing – update statistics uses the algorithm for columns with fewer than 65536 unique values. If update statistics encounters unique column counts that are greater than or equal to the 65536 threshold, it uses an extra scan with a sort.

hashing – update statistics uses low-domain and high-domain hashing to create the histograms.

max_resource_granularity = N percent: limits the amount of tempdb buffer cache used with update statistics and hashing.

with histogram_tuning_factor = integer: determines update statistics’s distribution granularity.

print_progress = int: Determines if update statistics displays progress messages.

0 – (the default) command does not display any progress messages

1 – command displays progress messages

Examples

Example 1

Generates statistics for the price column of the titles table:

update statistics titles (price) using 40 values

Example 2

Updates statistics on the data partition smallsales Adaptive Server creates histograms for each leading column and densities for the composite columns of each local index of the data partition. Statistics are not updated for global indexes:

update statistics titles partition smallsales

Example 3

Updates statistics on the data partition smallsales. Adaptive Server creates histograms on column col1 and creates densities for the composite columns col1 and col2:

update statistics titles partition smallsales (col1, col2)

Example 4

When an out_of_range SARG is detected for a column, the optimizer adjusts the column’s histogram and assigns an appropriate selectivity value to the out-of-range clause:

update statistics TOFO_FUOP_ORD(OrdDt) using
out_of_range on

Example 5

If trace flag 15355 is turned on, the column’s histogram is not adjusted for out-of-range SARGs:

update statistics TOFO_FUOP_ORD(OrdDt) using 
out_of_range default

Example 6

Runs update statistics on the authors table with a histogram_tuning_factor of 5 percent:

update index statistics authors with histogram_tuning_factor = 5

Usage

Adaptive Server keeps statistics about the distribution of the key values in each index, and uses these statistics in its decisions about which index (es) to use in query processing.

When you create a nonclustered index on a table that contains data, update statistics is automatically run for the new index. When you create a clustered index on a table that contains data, update statistics is automatically run for all indexes.

Running update statistics on an empty table does not affect the system tables.

The optimization of your queries depends on the accuracy of the statistics. If there is significant change in the key values in your index, you should rerun update statistics on that index or column. Use the update statistics command if a great deal of data in an indexed column has been added, changed, or removed (that is, if you suspect that the distribution of key values has changed).

You should also run update statistics on system tables with a large number of rows. If you have permission to run the command on a user table, it is no different with respect to system table. Without statistics, there is always a chance for system stored procedures to perform poorly.

update statistics skips global indexes when you run the command on a data partition.

update statistics, when used with a table name and an index name, updates statistics for the leading column of an index. If update statistics is used with just a table name, it updates statistics for the leading columns of all indexes on the table.

If a comma-separated list, (col1), (col2)..., is used and hashing is enabled, then one scan can be used to gather statistics, if you do not exceed the resource granularity. If sorting is enabled, then one scan for each is used. If you use partial hashing, one scan can be used for low-domain columns, if you do not exceed the resource granularity, and one scan is used for the sort for each of the high-domain columns (that is, if you have three columns, you have three sorts).

Specifying the name of an unindexed column or the nonleading column of an index generates statistics for that column without creating an index.

Specifying more than one column in a column list (for example (col1, col2, ... )) generates or updates a histogram for the first column, and density statistics for all prefix subsets of the list of columns. Hash-based statistics cannot be used in this case.

If you use update statistics to generate statistics for nonleading columns of clustered indexes and nonindexed columns, update statistics must scan the table and perform a sort.

Adaptive Server ignores sampling for update statistics unless you specify a column list (such as (col1), (col2)..., or (col1, col2, col3)). Use update all statistics or update index statistics if you require sampling in situations other than a column list. If you specify a column list, and these columns are nonleading columns of clustered indexes and nonindexed columns, update statistics must scan the table and perform a sort, or use hash-based algorithms.

If you use update statistics on a specific partition, you update global statistics implicitly as well.

Adaptive Server raises error 16015 if you attempt to use out_of_range options for update statistics alongside other options such as consumers or sampling.

Adaptive Server raises error 16016 if you specify out_of_range options for a column that currently has no column level statistics.

update statistics regenerates and updates the table statistics stored in systabstats for each data and index partition of the table the command updates. If you run the update statistics command for a specific data partition, the table statistitics are generated and updated only for that data partition and for any local index partitions. Global indexes are skipped. If you run the update statistics for a specific index partition, only the table statistics for that index partition are updated.

The with consumers clause is designed for use on partitioned tables on RAID devices, which appear to Adaptive Server as a single I/O device, but which are capable of producing the high throughput required for parallel sorting. See “Using Statistics to Improve Performance” in the Performance and Tuning Series: Query Processing and Abstract Plans.

Table 1-36 shows the types of scans performed during update statistics, the types of locks acquired, and when sorts are needed.

**Table 1-36: Locking, scans, and sorts during update statistics**
update statistics specifying		Scans and sorts performed	Locking
Table name
	Allpages-locked table	Table scan, plus a leaf-level scan of each nonclustered index	Level 1; shared intent table lock, shared lock on current page
	Data-only-locked table	Table scan, plus a leaf-level scan of each nonclustered index and the clustered index, if one exists	Level 0; dirty reads
Table name and clustered index name
	Allpages-locked table	Table scan	Level 1; shared intent table lock, shared lock on current page
	Data-only-locked table	Leaf level index scan¹	Level 0; dirty reads
Table name and nonclustered index name
	Allpages-locked table	Leaf level index scan¹	Level 1; shared intent table lock, shared lock on current page
	Data-only-locked table	Leaf level index scan¹	Level 0; dirty reads
Table name and column name
	Allpages-locked table	Table scan; creates a worktable and sorts the worktable	Level 1; shared intent table lock, shared lock on current page
	Data-only-locked table	Table scan; creates a worktable and sorts the worktable	Level 0; dirty reads

¹ See “When does Adaptive Server perform scans and sorts” in Chapter 10, “Using Statistics to Improve Performance,” in the Performance and Tuning Series: Query Processing and Abstract Plans” to determine which indexes get scanned if the column for which the statistics is being sampled exists in two or more indexes.

The update index statistics command generates a series of update statistics operations that use the same locking, scanning, and sorting as the equivalent index-level and column-level command. For example, if the salesdetail table has a nonclustered index named sales_det_ix on salesdetail (stor_id, ord_num, title_id), the update index statistics salesdetail command performs these update statistics operations:

update statistics salesdetail sales_det_ix 
update statistics salesdetail (ord_num)
update statistics salesdetail (title_id)

update statistics is not run on system tables in the master database during upgrade from earlier versions. Indexes exist on columns queried by most system procedures, and running update statistics on these tables is not required for normal usage. However, running update statistics is allowed on all system tables in all databases, except those that are not normal tables. These tables, which are built from internal structures when queried, include syscurconfigs, sysengines, sysgams, syslisteners, syslocks, syslogs, syslogshold, sysmonitors, sysprocesses, syssecmechs, systestlog and systransactions.

You do not need to run update statistics on Replication Server RSSD tables. Running updates statistics on these tables can result in Replication Server errors if you run it while Replication Server attempts to access the RSSD tables. RSSD tables and their format are specific to Replication Server processing.

These update statistics parameters retain their values on all affected columns and override any configuration settings until the statistics for the columns are deleted, or until you run sp_modifystats ... REMOVE_STICKINESS on the columns:

using step values
out_of_range
no_hashing
partial_hashing
hashing
histogram_tuning_factor
sampling = N percent

consumers and max_resource_granularity do not retain their values.

For example, if you issue:

update statistics table_name(column1) with no_hashing

Subsequent update statistics commands on this table use the default configuration value for update statistics hashing on all columns except column1, which continues to use no_hashing until you delete the column1 statistics.

with consumers and with sampling apply to sorts and not to hashing. You may include the partial_hashing parameter with with consumers and with sampling parameters, but with consumers and with sampling apply only to sorts on the high-domain columns. If you explicitly include the with hashing parameter with the with consumers or with sampling parameters, Adaptive Server ignores those parameters.

Using hash-based statistics

The partial_hashing parameter uses hashing on columns only when it produces histograms that are as accurate and of equivalent quality as those produced by sorting (low-domain cases), otherwise, the partial_hashing parameter uses sorting for high-domain cases.
Hashing may produce less accurate histograms than sorting in some cases (high-domain cases).
Although hash-based statistics do not require the tempdb disk space or procedure cache memory used by sorting, it may use a significant amount of tempdb buffer cache memory.
Large sorts used for the no_hashing parameter may scavenge statements in the statement cache and stored procedures to release procedure cache memory to support the sort.
max_resource_granularity limits the amount of tempdb buffer cache memory used for hashing or partial_hashing. It does not affect the amount of memory used by no_hashing parameter or sorting.
If you include the partial_hashing parameter, and a previous histogram on a column exists that indicates a high-domain column, Adaptive Server assumes that sort-based statistics are required on this column. If no previous histogram exists on a column, then Adaptive Server assumes this column is low-domain until the high-domain limit is reached.

update statistics and sampling

Adaptive Server scans samples of data pages. If you specify an index in update statistics, such as in the following:

update statistics table_name [index_name] with sampling = N percent

This command creates and updates statistics on the leading column of all indexes on the specified table, or the leading column of a specified index.

When you use the sampling = N percent option with the using steps value , you must specify the sampling = N percent option last:

update statistics titles (type) 
    using 40 value
    with sampling = 10 percent

If you do not, you get an error message:

update statistics titles (type) 
    with sampling = 10 percent 
    using 40 value

Msg 156, Level 15, State 2:
Line 1:
Incorrect syntax near the keyword 'using'.

create index and stored procedures

Adaptive Server automatically recompiles stored procedures after executing update statistics statements. Although ad hoc queries that you start before executing update statistics still continue to work, they do not take advantage of the new statistics.In Adaptive Server versions 12.5 and earlier, update statistics was ignored by cached stored procedures.

Standards

ANSI SQL – Compliance level: Transact-SQL extension.

Permissions

You must be the table owner or a user with update statistics permission on the table to run update statistics.