Using sampling for update statistics

The optimizer for Adaptive Server uses the statistics on a database to set up and optimize queries. To generate optimal results, the statistics must be as current as possible.

Run the update statistics commands against data sets, such as tables, to update information about the distribution of key values in specified indexes or columns, for all columns in an index, or for all columns in a table. The commands revise histograms and density values for column-level statistics. The results are then used by the optimizer to calculate the best query plan.

Run update statistics using a sampling method, which can reduce the I/O and time when your maintenance window is small and the data set is large. If you are updating a large data set or table that is in constant use, being truncated and repopulated, you may want to do a statistical sampling to reduce the time and the size of the I/O. Because sampling does not update the density values, run a full update statistics prior to using sampling for an accurate density value.

Use caution with sampling, since the results are not fully accurate. Balance changes to histogram values against the savings in I/O.

Sampling does not update density values created by a non-sampling update statistics command. Since the density changes very slowly, replacing an accurate density with an approximation calculated by sampling usually does not improve the estimate. Density values created by a sampling update statistics command is updated. Sybase recommends that you use that one non-sampling update statistics command to establish an accurate density, which can be followed by numerous sampling update statistics commands. To have sampling update statistics update the density, delete the column statistics before using update statistics with sampling.

When you are deciding whether or not to use sampling, consider the size of the data set, the time constraints you are working with, and if the histogram produced is as accurate as needed.

The percentage to use when sampling depends on your needs. Test various percentages until you receive a result that reflects the most accurate information on a particular data set; for example:

update statistics authors(auth_id) with sampling = 5 percent

Set server-wide sampling percent using:

sp_configure 'sampling percent', 5

This command sets a server-wide sampling of 5% for update statistics that allows you to execute update statistics without the sampling syntax. The percentage can be between zero (0) and one hundred (100) percent.