Chapter 10: Hash-Based Update Statistics

Adaptive Server versions 15.7 ESD #2 and let you gather hash-based statistics on minor index attributes and unindexed columns instead of using sort-based statistics. Using hash-based instead of sort-based statistics improves performance by reducing the number of required scans, and by avoiding disk-based sorting.

Hash-based statistics allow greater flexibility than sort-based statistics:

Running hash-based statistics should require less time to run sort-based statistics, increasing the amount of work you can accomplish during a maintenance window.
Because hash-based statistics require less procedure cache, you may be able to run update statistics on data-only-locked tables outside the maintenance window since the tempdb buffer cache, which usually uses the default data cache, is typically much larger than the procedure cache.
Generally, hash-based statistics do not require large tempdb disk allocations. If you previously increased the size of tempdb to accommodate large sorts from update statistics, you may be able to redeploy this space.
update index statistics and update all statistics with hashing may run faster than sorting with sampling. However, an exception may be update statistics table_name(col_name).

Hash-based statistics use a low-domain algorithm for columns with fewer than 65536 unique column values, and a high-domain algorithm for columns with 65536 or more unique column values. Of the two algorithms, low-domain hashing produces the more accurate histogram because Adaptive Server uses the actual counts of all the domains values to create the histogram. High-domain hashing may produce a less accurate histogram because Adaptive Server produces an in-memory intermediate histogram that it updates for each block of 65536 unique values.

Because gathering hash-based statistics is CPU-intensive, you may want to create an execution class with EC3 attributes to which you can assign the update statistics login. Adaptive Server gives lower priority to update statistics maintenance sessions, reducing the impact when the maintenance window is small or nonexistent.

When running update statistics, Sybase recommends that you:

Use partitioned tables so only the active partitions require update statistics maintenance
Use datachange to determine when to run update statistics
Avoid running update statistics on allpages-locked tables when concurrency may be an issue (since update statistics uses level 1 page locking on allpages-locked tables, which has less concurrency than the dirty reads on data-only-locked tables)