Parallel create index with Hash-Based Statistics for High Domains

Parallel create index in Adaptive Server 15.7 SP110 and later supports hash-based statistics gathering on high-domain minor attribute columns of the index (that is, columns with 65536 or more unique values).

Hash-based statistics gathering lets the database use a newly created index immediately. Without it, you would have to run a subsequent update index statistics tab_name index_name with hashing command, which requires an additional index scan to gather the column statistics on the minor attributes of the index that query optimization needs. (update statistics supports only serial hash-based statistics gathering.)

The index must have more than one column for this feature to take effect.

Note: The major attribute of an index (that is, the first column of the index) continues to use legacy sort-based statistics gathering, whether or not you enable hash-based statistics gathering.

In earlier versions of Adaptive Server, parallel create index supported hash-based statistics gathering only on low-domain minor attribute columns of a composite index.

Each parallel thread creates its own portion of the index, as in earlier versions of create index. However, in versions 15.7 SP110 and later, hash-based statistics gathering is invoked on each minor attribute of each index row while the index rows are being processed. Each thread has a thread-local hash table for each column, so the amount of tempdb buffer cache used increases in proportion to the number of parallel threads specified. High-domain, hash-based statistics gathering requires additional memory to produce the final histogram.
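The per-thread arrangement described above can be illustrated with a toy model. The following Python sketch is not Adaptive Server code: the function names (gather_partition_stats, merge_stats, parallel_column_stats), the round-robin partitioning, and the use of collections.Counter as the hash table are all assumptions made for illustration. It shows only why the number of hash tables that exist at once, and therefore the memory used, grows with the number of threads multiplied by the number of minor columns.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def gather_partition_stats(rows, num_columns):
    """Build one thread-local hash table (a Counter) per minor column.

    Column 0 stands for the major attribute and is skipped, mirroring the
    note that it keeps using sort-based statistics gathering.
    """
    tables = {col: Counter() for col in range(1, num_columns)}
    for row in rows:
        for col in range(1, num_columns):
            tables[col][row[col]] += 1
    return tables


def merge_stats(per_thread_tables):
    """Combine the thread-local tables into one frequency table per column."""
    merged = {}
    for tables in per_thread_tables:
        for col, counter in tables.items():
            merged.setdefault(col, Counter()).update(counter)
    return merged


def parallel_column_stats(rows, num_columns, num_threads):
    # Hypothetical partitioning: deal the rows out round-robin to the threads.
    partitions = [rows[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_thread = list(
            pool.map(lambda part: gather_partition_stats(part, num_columns),
                     partitions)
        )
    # num_threads * (num_columns - 1) hash tables were alive at the same time.
    table_count = len(per_thread) * (num_columns - 1)
    return merge_stats(per_thread), table_count


# Three-column index rows: (major, minor_1, minor_2).
rows = [(i, i % 3, i % 5) for i in range(30)]
merged, table_count = parallel_column_stats(rows, num_columns=3, num_threads=4)
```

In this model, doubling num_threads doubles table_count while the merged result stays the same, which matches the proportional growth in cache use described above.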

The max_resource_granularity value limits the amount of memory used by all threads. If this limit is exceeded, one column is selected as a “victim,” and the memory recycled from it is used to continue processing the remaining columns. If a victim is chosen because of insufficient tempdb cache resources, the query processor does not generate statistics for that column. Typically, a high-domain histogram created in parallel is not the same as the histogram created by serial hash-based processing, but in both cases the histogram cell weights are accurate for their respective cell boundaries.
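The victim mechanism can be sketched in the same toy style. The Python below is a minimal illustration, not the server's algorithm: the function name gather_with_budget is hypothetical, max_entries (a count of hash-table entries) stands in for the real memory limit, and the policy of choosing the column with the largest hash table as the victim is an assumption, since the text above does not say how the victim is selected.

```python
def gather_with_budget(rows, minor_columns, max_entries):
    """Count value frequencies per column under a shared entry budget.

    Returns (tables, victims): the surviving per-column frequency tables
    and the list of columns that were dropped and so get no statistics.
    """
    tables = {col: {} for col in minor_columns}
    victims = []
    for row in rows:
        for col in minor_columns:
            if col not in tables:
                continue  # this column was already chosen as a victim
            table = tables[col]
            table[row[col]] = table.get(row[col], 0) + 1
            # Over budget: drop one column and reuse its memory.
            while sum(len(t) for t in tables.values()) > max_entries:
                # Assumed policy: the largest table is the victim.
                victim = max(tables, key=lambda c: len(tables[c]))
                del tables[victim]
                victims.append(victim)
    return tables, victims


# Column 1 has 2 distinct values; column 2 has 100 distinct values.
rows = [(i, i % 2, i) for i in range(100)]
tables, victims = gather_with_budget(rows, minor_columns=[1, 2], max_entries=10)
```

Here the high-cardinality column 2 exhausts the budget and is dropped, while column 1 is still counted in full, so statistics are produced for column 1 only.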