ClusteredHashGroupBy algorithm (GrByHClust)

Sometimes values in the grouping columns of the input table are clustered, so that similar values appear close together. For example, if a table contains a column that is always set to the current date, all rows with a single date tend to be relatively close within the table. ClusteredHashGroupBy exploits this clustering.

The optimizer may use ClusteredHashGroupBy when grouping tables that are significantly larger than the available memory. In particular, it is effective when the HAVING predicate returns only a small proportion of rows.

ClusteredHashGroupBy can lead to significant wasted work on the part of the optimizer if it is chosen in an environment where data is being updated at the same time that queries are being executed. ClusteredHashGroupBy is therefore most appropriate for OLAP workloads characterized by occasional batch-style updates and read-based queries. Set the optimization_workload option to OLAP to instruct the optimizer that it should include ClusteredHashGroupBy in the possibilities it investigates. See optimization_workload option.

When creating an index or foreign key that can be used in an OLAP workload, specify the FOR OLAP WORKLOAD clause. This clause causes the database server to maintain a statistic used by ClusteredHashGroupBy regarding the maximum page distance between two rows within the same key. See CREATE INDEX statement, CREATE TABLE statement, and ALTER TABLE statement.

For more information about OLAP workloads, see OLAP support.

Discuss this page in DocCommentXchange.