HashDistinct algorithm (DistH)

HashDistinct takes a single input and returns all distinct rows. HashDistinct does this by reading its input, and building an in-memory hash table. If an input row is found in the hash table, it is ignored; otherwise, it is written to a work table. If the input does not completely fit into the in-memory hash table, it is partitioned into smaller work tables, and processed recursively.

HashDistinct also:

works very well if the distinct rows fit into an in-memory table, irrespective of the total number of rows in the input.
uses a work table, and as such can provide insensitive or value sensitive semantics.
returns a row when it finds one that has not previously been returned. However, the results of a hash distinct must be fully materialized before returning from the query. If necessary, the optimizer adds a work table to the execution plan to ensure this.
locks the rows of its input.

The optimizer avoids generating access plans using the hash distinct algorithm if it detects that a low memory situation may occur during query execution. If HashDistinct executes in an environment where there is very little cache memory available, then it is not able to complete. In this case, HashDistinct discards its interim results, and an internal low memory approach is used instead.

The amount of memory that can be used by a HashDistinct operator is dependent upon the multiprogramming level of the server, and the number of active connections. See Threading in SQL Anywhere, and Configuring the database server's multiprogramming level.

Discuss this page in DocCommentXchange.