Hash filter algorithms (HF, HFP)

A hash filter, sometimes referred to as a bloom filter, is a data structure that represents the distribution of values in a column or set of columns. The hash filter can be viewed as a (long) bit string where a 1 bit indicates the existence of a particular row, and a 0 bit indicates the lack of any row at that bit position. By hashing the values from a set of rows into bit positions in the filter, the database server can quickly determine whether or not there is a matching row of that value (subject to the existence of hash collisions).

For example, consider the plan:

R<idx> *JH S<seq> JH* T<idx>

Here you are joining R to S and T. The database server reads all of the rows of R before reading any row from T. If a hash filter is built using the rows of R returned by the index scan, then the database server can immediately reject rows of T that can not possibly join with R. This reduces the number of rows that must be stored in the second hash join.

A hash filter may be used in queries that satisfy both of the following conditions:

An operation in the query reads its entire input before returning a row to later operations. For example, a hash join of two tables on a single column requires that all the relevant rows from one of the inputs be read to form the hash table for the join.
A subsequent operation in the query's access plan refers to the rows in the result of that operation. For example, a second join on the same column as the first would use only those rows that satisfied the first join.

In this circumstance, the hash filter constructed as a result of the first join can substantially improve the performance of the second join. This is achieved by performing a lookaside operation into the hash filter's bit string to determine if any row has been previously processed successfully by the first join—if no such row exists, the hash table probe for the second join can be avoided entirely since the lack of a 1 bit in the hash filter signifies that the probe would fail to yield a match.

Send feedback about this page via email or DocCommentXchange