Query Plan Optimization with Bloom Filters

Adaptive Server 15.7 SP100 introduces a query plan optimization feature, bloom filtering, which improves join performance.

A bloom filter provides early filtering of rows that cannot be joined before they would reach the join operator. Bloom filter is considered only when one child of the join operator must materialize its derived table before the other child is opened. The materializing child is called the build side. The bloom filter uses a bit array to represent qualifying join key values from the build side. The join then pushes the bit array to the other side, called the probe side. The probe side then uses the bit array to test whether a row, with certain join key values, might match the build side values. If not, the row is filtered from further processing. Bloom filter provides a low-cost approach for eliminating rows before they participate in a full join algorithm.

In Adaptive Server, bloom filtering is implemented for hash joins, sort merge joins, and reformatting based nested loop joins. It is applicable only to equi-joins. For hash joins, the build side is part of the hash join itself. For sort merge joins, the build side is the left side sort operator. For reformatting based nested loop joins, the build side is the insert operator for the worktable index creation. Bloom filtering has an added CPU cost when it builds and probes the bit array. The optimizer uses bloom filtering in the final plan if doing so is estimated to reduce the total cost of the join.

The bloom filter is enabled only when optimization criteria join_bloom_filter is on. The join_bloom_filter is on by default under the allrows_dss optimization goal when the optimization level is set to match the 15.7 SP100 or later, as in ase_current.

To view all available current optimizer settings, enter:

sp_options "show"

The memory cost of the bit array has an upper bound of about 512K bytes for the bit array of each bloom filter during execution. For most scenarios, memory cost less than 16K.

The performance of bloom filtering mostly benefits large joins. Test results for query execution elapse time in Adaptive Server with bloom filters have shown reductions of 15% to 60%. The added overhead to the plan execution is often very small. The benefits also vary depending on the datatypes of the join key columns. Integer datatypes show the best potential performance gain, while unichar and univarchar datatypes show less.