Tuning query performance with index hashing

You can tune query performance by choosing a specific maximum hash size. A hash key represents the actual values of the indexed column. An index hash key aims to avoid the expensive operation of finding, loading, and then unpacking rows to determine the indexed value. It does so by storing enough of the actual row data alongside the row ID in the index entry.

A row ID allows UltraLite to locate the actual row data in the database file. If you set the hash size to 0 (which disables index hashing), the index entry contains only this row ID. If you set the hash size to anything other than 0, the index entry also contains a hash key. A hash key can contain all or part of the transformed data in that row, and is stored with the row ID in the index page.

How much row data the hash key includes is determined:

  • Partly by the maximum hash size property you configure. See Choosing an optimal hash size.
  • Partly by how much is actually needed for the data type of the column.

A hash example

The value of an index hash maintains the order of the actual row data of indexed columns. For example, if you have indexed a LastName column for a table called Employees, you may see four names ordered as follows:

Anders

Anderseck

Andersen

Anderson

If you hashed the first six letters, your hash keys for these row values would appear as follows:

Anders

Anders

Anders

Anders

While these entries look the same, note that the first Anders in the list is used to represent the actual row value of Anders. The last Anders in the list, however, is used to represent the actual row value Anderson.
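
The six-letter example above can be reproduced with a minimal Python sketch. It assumes, purely for illustration, that a hash key is the first N characters of the indexed value; the `hash_key` function and its `max_hash_size` argument are hypothetical names, and UltraLite's real hash key holds transformed data rather than a plain string prefix.

```python
# Hypothetical model: the hash key is the first max_hash_size characters
# of the indexed value. This only illustrates the idea; it is not how
# UltraLite encodes its hash keys internally.
def hash_key(value: str, max_hash_size: int) -> str:
    return value[:max_hash_size]


last_names = ["Anders", "Anderseck", "Andersen", "Anderson"]
keys_6 = [hash_key(name, 6) for name in last_names]

# The keys keep the order of the values they represent:
# sorting the keys does not reorder them.
assert keys_6 == sorted(keys_6)

for name, key in zip(last_names, keys_6):
    print(f"{key:<8} represents {name}")
```

All four keys come out identical, yet each one still stands for a different row value, which is the situation the text describes.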

Now, consider the following statement:

SELECT * 
FROM Employees
WHERE LastName = 'Andersen';

If the Employees table contained a very high proportion of names similar to Andersen, the hash key might not offer enough uniqueness to yield any performance benefit. In this case, UltraLite cannot determine from the hash keys alone whether any of them actually meets the condition of this statement. When duplicate index hash keys exist, UltraLite still needs to:

  1. Find the table row that matches the row ID in question.
  2. Load and then unpack the data so the value can be evaluated.
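
The fallback described in these two steps can be sketched as a small simulation. Everything here is an assumption made for illustration: `IndexEntry`, `build_index`, and `lookup` are hypothetical names, the index is scanned linearly rather than searched as UltraLite would, and the hash key is modelled as a plain string prefix.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    row_id: int    # locates the row data
    hash_key: str  # prefix of the indexed value (hypothetical model)


def build_index(rows, max_hash_size):
    """Build index entries ordered by the indexed value."""
    ordered = sorted(rows.items(), key=lambda item: item[1])
    return [IndexEntry(row_id, value[:max_hash_size]) for row_id, value in ordered]


def lookup(rows, index, target, max_hash_size):
    """Return (matching row IDs, number of rows loaded and unpacked)."""
    target_key = target[:max_hash_size]
    matches, unpacked = [], 0
    for entry in index:
        if entry.hash_key != target_key:
            continue  # rejected from the index alone, no row access
        # Hash keys are equal, so the row itself must be checked:
        # step 1, find the row by its row ID; step 2, load and unpack it.
        unpacked += 1
        if rows[entry.row_id] == target:
            matches.append(entry.row_id)
    return matches, unpacked


rows = {1: "Anders", 2: "Anderseck", 3: "Andersen", 4: "Anderson"}
index = build_index(rows, 6)
matches, unpacked = lookup(rows, index, "Andersen", 6)
print(matches, unpacked)
```

In this model, a six-letter hash finds the single matching row only after all four rows have been loaded and unpacked, because every hash key equals that of the search value.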

Performance benefits occur only when UltraLite can discern a sufficient proportion of unique hash keys, so that the query condition can be evaluated within the index itself. For example, if the Employees table had thousands of different names, a hash of six letters would still offer enough benefit. However, if the Employees table contained an inordinate number of names that begin with Anders*, then you should hash at least seven letters so that the proportion of unique keys increases. The original four names at the start of this example would then be represented with these hash keys:

Anders

Anderse

Anderse

Anderso

Now, only two of the four row values would need to be unpacked and evaluated, rather than all four.
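
To compare candidate hash sizes against sample data, a short sketch like the following can count how many rows would still have to be unpacked for a given search value. It assumes, for illustration only, that the hash key behaves like a string prefix; `rows_to_unpack` is a hypothetical helper, not an UltraLite function.

```python
def rows_to_unpack(names, target, max_hash_size):
    """Count rows whose hash key equals the hash key of the search value.

    Hypothetical model: the hash key is the first max_hash_size characters.
    """
    target_key = target[:max_hash_size]
    return sum(1 for name in names if name[:max_hash_size] == target_key)


names = ["Anders", "Anderseck", "Andersen", "Anderson"]
for size in (6, 7):
    count = rows_to_unpack(names, "Andersen", size)
    print(f"hash size {size}: {count} rows to unpack")
```

For the four sample names, a six-letter hash leaves four candidate rows and a seven-letter hash leaves two, matching the example. Running the same count over a representative sample of your own data gives a rough sense of how much uniqueness a larger hash would add.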
