HG Index Loads

Relative to other indexes, the HG indexes are more expense to maintain during data loads and deletions. A main contributor to the performance of the HG index is the location of the data within the HG index structure: the sparsity or density of the operation.

Dense HG operations are those in which the affected rows are tightly grouped around certain keys. Sparse operations are those where there may be just a few rows per key that must be affected. For instance, dates on data are typically grouped around the time the operation was logged, data modified, etc. This means that new data will be placed at the end of the HG index structure. When deleting data in the date HG index, said data would typically come off in chunks of days, weeks, months, etc and thus be removed from the beginning of the HG btree or be tightly grouped around a few keys for deletion. These operations are very fast, relatively speaking, as SAP Sybase IQ will operate on a few pages and affect a tremendous number of rows.

Data that is rather sparse, like Prices, Customer IDs, City, Country, etc., are very different. As “pricing" data, for instance, is loaded each value will vary widely across all data already in the index. If the column is tracking stock prices the numeric field to store that data will be densely updated because the data being changed will be across the nearly the entire range of values already loaded. These operations are slower due to the amount of index pages that must be maintained for each row being affected. A worst case scenario is that SAP Sybase IQ is forced to read and write 1 page for EACH ROW being loaded or deleted. While this can be less than optimal, SAP Sybase IQ has been design to parallel process phase 2 of the HG index loads and the deletes so that the impact is greatly reduced.

All of this is well and good, but how does it affect the data model design and indexing? Typical tuning and optimization within SAP Sybase IQ generally boils down to indexes or the lack thereof. Knowing how the indexes can be affected by the data and loading is an important aspect when deciding which indexes to put in place and which to leave off. Because HG indexes take, relatively, more time to load than other indexes they are often the subject of focus when it comes to use and design. Certainly, HG indexes can help with query performance. There are times, though, where adding an index may have a slight positive impact on queries but have more of an impact to data loads. In these situations, it is important to understand why the load or delete took longer and what can be done about it.

The sparsity or density of new data with respect to currently loaded data plays a critical role in this. If a relatively random column of a Customer ID must be indexed for fast query performance and an index must be on that column. Suppose, though, that a primary key exists on the table and it is the Customer ID and a Date field storing a transaction datetime. If the ordering were left as (customer_id, transaction_date) the data would be sparsely loaded or deleted from the table in most case. Data being loaded will be done so by transaction date. Since the Customer ID column is first in the multicolumn index, though, it will force SAP Sybase IQ to touch data throughout the entire HG index structure.

A simple change in order to (transaction_date, customer_id) changes this behavior. The index is still in place to control referential integrity for the primary key. The ordering of the columns is immaterial for primary key enforcement. As such, we can change the column order without causing any downstream ill effects. This simple change will now force all new data being loaded by transaction date to be inserted at the end of the HG index structure in a very dense manner. Over time the loads will perform consistently as the data is, generally, always going to the end of the HG structure.

Simply changing the column ordering in a multicolumn index can have drastic impacts on performance. The size of the HG index shouldn't change much as the data is still the same width regardless of order. What will change is how fast the data is loaded or deleted from the table.

Related concepts
Indexing Tips
When and Where to use Indexes
Simple Index Selection Criteria
Multi-Column Indexes