Improved tokenization

Tokenization can now be applied to columns with more than 64K distinct values.

When you create a permanent table in a Sybase IQ database, IQ stores all column values in a default index. This default index, called an FP (fast projection) index, optimizes projections and enables certain kind of search conditions to be evaluated.

Each column has one FP index, and each FP is an array of n fixed-length entries where n is the number of rows in the table. Each column value is stored sequentially in ascending RecordID order.

With a small number of distinct or unique values, such as a state, date, or month field, an optimized form of the FP can be created that will reduce the number of disk pages required, dramatically reducing both the storage required for a column and I/O costs for projection.

These optimized FP indexes have two pieces: 1. a lookup table where each distinct value in the column appears exactly once and 2. the logical array of column cell values, except instead to storing the actual cell values each element of the logical array is a key into the lookup table to where the cell value is stored.

The sp_iqindexmetadata stored procedure generates a report describing a specified index or indexes belonging to a specified owner or table. The output allows easy checking of whether a given index is a 1-byte, 2-byte, 3-byte, or flat style FP index. For details, see “sp_iqindexmetadata procedure” in Chapter 7, “System Procedures,” in Reference: Building Blocks, Tables, and Procedures.


FP(1) index

When the number of values is within 256, a 1-Byte FP index is created. The FP index starts with this form.The actual key value of each row is stored in the lookup table and the FP entry (lookup key) contains the index into the lookup table. For a 1-Byte FP index, each FP entry occupies 1 byte.


FP(2) index

When the number of values exceeds 256, but is less than or equal to 65536, the same lookup table grows. Each FP entry (lookup key) contains the index into the lookup table but occupies 2 bytes instead of 1.

To avoid the overhead of converting 1-byte entries into 2-byte entries, users can specify the IQ UNIQUE value to be greater than 256 and less than or equal to 65536 at table creation time.


FP(3) index

When the distinct count exceeds 65536, you can create a 3-byte FP index. The FP(3) index is structurally similar to FP(1) and FP(2) indexes, with the following key differences:

The 3-byte index stores values in a column (column data), as long as the distinct count does not exceed 16777216. Users can create a 3-byte index on columns only if the size of column data is greater than 3 bytes. Therefore, you cannot create an FP(3) index on columns with data types BIT, TINYINT, SMALLINT, CHAR(<=3), VARCHAR(<=3), BINARY(<=3) and VARBINARY(<=3). Sybase IQ also does not support FP(3) indexes for LONG VARCHAR and LONG VARBINARY data types.

To create an FP(3) index, either the MINIMIZE_STORAGE must be set ON, or the column must have been created with an IQ UNIQUE constraint value between 65537 and 16777216, including these two values. An FP(3) may also be created after rollover from an FP(2) index, once the unique count exceeds 65536, but only for data sizes shown in Table 1-1 and excluding data types BIT, TINYINT, SMALLINT, CHAR(<=3), VARCHAR(<=3), BINARY(<=3) and VARBINARY(<=3). See “Using IQ UNIQUE constraint on columns” in Chapter 9, “Ensuring Data Integrity,” in the System Administration Guide: Volume 1 and the “MINIMIZE_STORAGE option” in Chapter 2, “Database Options,” in the Reference: Statements and Options.

Behavior changes

There are some differences in the behavior of FP indexes from earlier releases.

Table 1-1: Sybase IQ fast projection (FP) indexes

Distinct count

Column data size = 1 byte

Column data size = 2 bytes

Column data size = 3 bytes

Column data size >3 bytes

<257

FP(1)

FP(1)

FP(1)

FP(1)

257 - 65536

FP(2)

FP(2)

FP(2)

65537-16777216

Flat

FP(3)

>16777216

Flat

Like FP(1) and FP(2) indexes, the FP(3) index is not supported for columns whose data type is wider than 255 bytes or less than four bytes wide.

The creation of an FP(3) index, overflow, or a forced transition to an FP(3) index is permitted only if the space used by the lookup table is less than the current value of the FP_LOOKUP_SIZE option and less than the portion of the main cache specified by the current setting of FP_LOOKUP_SIZE_PPM.

The maximum number of lookup pages used in Sybase IQ is controlled by the FP_LOOKUP_SIZE option and the FP_LOOKUP_SIZE_PPM option, whichever is lower. See “FP_LOOKUP_SIZE option” and “FP_LOOKUP_SIZE_PPM option” in Chapter 2, “Database Options,” in Reference: Statements and Options.

Table 1-2 calculates the maximum number of distinct values that can be supported in an FP(3) index based on the following formula:

FP_LOOKUP_SIZE / (Column-Data size + Cardinality size)

Cardinality size is the space reserved to store cardinality of all individual data in the lookup store. Cardinality size can have a value of either 4 or 8 bytes. In this example, it has a maximum value of 8 bytes.

Table 1-2: Maximum unique values in FP(3)

Column data type width (bytes)

FP_LOOKUP_SIZE (MB)

4

8

32

64

128

255

1 MB

87381

65536

26214

14563

7710

3986

4 MB

349525

262144

104857

58254

30840

15947

8 MB

699050

524288

209715

116508

61680

31895

16 MB

1398101

1048576

419430

233016

123361

63791

32 MB

2796202

2097152

838860

466033

246723

127583

64 MB

5592405

4194304

1677721

932067

493447

255166

128 MB

11184810

8388608

3355443

1864135

986895

510333

256 MB

16777216

16777216

6710886

3728270

1973790

1020667

NoteNotes The values illustrated in Table 1-2 are estimates for the number of unique values in a column for the given value of option FP_LOOKUP_SIZE; actual values may vary. Such variations are possible because counts can be stored as 4 bytes or 8 bytes.

Table 1-2 is based on the condition that the value of FP_LOOKUP_SIZE is less than or equal to the value of FP_LOOKUP_SIZE_PPM.


Configuring FP(3) indexes

You may need to adjust the temporary cache size when configuring 3-byte indexes. You can set values using the server startup command line parameter -iqtc or using the sa_server_option system procedure temp_cache_memory_mb option as follows:

CALL sa_server_option('temp_cache_memory_mb', value)

The enumerated FP indexes use a hash object to manage the values represented in the column. The size of the hash object used with a 3-byte FP can get large, depending on the number of distinct values and the width of the column. With a large enough temporary cache allocation, increasing the value of the option HASH_PINNABLE_CACHE_PERCENT above the default value of 20 percent can improve performance by allowing the entire hash object to remain in the cache.

Cache usage

In order to maximize the use of FP(3) indexes, set the FP_LOOKUP_SIZE option a value larger than the default of 16MB. Refer to Table 1-2 for maximum distinct counts allowed on a column for an FP(3) index. Table 1-2 also contains examples, with values less than 16777216, where a rollover to a Flat FP occurs for a smaller unique count than the expected 16777216.

Loads

Columns with a 3-byte index have additional cache requirement for loading data. Set FP_LOOKUP_SIZE to an appropriate value before loading columns with 3-bytes indexes.

If a scarcity of pinned buffers occurs, Sybase IQ returns a warning in the .iqmsg file, which also contains notification of possible thrashing:

Warning: Hash Insert forced buffer unpinning detected for FP Index
Warning: Hash Insert thrashing detected for FP Index

Flat FP index

When the number of distinct values exceeds 16777216, no lookup table is created. Each FP entry contains an actual column cell value.

If MINIMIZE_STORAGE is on, you can avoid the overhead of converting lookup FP entries into flat style. When the distinct row count for a particular field increases beyond 16777216, then an FP(3) index is automatically converted to a flat style FP index. Specify the IQ UNIQUE value to be greater than 16777216 at table creation time to create flat style FP.

NoteWhen you create a table with the DATE data type, a 2-byte FP index is created on the DATE field, which is independent of the settings in database option MINIMIZE_STORAGE.

If you want to create a 3-byte FP or flat style FP index on the DATE field, use the following IQ UNIQUE values when creating the table: