Sybase IQ binary load format

Description

Sybase IQ uses the FORMAT BINARY and BINARY column specification clauses to produce data files that can be read by the LOAD TABLE statement.

To speed data loading into Sybase IQ, customers can create data files in Sybase IQ binary format and load this data into Sybase IQ using the FORMAT BINARY syntax of the LOAD TABLE command.

You can find instructions for creating a load script using the LOAD TABLE syntax and specifying the load specification in Reference: Statements and Options.

Create data files with these binary formats to load into columns with the corresponding data types. In most cases, Sybase IQ uses the platform-specific binary format. These data types are exceptions that use binary formats that are specific to Sybase IQ:


IQ binary load format and load efficiency

The Sybase IQ binary load format is a fixed width format.

In general, fixed width loads complete faster than variable width loads. When the load logic knows the length of a column and row, the data is processed more efficiently. Using delimiters to separate columns and rows that vary in width causes the load to spend time scanning the input data looking for them.

The IQ Binary Load Format is a fixed width load. The load can determine the width of each column and length of each row from information in the table definition.

NoteBinary load format is endian-sensitive. This is because binary load format utilizes native binary data types to represent data.


Operating system native data types

Data for the following data types is stored in native operating system binary format and can be written to data files directly in that format. Sybase IQ reads the respective number of bytes directly into the associated data types without conversion.

By default, VARCHAR and VARBINARY columns are read in as many bytes as specified by LOAD TABLE column-spec.


DATE

DATE column data is stored in Sybase IQ as four bytes (a 32-bit unsigned integer) representing the number of days since 0000-01-01. To convert a calendar date to the Sybase IQ binary format, use:

For a given year, month, and day:

year = current_year - 1;
days_in_year_0000 = 366;
binaryDateValue = (year * 365)
+ (year / 4)
- (year / 100)
+ (year / 400)
days_in_year_0000
day_of_current_year 
-1;

For the day_of_current_year value in the formula above, consider the following example: February 12 is day 43.


TIME

TIME data is stored as a 64-bit unsigned quantity that represents a number in microseconds (in other words, 1.0e-6 seconds). The microsecond quantity is computed using: For a given hour, minute, second, and microsecond (usec):

binaryTimeValue = (hour * 3600 + minute * 60 + second + microsecond ) * 1000000

TIMESTAMP

TIMESTAMP data is stored as a 64-bit unsigned integer and represents a quantity in microseconds. You can compute a binary TIMESTAMP value using:

For a given year, month, day, hour, minute, second, and microsecond:

Compute binaryDateValue for the date as shown above. Compute binaryTimeValue for the time as shown above.

binaryDateTimeValue = binaryDateValue *
    86400000000 + binaryTimeValue

NUMERIC and DECIMAL

Formats for NUMERIC and DECIMAL data types vary as a function of precision. The value must be right-padded with zeroes to the full scale of the value. The value must also be fully left-padded with zeroes, but padding happens automatically with binary programming. Once the values are padded, the decimal point is removed. For example, the value 12.34 looks like:

After the value is padded and the decimal point is removed, the following rules apply:


Inserting NULL

The most expedient way to insert NULL values is to use the NULL BYTE in the input file and specify WITH NULL BYTE in the column specification of the LOAD TABLE statement. This is done by terminating each data field in the input file with 'x00' or 'x01'. Terminating a data field in the input file with 'x01' instructs the load to insert NULL into the column. For example:

create table d1 ( c1 date );
load table d1 ( c1 binary with null byte ) from 'filename' quotes off escapes off format binary;

If the content of the load input file is 000b32cb00000b32cc00, two rows will be loaded to the table. The first row will be May 7, 2009 and the second May 8, 2009. Notice that a NULL BYTE was added to the input file after each binary date. If you want NULL loaded into the first row, change the value of the NULL BYTE in the input file to 'x01'.

000b32cb01000b32cc00

The NULL portion of the column specification indicates how to treat certain input values as NULL values, when loading into the table column. These characters can include BLANKS, ZEROS, or any other list of literals you define. When you specify a NULL value or read a NULL value from the source file, the destination column must be able to contain NULLs.

ZEROS is interpreted as follows:

Another way to load NULLs during a binary load is not to supply data for the column in the LOAD TABLE statement, if the destination column accepts null values. For example:

CREATE TABLE t1 ( c1 INT, c2 INT );
LOAD TABLE T1 ( c2 BINARY ) FROM 'data.inp'
    FORMAT BINARY 
    QUOTES OFF 
    ESCAPES OFF;

SELECT * FROM T1;
c1     c2
NULL   1234
Null   1234 

View the input data file, which uses big-endian byte ordering:

od -x data.inp
0000 04d2 0000 04d2