Sybase IQ binary load format

Description

Sybase IQ uses the FORMAT BINARY and BINARY column specification clauses to produce data files that can be read by the LOAD TABLE statement.

To speed data loading into Sybase IQ, customers can create data files in Sybase IQ binary format and load this data into Sybase IQ using the FORMAT BINARY syntax of the LOAD TABLE command.

You can find instructions for creating a load script using the LOAD TABLE syntax and specifying the load specification in Reference: Statements and Options.

Create data files with these binary formats to load into columns with the corresponding data types. In most cases, Sybase IQ uses the platform-specific binary format. These data types are exceptions that use binary formats that are specific to Sybase IQ:

DATE
TIME
DATETIME
NUMERIC

IQ binary load format and load efficiency

The Sybase IQ binary load format is a fixed width format.

In general, fixed width loads complete faster than variable width loads. When the load logic knows the length of a column and row, the data is processed more efficiently. Using delimiters to separate columns and rows that vary in width causes the load to spend time scanning the input data looking for them.

The IQ Binary Load Format is a fixed width load. The load can determine the width of each column and length of each row from information in the table definition.

Binary load format is endian-sensitive. This is because binary load format utilizes native binary data types to represent data.

Operating system native data types

Data for the following data types is stored in native operating system binary format and can be written to data files directly in that format. Sybase IQ reads the respective number of bytes directly into the associated data types without conversion.

BIT (1 byte)
TINYINT (1 byte)
SMALLINT (2 bytes)
INT/UNSIGNED INT (4 bytes)
BIGINT/UNSIGNED BIGINT (8 bytes)
FLOAT (4 bytes)
DOUBLE (8 bytes)
CHAR/VARCHAR (character data)
BINARY/VARBINARY (binary data)

By default, VARCHAR and VARBINARY columns are read in as many bytes as specified by LOAD TABLE column-spec.

DATE

DATE column data is stored in Sybase IQ as four bytes (a 32-bit unsigned integer) representing the number of days since 0000-01-01. To convert a calendar date to the Sybase IQ binary format, use:

For a given year, month, and day:

year = current_year - 1;
days_in_year_0000 = 366;
binaryDateValue = (year * 365)
+ (year / 4)
- (year / 100)
+ (year / 400)
days_in_year_0000
day_of_current_year 
-1;

For the day_of_current_year value in the formula above, consider the following example: February 12 is day 43.

TIME

TIME data is stored as a 64-bit unsigned quantity that represents a number in microseconds (in other words, 1.0e-6 seconds). The microsecond quantity is computed using: For a given hour, minute, second, and microsecond (usec):

binaryTimeValue = (hour * 3600 + minute * 60 + second + microsecond ) * 1000000

TIMESTAMP

TIMESTAMP data is stored as a 64-bit unsigned integer and represents a quantity in microseconds. You can compute a binary TIMESTAMP value using:

For a given year, month, day, hour, minute, second, and microsecond:

Compute binaryDateValue for the date as shown above. Compute binaryTimeValue for the time as shown above.

binaryDateTimeValue = binaryDateValue *
    86400000000 + binaryTimeValue

NUMERIC and DECIMAL

Formats for NUMERIC and DECIMAL data types vary as a function of precision. The value must be right-padded with zeroes to the full scale of the value. The value must also be fully left-padded with zeroes, but padding happens automatically with binary programming. Once the values are padded, the decimal point is removed. For example, the value 12.34 looks like:

NUMERIC(4,2): 1234
NUMERIC(6,4): 123400
NUMERIC(8,4): 00123400
NUMERIC(12,6): 000012340000
NUMERIC(16,8): 0000001234000000

After the value is padded and the decimal point is removed, the following rules apply:

If precision <= 4, then the binary format is identical to native operating system binary format for 2 byte integer quantity.
If precision is between 5 and 9, then the binary format is identical to native operating system binary format for a 4 byte integer quantity.
If precision is between 10 and 18, then the binary format is identical to native operating system binary format for an 8 byte integer quantity.
If precision >= 19, then there is a special format that uses the following C++ struct definition:
```
struct {
unsigned char sign; // sign 1 for +, 0 for -
unsigned char ndig; // # digits
unsigned char exp; // exponent
unsigned short digits[80];
}; 
```
Exponent is excess-80 form, unless the value is zero. A “zero” value is represented as:
```
sign = 1
ndig = 0
exp = 0
```
The maximum exponent value is 159. The maximum number of supported digits is 288. “digits[0]” contains the least-significant digits. Digits are stored in a packed representation with two digits per “unsigned short” (2-byte) quantity. For a given “digit:”
- ```
lower order digit = digit[i] & 0x00FF 
```
- ```
high order digit => digit[i] & 0xFF00 
```
For example, consider the value 100 loaded into a NUMERIC(20) column. The binary layout of this value is:
```
0x0101 0x5000 0x0064 0x0000 0x0000 ...... 

Sign - 0x01
Number digits - 0x01
Exponent - 0x50
Digits - 0x0064 
```
As another example, consider the value 32769:
```
0x0102 0x5000 0x0ad1 0x0003 0x0000 0x0000 ....  

Sign - 0x01 
Number digits - 0x02 
Exponent - 0x50 
Digits - 0x0ad1 0x0003 
```
If you translate the digits into base 10, you have:
```
0x0ad1 => 2769  0x0003 => 3 
```

Inserting NULL

The most expedient way to insert NULL values is to use the NULL BYTE in the input file and specify WITH NULL BYTE in the column specification of the LOAD TABLE statement. This is done by terminating each data field in the input file with 'x00' or 'x01'. Terminating a data field in the input file with 'x01' instructs the load to insert NULL into the column. For example:

create table d1 ( c1 date );
load table d1 ( c1 binary with null byte ) from 'filename' quotes off escapes off format binary;

If the content of the load input file is 000b32cb00000b32cc00, two rows will be loaded to the table. The first row will be May 7, 2009 and the second May 8, 2009. Notice that a NULL BYTE was added to the input file after each binary date. If you want NULL loaded into the first row, change the value of the NULL BYTE in the input file to 'x01'.

000b32cb01000b32cc00

The NULL portion of the column specification indicates how to treat certain input values as NULL values, when loading into the table column. These characters can include BLANKS, ZEROS, or any other list of literals you define. When you specify a NULL value or read a NULL value from the source file, the destination column must be able to contain NULLs.

ZEROS is interpreted as follows:

The column is set to NULL if the input data is all binary zeros (not character zeros).

If the input data is character zero:

NULL(ZEROS) never causes the column to be NULL.

NULL('0') causes the column to be NULL. For example:

Load:

CREATE TABLE t1 ( c1 INT, c2 INT );

View the input data file, which uses big-endian byte ordering:

od -x data.inp
3030 3030 0000 04d2

Execute:

LOAD TABLE t1 ( c1 ASCII(4) NULL( '0000' ), 
    c2 BINARY )
FROM 'data.inp'
    FORMAT BINARY
    QUOTES OFF
    ESCAPES OFF;

The results:

SELECT * FROM t1;
c1     c2
NULL   1234

If the input data is binary zero (all bits clear):
- NULL(ZEROS) causes the column to be NULL.
- NULL('0') never causes the column to be NULL, for example:
  
  Load:
```
CREATE TABLE t1 ( c1 INT, C2 INT ); 
```
  VIEW the input data file, which uses big-endian byte ordering:
```
od -x data.inp
0000 0000 0000 04d2 
```
  Execute:
```
LOAD TABLE t1 ( c1 ASCII(4) NULL( zeros ), 
    c2 BINARY )
FROM 'data.inp' 
    FORMAT BINARY
    QUOTES OFF 
    ESCAPES OFF;
```
  The results:
```
SELECT * FROM T1;
c1     c2
NULL   1234 
```
  As another example, if your LOAD TABLE statement includes col1 date('yymmdd') null(zeros) and the data to load is 000000, you receive an error indicating that 000000 cannot be converted to a DATE(4). To get LOAD TABLE to insert a NULL value in col1 when the data is 000000, either write the NULL clause as null(‘000000’), or modify the data to equal binary zeros and use NULL(ZEROS).

Another way to load NULLs during a binary load is not to supply data for the column in the LOAD TABLE statement, if the destination column accepts null values. For example:

CREATE TABLE t1 ( c1 INT, c2 INT );
LOAD TABLE T1 ( c2 BINARY ) FROM 'data.inp'
    FORMAT BINARY 
    QUOTES OFF 
    ESCAPES OFF;

SELECT * FROM T1;
c1     c2
NULL   1234
Null   1234

View the input data file, which uses big-endian byte ordering:

od -x data.inp
0000 04d2 0000 04d2