Data extraction options

The data extraction facility improves performance dramatically for queries with a large result set. This facility currently consists of a set of database options, which are set using the SET OPTION command. Like other database options, the data extraction options can be set either as temporary or permanent. Ordinarily these options are set as temporary. The extract options are set for a connection.

The extract options allow the user to redirect the output of a SELECT statement from the standard interface to go directly to one or more disk files or named pipes. There are two advantages of using the extract options:

The extract options

There are 27 options that control the behavior of extract (listed with allowed values for the option, followed by the default value):

Option Name

Allowed Values

Default value

Temp_Extract_Append

ON or OFF

OFF

Temp_Extract_Binary

ON or OFF

OFF

Temp_Extract_Column_Delimiter

string

','

Temp_Extract_Directory

string

''

Temp_Extract_Name1

string

''

Temp_Extract_Name2

string

''

Temp_Extract_Name3

string

''

Temp_Extract_Name4

string

''

Temp_Extract_Name5

string

''

Temp_Extract_Name6

string

''

Temp_Extract_Name7

string

''

Temp_Extract_Name8

string

''

Temp_Extract_Null_As_Empty

ON or OFF

OFF

Temp_Extract_Null_As_Zero

ON or OFF

OFF

Temp_Extract_Quote

string

''

Temp_Extract_Quotes

ON or OFF

OFF

Temp_Extract_Quotes_All

ON or OFF

OFF

Temp_Extract_Row_Delimiter

string

''

Temp_Extract_Size1

platform specific*

0

Temp_Extract_Size2

platform specific*

0

Temp_Extract_Size3

platform specific*

0

Temp_Extract_Size4

platform specific*

0

Temp_Extract_Size5

platform specific*

0

Temp_Extract_Size6

platform specific*

0

Temp_Extract_Size7

platform specific*

0

Temp_Extract_Size8

platform specific*

0

Temp_Extract_Swap

ON or OFF

OFF

*The default values for the Temp_Extract_Sizen options are platform specific:

When large file systems, such as JFS2, support file size larger than the default value, set TEMP_EXTRACT_SIZEn to the value that the file system allows. For example, to support lTB set option:

TEMP_EXTRACT_SIZE1 = 1073741824 KB

NoteFor all database options that accept integer values, Sybase IQ truncates any decimal option-value setting to an integer value. For example, the value 3.8 is truncated to 3.

The most important of these options is TEMP_EXTRACT_NAME1. If TEMP_EXTRACT_NAME1 is set to its default setting (the empty string), extraction is disabled and no output is redirected. To enable extraction, set Temp_Extract_Name1 to a possible path name. Extract starts extracting into a file with that name. Be sure to choose the path name to a file that is not otherwise in use. If the file does not already exist, the data extraction facility creates the file.

Both the directory or folder containing the named file and the named file must have write permission set for the user name used to start IQ (for example, sybase). In append mode, the data extraction facility adds extracted rows to the end of the file and does not overwrite the data that is already in the file. If the file does not exist, the data extraction facility creates the file.

WARNING! If you choose the path name of an existing file and the TEMP_EXTRACT_APPEND option is set OFF (the default), the file contents are overwritten. This may be what you want if the file is for a weekly report, for example, but is not what you want if the file is one of your database files.

The options TEMP_EXTRACT_NAME2 through TEMP_EXTRACT_NAME8 are used in addition to TEMP_EXTRACT_NAME1 to specify the names of multiple output files. These options must be used sequentially. For example, TEMP_EXTRACT_NAME3 has no effect unless both the options TEMP_EXTRACT_NAME1 and TEMP_EXTRACT_NAME2 are already set.

The options TEMP_EXTRACT_SIZE1 through TEMP_EXTRACT_SIZE8 are used to specify the maximum size of the corresponding output files. TEMP_EXTRACT_SIZE1 specifies the maximum size of the output file specified by TEMP_EXTRACT_NAME1, TEMP_EXTRACT_SIZE2 specifies the maximum size of the output file specified by TEMP_EXTRACT_NAME2, and so on.

Note that the default for the data extraction size options is 0. IQ converts this default to the following values:

Device type

Size

disk file

AIX and HP-UX: 0 – 64GB

Sun Solaris & Linux: 0 – 512GB

Windows: 0 – 128GB

tape*

524288KB (0.5GB)

other

unlimited

*Tape devices currently are not supported.

TEMP_EXTRACT_APPEND is not compatible with the TEMP_EXTRACT_SIZEn options. If you try to restrict the size of the extract append output file, Sybase IQ reports an error.

If you are extracting to a single disk file or a single named pipe, leave the options TEMP_EXTRACT_NAME2 through TEMP_EXTRACT_NAME8 and TEMP_EXTRACT_SIZE1 through TEMP_EXTRACT_SIZE8 at their default values.

NoteIf the SELECT returns no rows and there is no output to redirect, an empty file of zero length is created. If multiple extract files are specified and there is not enough data to fill all of the files, all of the files are still created.

Controlling access

The TEMP_EXTRACT_DIRECTORY option controls whether a user is allowed to use the data extraction facility. It also controls the directory into which temp extract files are placed and overrides a directory path specified in the TEMP_EXTRACT_NAMEn options.

If the TEMP_EXTRACT_DIRECTORY option is set to the string FORBIDDEN (case insensitive) for a user, then that user is not allowed to perform data extracts. An attempt by this user to use the data extraction facility results in an error: “You do not have permission to perform Extracts”.

If TEMP_EXTRACT_DIRECTORY is set to FORBIDDEN for the PUBLIC group, then no one can run data extraction.

If TEMP_EXTRACT_DIRECTORY is set to a valid directory path, temp extract files are placed in that directory, overriding a path specified in the TEMP_EXTRACT_NAMEn options.

If TEMP_EXTRACT_DIRECTORY is set to an invalid directory path, an error occurs: “Files does not exist File: <invalid path>”

If TEMP_EXTRACT_DIRECTORY is blank, then temp extract files are placed in directories according to their specification in TEMP_EXTRACT_NAMEn. If no path is specified as part of TEMP_EXTRACT_NAMEn, the extract files are by default placed in the server startup directory.

The TEMP_EXTRACT_DIRECTORY option provides increased security and helps control disk management by restricting the creation of large data extraction files to the directories for which a user has write access. DBA authority is required to set this option.

Types of extraction

There are three types of data extraction:

A binary extraction produces a file that can be loaded via a LOAD TABLE statement with an overall "binary" format and with a per column "binary with null byte" format.

The binary/swap extraction is the same as the binary extraction, except it is designed to be loaded on another machine with opposite endianness.

The ASCII extraction produces a text file.

The two options Temp_Extract_Binary and Temp_Extract_Swap determine which of the three types of extraction is done:

Type

Temp_Extract_Binary

Temp_Extract_Swap

binary

ON

OFF

binary/swap

ON

ON

ASCII

OFF

OFF

The default extraction type is ASCII.

Note that if the data is unloaded using the extraction facility with the TEMP_EXTRACT_BINARY option set ON, then you must use the LOAD TABLE statement BINARY WITH NULL BYTE parameter for each column when you load the binary data.

Column and row delimiters

In the case of an ASCII extraction, the default is to separate column values with commas, and end the row with a newline on UNIX platforms and with a carriage return/newline pair on Windows platforms. The strings are unquoted. If these defaults are not suitable, use the following options to change the delimiters:

The delimiter must occupy from 1 to a maximum of 4 bytes and must be valid in the collation order you are using, if you are using a multibyte collation order. Be sure to choose delimiters that do not occur in any of the data output strings themselves.

Note that the default for the Temp_Extract_Row_Delimiter option is the empty string. IQ converts the empty string default for this option to the newline on UNIX platforms and to the carriage return/newline pair on Windows platforms.

The option Temp_Extract_Column_Delimiter controls the delimiter between columns. If this option is set to the empty string '' for ASCII extractions, then the extracted data is written in fixed-width ASCII with no column delimiter. Numeric and binary data types are right-justified on a field of n blanks, where n is the maximum number of bytes needed for any value of that type. Character data types are left-justified on a field of n blanks.

NoteThe minimum column width in a fixed-width ASCII extraction is four bytes to allow the string “NULL” for a NULL value. For example, if the extracted column is CHAR(2) and Temp_Extract_Column_Delimiter is set to the empty string '', there are two spaces after the extracted data.

During ASCII extraction, the following options control the use of quotes:

Option

ASCII extraction action

Temp_Extract_Quotes

string fields enclosed in quotes

Temp_Extract_Quotes_All

all fields enclosed in quotes

Temp_Extract_Quote

specifies string to be used as the quote

The quote string specified in the Temp_Extract_Quote option has the same restrictions as delimiters. The default for this option is the empty string, which IQ converts to the single quote mark.

Representation of null values

The Temp_Extract_Null_As_Zero and Temp_Extract_Null_As_Empty options controls the representation of null values for ASCII extractions. When the Temp_Extract_Null_As_Zero option is set to ON, a null value is represented as follows:

When the Temp_Extract_Null_As_Empty option is set to ON, a null value is represented as '' (the empty string) for all data types.

Note that the quotes shown above are not present in the extract output file. When the Temp_Extract_Null_As_Zero and Temp_Extract_Null_As_Empty options are set to OFF, the string 'NULL' is used in all cases to represent a NULL value. OFF is the default value.

If Temp_Extract_Null_As_Zero is set to ON, the number of characters that an ASCII extract writes to a file for a CHAR or VARCHAR column equals the number of characters in the column, even if that number is less than four. In previous releases, Sybase IQ always returned at least four characters to accommodate the word NULL.

Message logging

When the Query_Plan option is ON, a timestamped list of the extracted columns appears in the IQ message log.

Enabling the data extraction facility

The data extraction options must be used with care.

WARNING! If you set the extract options, then execute a SELECT statement, and then execute a second SELECT statement without changing the extract file name, the output of the second SELECT overwrites the output of the first SELECT. Each time you execute a SELECT statement, whether it is one second later or a week later, extract starts over again, unless the Temp_Extract_Append option is set ON.

Also be aware that the extract options are set for the connection. If you set the extract options and another user connects to the database using the same connection, the extract facility is also enabled for that user. Your extraction output can be overwritten by another user on the same connection.

Similarly, if another user logs in using the same user ID, the output of queries run by this user is directed to the extract file until the option is disabled. If you are using extract, be sure to run your request from a unique user ID.

StepsEnabling data extraction options

  1. Save in a different file any old extract output you need to retain.

  2. Remove any previously used extract output files.

  3. Set the extraction options you require, making sure to set Temp_Extract_Name1 to the file path that is to receive the output.

    SETTEMPORARY ] OPTION option-name = option-value

  4. Issue a SELECT statement to extract the data you want.

  5. Reset Temp_Extract_Name1 to the empty string, or disconnect if set temporarily, when you are done with extractions.

Examples

Extracting to a single disk file The following statements extract to the single disk file daily_report.txt:

SET TEMPORARY OPTION Temp_Extract_Name1 = 'daily_report.txt';
SET TEMPORARY OPTION Temp_Extract_Name2 = '';
SELECT ....;
SET TEMPORARY OPTION Temp_Extract_Name1 = '';

Note that Temp_Extract_Name2 is set to the empty string before the SELECT statement is executed, to restrict output to a single file.

Also note that Temp_Extract_Name1 is set to the empty string after the SELECT statement to disable extraction. If extraction is not disabled, then the next SELECT statement executed overwrites the file daily_report.txt.

Extracting in append mode In this example, the disk output file hourly_report.txt is already created and has write permission set for the user sybase. The following statements extract to hourly_report.txt, appending the output from each SELECT statement to the end of the file:

SET TEMPORARY OPTION Temp_Extract_Append = ON;
SET TEMPORARY OPTION Temp_Extract_Name1 = 'hourly_report.txt';
SET TEMPORARY OPTION Temp_Extract_Name2 = '';
SELECT ....;
SELECT ....;
SELECT ....;
SET TEMPORARY OPTION Temp_Extract_Name1 = '';

All of the output from the three SELECT statements is written to the file hourly_report.txt. Temp_Extract_Name1 is set to the empty string after the last SELECT statement to disable extraction. If extraction is not disabled, then the output from the next SELECT statement executed is also added to the end of the file hourly_report.txt.

Extracting to multiple disk files The following statements extract to disk files file1.out, file2.out, and file3.out.

First set the file name options:

SET TEMPORARY OPTION Temp_Extract_Name1 = 'file1.out';
SET TEMPORARY OPTION Temp_Extract_Name2 = 'file2.out';
SET TEMPORARY OPTION Temp_Extract_Name3 = 'file3.out';
SET TEMPORARY OPTION Temp_Extract_Name4 = '';

Now limit the size of the files, for example to 1MB each, by setting the corresponding extract size options:

SET TEMPORARY OPTION Temp_Extract_Size1 = '1024';
SET TEMPORARY OPTION Temp_Extract_Size2 = '1024';
SET TEMPORARY OPTION Temp_Extract_Size3 = '1024';

The size options are in KB (1024 bytes).

With these settings, the extraction output is first written to file1.out. When the next row to be written to file1.out would cause the file size to exceed 1MB, the output is redirected to file2.out. When file2.out is full (writing another row to file2.out would cause the file size to exceed 1MB), the output is redirected to file3.out. An error is reported, if the size of file3.out exceeds 1MB before IQ extracts all rows.

Extraction limitations

The following restrictions and limitations apply to the data extraction facility:

Also note that when Temp_Extract_Name1 is set, you cannot perform these operations:

Extraction and events

Events do not support execution of statements that return result sets. The server log returns an error similar to the following:

Handler for event 'test_ev' caused SQLSTATE '09W03'
Result set not permitted in 'test_ev'

In order to execute a query through an event, create an event that calls a stored procedure and insert the stored procedure results into a temporary table. If extract is used, then the temporary table is always empty and requires little overhead.

For example:

CREATE PROCEDURE proc1()
BEGIN
	SET TEMPORARY OPTION temp_extract_name1 = 'testproc.out';
	SELECT * FROM iq_table;
END;

CREATE EVENT "test_ev" ENABLE HANDLER
BEGIN
	SELECT * INTO #tmp FROM proc1();
END;

TRIGGER EVENT test_ev;

Extraction with named pipes

Sybase IQ opens named pipes using nonblocking mode. This means that processes reading extract output from a named pipe should read from the pipe until there is no more data to be read, then close the pipe. Sybase IQ sends output data to the named pipe until there is no more data. On UNIX and Linux systems, Sybase IQ closes the named pipe when there is no more data to output. On Windows systems, Sybase IQ sends all of the output data, waits for the process reading the data to finish reading, then Sybase IQ closes the named pipe. Design applications to be used for reading these named pipes accordingly.