The data extraction facility improves performance dramatically for queries with a large result set. This facility currently consists of a set of database options, which are set using the SET OPTION command. Like other database options, the data extraction options can be set either as temporary or permanent. Ordinarily these options are set as temporary. The extract options are set for a connection.
The extract options allow the user to redirect the output of a SELECT statement from the standard interface to go directly to one or more disk files or named pipes. There are two advantages of using the extract options:
A binary format is supported, which allows loading the output data into the same or a different IQ database.
A SELECT statement with heavy output will run up to 4 times faster for ASCII output and up to 9 times faster for binary output.
There are 27 options that control the behavior of extract (listed with allowed values for the option, followed by the default value):
Option Name |
Allowed Values |
Default value |
---|---|---|
Temp_Extract_Append |
ON or OFF |
OFF |
Temp_Extract_Binary |
ON or OFF |
OFF |
Temp_Extract_Column_Delimiter |
string |
',' |
Temp_Extract_Directory |
string |
'' |
Temp_Extract_Name1 |
string |
'' |
Temp_Extract_Name2 |
string |
'' |
Temp_Extract_Name3 |
string |
'' |
Temp_Extract_Name4 |
string |
'' |
Temp_Extract_Name5 |
string |
'' |
Temp_Extract_Name6 |
string |
'' |
Temp_Extract_Name7 |
string |
'' |
Temp_Extract_Name8 |
string |
'' |
Temp_Extract_Null_As_Empty |
ON or OFF |
OFF |
Temp_Extract_Null_As_Zero |
ON or OFF |
OFF |
Temp_Extract_Quote |
string |
'' |
Temp_Extract_Quotes |
ON or OFF |
OFF |
Temp_Extract_Quotes_All |
ON or OFF |
OFF |
Temp_Extract_Row_Delimiter |
string |
'' |
Temp_Extract_Size1 |
platform specific* |
0 |
Temp_Extract_Size2 |
platform specific* |
0 |
Temp_Extract_Size3 |
platform specific* |
0 |
Temp_Extract_Size4 |
platform specific* |
0 |
Temp_Extract_Size5 |
platform specific* |
0 |
Temp_Extract_Size6 |
platform specific* |
0 |
Temp_Extract_Size7 |
platform specific* |
0 |
Temp_Extract_Size8 |
platform specific* |
0 |
Temp_Extract_Swap |
ON or OFF |
OFF |
*The default values for the Temp_Extract_Sizen options are platform specific:
AIX and HP-UX: 0 - 64GB
Sun Solaris: 0 - 512GB
Windows: 0 - 128GB
Linux: 0 - 512GB
When large file systems, such as JFS2, support file size larger than the default value, set TEMP_EXTRACT_SIZEn to the value that the file system allows. For example, to support lTB set option:
TEMP_EXTRACT_SIZE1 = 1073741824 KB
For all database options that accept integer values, Sybase IQ truncates any decimal option-value setting to an integer value. For example, the value 3.8 is truncated to 3.
The most important of these options is TEMP_EXTRACT_NAME1. If TEMP_EXTRACT_NAME1 is set to its default setting (the empty string), extraction is disabled and no output is redirected. To enable extraction, set Temp_Extract_Name1 to a possible pathname. Extract starts extracting into a file with that name. Be sure to choose the pathname to a file that is not otherwise in use. If the file does not already exist, the data extraction facility creates the file.
TEMP_EXTRACT_NAME1 is also used to specify the name of the output file, when the TEMP_EXTRACT_APPEND option is set ON. Both the directory or folder containing the named file and the named file must have write permission set for the user name used to start IQ (for example, sybase). In append mode, the data extraction facility adds extracted rows to the end of the file and does not overwrite the data that is already in the file. If the file does not exist, the data extraction facility creates the file.
WARNING! If you choose the pathname of an existing file and the TEMP_EXTRACT_APPEND option is set OFF (the default), the file contents will be overwritten. This may be what you want if the file is for a weekly report, for example, but is not what you want if the file is one of your database files.
The options TEMP_EXTRACT_NAME2 through TEMP_EXTRACT_NAME8 are used in addition to TEMP_EXTRACT_NAME1 to specify the names of multiple output files. These options must be used sequentially. For example, TEMP_EXTRACT_NAME3 has no effect unless both the options TEMP_EXTRACT_NAME1 and TEMP_EXTRACT_NAME2 are already set.
The options TEMP_EXTRACT_SIZE1 through TEMP_EXTRACT_SIZE8 are used to specify the maximum size of the corresponding output files. TEMP_EXTRACT_SIZE1 specifies the maximum size of the output file specified by TEMP_EXTRACT_NAME1, TEMP_EXTRACT_SIZE2 specifies the maximum size of the output file specified by TEMP_EXTRACT_NAME2, and so on.
Note that the default for the data extraction size options is 0. IQ converts this default to the following values:
device type |
size |
---|---|
disk file |
AIX and HP-UX: 0 – 64GB Sun Solaris & Linux: 0 – 512GB Windows: 0 – 128GB |
tape* |
524288KB (0.5GB) |
other |
unlimited |
*Tape devices currently are not supported.
TEMP_EXTRACT_APPEND is not compatible with the TEMP_EXTRACT_SIZEn options. If you try to restrict the size of the extract append output file, Sybase IQ reports an error.
If you are extracting to a single disk file or a single named pipe, leave the options TEMP_EXTRACT_NAME2 through TEMP_EXTRACT_NAME8 and TEMP_EXTRACT_SIZE1 through TEMP_EXTRACT_SIZE8 at their default values.
If the SELECT returns no rows and there is no output to redirect, an empty file of zero length is created. If multiple extract files are specified and there is not enough data to fill all of the files, all of the files are still created.
The TEMP_EXTRACT_DIRECTORY option controls whether a user is allowed to use the data extraction facility. It also controls the directory into which temp extract files are placed and overrides a directory path specified in the TEMP_EXTRACT_NAMEn options.
If the TEMP_EXTRACT_DIRECTORY option is set to the string FORBIDDEN (case insensitive) for a user, then that user is not allowed to perform data extracts. An attempt by this user to use the data extraction facility results in an error: “You do not have permission to perform Extracts”.
If TEMP_EXTRACT_DIRECTORY is set to FORBIDDEN for the PUBLIC group, then no one can run data extraction.
If TEMP_EXTRACT_DIRECTORY is set to a valid directory path, temp extract files are placed in that directory, overriding a path specified in the TEMP_EXTRACT_NAMEn options.
If TEMP_EXTRACT_DIRECTORY is set to an invalid directory path, an error occurs: “Files does not exist File: <invalid path>”
If TEMP_EXTRACT_DIRECTORY is blank, then temp extract files are placed in directories according to their specification in TEMP_EXTRACT_NAMEn. If no path is specified as part of TEMP_EXTRACT_NAMEn, the extract files are by default placed in the server startup directory.
The TEMP_EXTRACT_DIRECTORY option provides increased security and helps control disk management by restricting the creation of large data extraction files to the directories for which a user has write access. DBA authority is required to set this option.
There are three types of data extraction:
binary
binary/swap
ASCII
A binary extraction produces a file that can be loaded via a LOAD TABLE statement with an overall "binary" format and with a per column "binary with null byte" format.
The binary/swap extraction is the same as the binary extraction, except it is designed to be loaded on another machine with opposite endianness.
The ASCII extraction produces a text file.
The two options Temp_Extract_Binary and Temp_Extract_Swap determine which of the three types of extraction is done:
Type |
Temp_Extract_Binary |
Temp_Extract_Swap |
---|---|---|
binary |
ON |
OFF |
binary/swap |
ON |
ON |
ASCII |
OFF |
OFF |
The default extraction type is ASCII.
Note that if the data is unloaded using the extraction facility with the TEMP_EXTRACT_BINARY option set ON, then you must use the LOAD TABLE statement BINARY WITH NULL BYTE parameter for each column when you load the binary data.
In the case of an ASCII extraction, the default is to separate column values with commas, and end the row with a newline on UNIX platforms and with a carriage return/newline pair on Windows platforms. The strings are unquoted. If these defaults are not suitable, use the following options to change the delimiters:
Temp_Extract_Column_Delimiter
Temp_Extract_Row_Delimiter
Temp_Extract_Quote
Temp_Extract_Quotes
Temp_Extract_Quotes_All
The delimiter must occupy from 1 to a maximum of 4 bytes and must be valid in the collation order you are using, if you are using a multibyte collation order. Be sure to choose delimiters that do not occur in any of the data output strings themselves.
Note that the default for the Temp_Extract_Row_Delimiter option is the empty string. IQ converts the empty string default for this option to the newline on UNIX platforms and to the carriage return/newline pair on Windows platforms.
The option Temp_Extract_Column_Delimiter controls the delimiter between columns. If this option is set to the empty string '' for ASCII extractions, then the extracted data is written in fixed-width ASCII with no column delimiter. Numeric and binary data types are right-justified on a field of n blanks, where n is the maximum number of bytes needed for any value of that type. Character data types are left-justified on a field of n blanks.
The minimum column width in a fixed-width ASCII extraction is four bytes to allow the string “NULL” for a NULL value. For example, if the extracted column is CHAR(2) and Temp_Extract_Column_Delimiter is set to the empty string '', there are two spaces after the extracted data.
During ASCII extraction, the following options control the use of quotes:
Option |
ASCII extraction action |
---|---|
Temp_Extract_Quotes |
string fields enclosed in quotes |
Temp_Extract_Quotes_All |
all fields enclosed in quotes |
Temp_Extract_Quote |
specifies string to be used as the quote |
The quote string specified in the Temp_Extract_Quote option has the same restrictions as delimiters. The default for this option is the empty string, which IQ converts to the single quote mark.
The Temp_Extract_Null_As_Zero and Temp_Extract_Null_As_Empty options controls the representation of null values for ASCII extractions. When the Temp_Extract_Null_As_Zero option is set to ON, a null value is represented as follows:
'0' for arithmetic type
'' (the empty string) for the CHAR and VARCHAR character types
'' (the empty string) for dates
'' (the empty string) for times
'' (the empty string) for timestamps
When the Temp_Extract_Null_As_Empty option is set to ON, a null value is represented as '' (the empty string) for all data types.
Note that the quotes shown above are not present in the extract output file. When the Temp_Extract_Null_As_Zero and Temp_Extract_Null_As_Empty options are set to OFF, the string 'NULL' is used in all cases to represent a NULL value. OFF is the default value.
If Temp_Extract_Null_As_Zero is set to ON, the number of characters that an ASCII extract writes to a file for a CHAR or VARCHAR column equals the number of characters in the column, even if that number is less than four. In previous releases, Sybase IQ always returned at least four characters to accommodate the word NULL.
When the Query_Plan option is ON, a timestamped list of the extracted columns appears in the IQ message log.
The data extraction options must be used with care.
WARNING! If you set the extract options, then execute a SELECT statement, and then execute a second SELECT statement without changing the extract filename, the output of the second SELECT overwrites the output of the first SELECT. Each time you execute a SELECT statement, whether it is one second later or a week later, extract starts over again, unless the Temp_Extract_Append option is set ON.
Also be aware that the extract options are set for the connection. If you set the extract options and another user connects to the database using the same connection, the extract facility is also enabled for that user. Your extraction output can be overwritten by another user on the same connection.
Similarly, if another user logs in using the same user ID, the output of queries run by this user is directed to the extract file until the option is disabled. If you are using extract, be sure to run your request from a unique user ID.
Enabling data extraction options
Save in a different file any old extract output you need to retain.
Remove any previously used extract output files.
Set the extraction options you require, making sure to set Temp_Extract_Name1 to the file path that is to receive the output.
SET [ TEMPORARY ] OPTION option-name = option-value
Issue a SELECT statement to extract the data you want.
Reset Temp_Extract_Name1 to the empty string, or disconnect if set temporarily, when you are done with extractions.
Extracting to a single disk file The following statements extract to the single disk file daily_report.txt:
SET TEMPORARY OPTION Temp_Extract_Name1 = 'daily_report.txt'; SET TEMPORARY OPTION Temp_Extract_Name2 = ''; SELECT ....; SET TEMPORARY OPTION Temp_Extract_Name1 = '';
Note that Temp_Extract_Name2 is set to the empty string before the SELECT statement is executed, to restrict output to a single file.
Also note that Temp_Extract_Name1 is set to the empty string after the SELECT statement to disable extraction. If extraction is not disabled, then the next SELECT statement executed overwrites the file daily_report.txt.
Extracting in append mode In this example, the disk output file hourly_report.txt is already created and has write permission set for the user sybase. The following statements extract to hourly_report.txt, appending the output from each SELECT statement to the end of the file:
SET TEMPORARY OPTION Temp_Extract_Append = ON; SET TEMPORARY OPTION Temp_Extract_Name1 = 'hourly_report.txt'; SET TEMPORARY OPTION Temp_Extract_Name2 = ''; SELECT ....; SELECT ....; SELECT ....; SET TEMPORARY OPTION Temp_Extract_Name1 = '';
All of the output from the three SELECT statements is written to the file hourly_report.txt. Temp_Extract_Name1 is set to the empty string after the last SELECT statement to disable extraction. If extraction is not disabled, then the output from the next SELECT statement executed is also added to the end of the file hourly_report.txt.
Extracting to multiple disk files The following statements extract to disk files file1.out, file2.out, and file3.out.
First set the filename options:
SET TEMPORARY OPTION Temp_Extract_Name1 = 'file1.out'; SET TEMPORARY OPTION Temp_Extract_Name2 = 'file2.out'; SET TEMPORARY OPTION Temp_Extract_Name3 = 'file3.out'; SET TEMPORARY OPTION Temp_Extract_Name4 = '';
Now limit the size of the files, for example to 1MB each, by setting the corresponding extract size options:
SET TEMPORARY OPTION Temp_Extract_Size1 = '1024'; SET TEMPORARY OPTION Temp_Extract_Size2 = '1024'; SET TEMPORARY OPTION Temp_Extract_Size3 = '1024';
The size options are in KB (1024 bytes).
With these settings, the extraction output is first written to file1.out. When the next row to be written to file1.out would cause the file size to exceed 1MB, the output is redirected to file2.out. When file2.out is full (writing another row to file2.out would cause the file size to exceed 1MB), the output is redirected to file3.out. An error is reported, if the size of file3.out exceeds 1MB before IQ extracts all rows.
The following restrictions and limitations apply to the data extraction facility:
Extract works only with data stored in the IQ store.
Extract does not work on system tables or cross database joins.
Extract does not work with queries that use user-defined functions or system functions, except for the system functions suser_id() and suser_name().
A binary LOAD TABLE always trims blanks from VARCHAR data. If you have VARCHAR data with trailing blanks, they are not preserved on insert by a binary load.
Trailing zeros are padded onto VARBINARY data during the extract. For example, a field declared as varbinary(6), which contains the data 0x1234, is padded with zeros during extraction and is loaded after extraction as 0x123400.
Binary format will change in a future release.
If you need to reproduce floating point data exactly, use the binary option.
Tape devices are not supported at this time.
Also note that when Temp_Extract_Name1 is set, you cannot perform these operations:
LOAD, DELETE, INSERT or INSERT...LOCATION to a table that is the top table in a join
SYNCHRONIZE JOIN INDEX (issued explicitly or executed as part of CREATE JOIN INDEX)
INSERT...SELECT
Events do not support execution of statements that return result sets. The server log returns an error similar to the following:
Handler for event 'test_ev' caused SQLSTATE '09W03' Result set not permitted in 'test_ev'
In order to execute a query through an event, create an event that calls a stored procedure and insert the stored procedure results into a temporary table. If extract is used, then the temporary table is always empty and requires little overhead.
For example:
CREATE PROCEDURE proc1() BEGIN SET TEMPORARY OPTION temp_extract_name1 = 'testproc.out'; SELECT * FROM iq_table; END; CREATE EVENT "test_ev" ENABLE HANDLER BEGIN SELECT * INTO #tmp FROM proc1(); END; TRIGGER EVENT test_ev;
Sybase IQ opens named pipes with non-blocking mode. This means that processes reading extract output from a named pipe should read from the pipe until there is no more data to be read, then close the pipe. Sybase IQ sends output data to the named pipe until there is no more data. On UNIX and Linux systems, Sybase IQ closes the named pipe when there is no more data to output. On Windows systems, Sybase IQ sends all of the output data, waits for the process reading the data to finish reading, then Sybase IQ closes the named pipe. Applications to be used for reading these named pipes should be designed accordingly.