File CSV Input Adapter

Adapter type: dsv_in. The File CSV Input adapter reads a file in Event Stream Processor delimited format.

Use this adapter to poll for new data appended to the data file. The file does not require a header; if it includes one, the header specifies the field names.

Sample record formats for the data file:
1. hasHeader=true
delimiter=,
expectStreamNameOpcode=false

Ts,ItemID,Price,Quantity,WarehouseZipCode,DeliveryZipCode
2004/06/17 10:00:00.000000,SKU1276532,50.00,1,10012,94086
2004/06/17 10:00:05.000000,SKU6723143,23.00,2,10012,94043

2. expectStreamNameOpcode=true
delimiter=,

Trades_in,i,2004/06/17 10:00:00.000000,SKU1276532,50.00,1,10012,94086
Trades_in,i,2004/06/17 10:00:05.000000,SKU6723143,23.00,2,10012,94043

3. expectStreamNameOpcode=false
timestampFormat=%Y/%m/%d %H:%M:%S
delimiter=,

2004/06/17 10:00:00.000000,SKU1276532,50.00,1,10012,94086
2004/06/17 10:00:05.000000,SKU6723143,23.00,2,10012,94043
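
The sample formats above can be illustrated with a minimal Python sketch. It is not the adapter's implementation; the function name parse_records and its arguments are hypothetical and simply mirror the hasHeader, delimiter, and expectStreamNameOpcode properties. It assumes plain splitting on the delimiter with no quoting or escaping rules.

```python
def parse_records(lines, delimiter=",", has_header=False,
                  expect_stream_name_opcode=False):
    """Split delimited lines into (stream, opcode, fields) tuples.

    Hypothetical helper: it only mimics the documented property semantics.
    """
    # Ignore blank lines and split each remaining line on the delimiter.
    rows = [line.rstrip("\n").split(delimiter) for line in lines if line.strip()]
    # With a header, the first row supplies the field names.
    header = rows.pop(0) if has_header and rows else None
    records = []
    for row in rows:
        if expect_stream_name_opcode:
            # The first two fields are the stream name and the opcode.
            stream, opcode, values = row[0], row[1], row[2:]
        else:
            stream, opcode, values = None, None, row
        fields = dict(zip(header, values)) if header else values
        records.append((stream, opcode, fields))
    return records
```

For example, passing the two data lines of sample 2 with expect_stream_name_opcode=True yields the stream name Trades_in and the opcode i for each record, followed by the six data fields.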

This adapter supports schema discovery in normal mode only. If you use the CCL ATTACH ADAPTER statement to attach an adapter, you must supply the adapter type.

The File CSV Input adapter operates in two loading modes: normal and dynamic. In normal mode, the adapter reads records from a single source file until there are no more records or you stop the process. In dynamic mode, the adapter reads files from a source directory on a first-come, first-served basis. The adapter selects files whose names match either the filePattern parameter or a regular expression. The filePattern parameter accepts a wildcard character sequence that is supported by the native operating system, such as *.csv; the regular expression must follow the POSIX standard, such as [a-z]{4}\.csv. For example, if the pattern is *.csv, the adapter searches for .csv files only. After reading each file, the adapter checks the directory for new files and continues processing. If an error occurs while opening or reading a file, the adapter skips that file.
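
One pass of the dynamic-mode behavior described above might look roughly like the following Python sketch. The function scan_once is hypothetical, and two points are assumptions rather than documented behavior: arrival order is approximated by file modification time, and already-processed files are tracked by name.

```python
import fnmatch
import os

def scan_once(directory, pattern="*.csv", seen=None):
    """Read matching files not processed yet, oldest first (assumed order)."""
    seen = set() if seen is None else seen
    candidates = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if (entry.is_file() and entry.name not in seen
                    and fnmatch.fnmatch(entry.name, pattern)):
                candidates.append((entry.stat().st_mtime, entry.name, entry.path))
    results = []
    for _, name, path in sorted(candidates):
        seen.add(name)  # remember the file so a later pass does not reread it
        try:
            with open(path, encoding="utf-8") as handle:
                results.append((name, handle.read().splitlines()))
        except OSError:
            continue  # a file that cannot be opened or read is skipped
    return results
```

Calling scan_once repeatedly with the same seen set, with a pause between calls, approximates the cycle of reading each file and then checking the directory again.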

In dynamic mode, if a file is overwritten or updated after the adapter processes it and the reprocess parameter is set to false, the file is not processed again. If the removeAfterProcess parameter is set to false, the adapter does not delete the file after processing it. Both reprocess and removeAfterProcess can be set to false simultaneously, but only one can be set to true at a time.

Property Label Description
Directory

Property ID: dir

Type: string

(Required for adapter operation and schema discovery) Specify the absolute path to the data files you want the adapter to read. For example, <username>/<folder name>. Default value is "." (current directory in which the Server is running).

Use a forward slash for both UNIX and Windows paths.

File (in Directory)

Property ID: file

Type: tables

(Dependent required) File to read. Set only in normal mode; in dynamic mode, the value is ignored. No default value.

Stream name, opcode expected

Property ID: expectStreamNameOpcode

Type: boolean

(Required) If true, the adapter interprets the first two fields as stream name and opcode respectively. Messages with unmatched stream names are discarded.

When you are using schema discovery on this adapter with this property enabled, two columns for the stream name and opcode are created in the schema. Manually remove these two columns from your schema. Default value is false.

Field Count

Property ID: fieldCount

Type: uint

(Optional) Count of fields in CSV file, if different from the value for the source stream. Default value is 0.

Repeat Date Field Count

Property ID: repeatCount

Type: int

(Optional) Number of times the input data is repeated. If set to -1, the input data is repeated indefinitely. Default value is 0.

Note: You can use this parameter to test a continuous streaming source.

Repeat Data Field Name

Property ID: repeatField

Type: string

(Optional) On each repeat, the adapter increments the value in this field. Default value is a hyphen (-).
  • If repeatCount has a nonzero value, specify the stream column name.
  • If the repeatField is a key column in the stream, ensure there are no duplicates when specifying multiple rows in the input file.
  • If the adapter is attached to a window, the repeatField must be a key column.

Delimiter

Property ID: delimiter

Type: string

(Advanced) Symbol used to separate columns. Default value is a comma ( , ).

Has Header

Property ID: hasHeader

Type: boolean

(Advanced) Determines whether the first line of the file contains the description of the fields. Default value is false.

Directory (runtime)

Property ID: runtimeDir

Type: runtimeDirectory

(Advanced) Location of the data files at runtime, if the value is different from the location defined at discovery time. No default value.

File Pattern

Property ID: filePattern

Type: string

(Advanced) In dynamic mode, the pattern used to look up files in the directory. In normal mode, the pattern used to look up files for discovery.

If both filePattern and fileRegex are specified, filePattern is ignored and fileRegex is used.

filePattern allows wildcard characters supported by the native operating system. Default value is *.csv.

Regular Expression

Property ID: fileRegex

Type: string

(Optional) In dynamic mode, specifies a POSIX regular expression used to find matching file names in the directory. If both filePattern and fileRegex are specified, filePattern is ignored and fileRegex is used. No default value.
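
The precedence between filePattern and fileRegex can be sketched in Python as follows. The function file_matches is hypothetical. Python's re module is not a POSIX regular expression engine, and matching the expression against the whole file name is an assumption, so treat this only as an illustration of which property wins.

```python
import fnmatch
import re

def file_matches(name, file_pattern="*.csv", file_regex=None):
    """Return True if name is selected by the configured matcher."""
    if file_regex is not None:
        # When both are given, fileRegex is used and filePattern is ignored.
        return re.fullmatch(file_regex, name) is not None
    # Otherwise fall back to the wildcard pattern (default *.csv).
    return fnmatch.fnmatchcase(name, file_pattern)
```

For instance, with file_pattern="*.txt" and file_regex=r"[a-z]{4}\.csv", the name abcd.csv is selected because only the regular expression is consulted.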

Poll Period (seconds)

Property ID: pollperiod

Type: uint

(Advanced) In normal mode, specifies the period for polling a file to check for new records. pollperiod is ignored in dynamic mode. Default value is 0.

Directory Poll Period

Property ID: dirPollPeriod

Type: uint

(Advanced) In dynamic mode, specifies the period, in seconds, for polling the directory to check for new files. Default value is 60 seconds.

Convert to Safe Opcodes

Property ID: safeOps

Type: boolean

(Advanced) Converts the opcodes INSERT and UPDATE to UPSERT, and converts DELETE to SAFEDELETE. Default value is false.

Skip Deletes

Property ID: skipDels

Type: boolean

(Advanced) Skips the rows with opcodes DELETE or SAFEDELETE. Default value is false.
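
The combined effect of safeOps and skipDels on a row's opcode can be sketched as below. The function apply_opcode_options is hypothetical, and the opcodes are spelled out by name rather than in the form they take in the data file. Because skipDels drops both DELETE and SAFEDELETE, the result does not depend on which option is applied first.

```python
SAFE_OPCODES = {"INSERT": "UPSERT", "UPDATE": "UPSERT", "DELETE": "SAFEDELETE"}

def apply_opcode_options(opcode, safe_ops=False, skip_dels=False):
    """Return the opcode to send, or None when the row is skipped."""
    if skip_dels and opcode in ("DELETE", "SAFEDELETE"):
        return None  # skipDels: rows with delete opcodes are not sent
    if safe_ops:
        # safeOps: INSERT/UPDATE become UPSERT, DELETE becomes SAFEDELETE
        return SAFE_OPCODES.get(opcode, opcode)
    return opcode
```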

Date Format

Property ID: dateFormat

Type: string

(Advanced) Format string for parsing date values. Default value is %Y-%m-%dT%H:%M:%S.

Timestamp Format

Property ID: timestampFormat

Type: string

(Advanced) Format string for parsing timestamp values. Default value is %Y-%m-%dT%H:%M:%S.
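
As a rough illustration of how such a format string maps onto a value, the following Python sketch parses timestamps with the default format and with the slash-separated format from sample 3. It assumes the format codes behave like Python's strptime codes, which may not hold for every code the adapter accepts; handling of fractional seconds is not covered here.

```python
from datetime import datetime

DEFAULT_FORMAT = "%Y-%m-%dT%H:%M:%S"  # documented default for timestampFormat

def parse_timestamp(value, fmt=DEFAULT_FORMAT):
    """Parse a timestamp string using a strftime-style format string."""
    return datetime.strptime(value, fmt)

default_style = parse_timestamp("2004-06-17T10:00:00")
slash_style = parse_timestamp("2004/06/17 10:00:05", "%Y/%m/%d %H:%M:%S")
```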

Block Size

Property ID: blockSize

Type: int

(Advanced) Number of records to block into one pseudotransaction. Default value is 1.

Use Envelopes

Property ID: useEnvelopes

Type: boolean

(Advanced) Specify the block type the adapter uses to pass data to the engine. If you specify a blockSize property greater than zero, by default, the adapter packages rows into transaction blocks to send to the engine. To get the adapter to package rows into envelope blocks instead, set this property to true. Default value is false.
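
The grouping controlled by blockSize can be pictured with a short Python sketch. The function make_blocks is hypothetical; it only shows how rows are batched, and does not model the difference between transaction blocks and envelope blocks, which is a matter of how the engine treats each batch.

```python
def make_blocks(rows, block_size=1):
    """Group rows into consecutive blocks of at most block_size rows."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    rows = list(rows)
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
```

With five rows and block_size=2, the sketch produces three blocks of sizes 2, 2, and 1.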

Field Mapping

Property ID: permutation

Type: permutation

Mapping between Event Stream Processor and external fields, for example:

<esp_columnname>=<database_columnname>:<esp_columnname>=<database_columnname>. No default value.
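
To make the mapping syntax concrete, this Python sketch splits a permutation string into pairs. The function parse_permutation and the column names in the usage example are hypothetical; the sketch assumes that neither the colon nor the equals sign appears inside a column name.

```python
def parse_permutation(mapping):
    """Parse 'esp_col=ext_col:esp_col=ext_col' into a dictionary."""
    pairs = {}
    for item in mapping.split(":"):
        if not item.strip():
            continue  # tolerate an empty string or a trailing colon
        esp_column, _, external_column = item.partition("=")
        pairs[esp_column.strip()] = external_column.strip()
    return pairs

example = parse_permutation("Ts=time:ItemID=sku")
```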

PropertySet

Property ID: propertyset

Type: string

(Advanced) Specifies the name of the property set. Property sets are reusable sets of properties that are stored in the project configuration file. Using these sets allows you to move adapter configuration properties out of the CCL file and into the CCR file. If you specify the same properties in the project configuration file and the ATTACH ADAPTER statement, the values in the property set override the values defined in the ATTACH ADAPTER statement. No default value.

Dynamic Loading

Property ID: dynamicMode

Type: boolean

(Optional) Set to true to enable dynamic loading mode, which causes the adapter to read records from files as they arrive in the directory. Records are processed one after another on a first-come, first-served basis. Default value is false.

Remove File After Processing

Property ID: removeAfterProcess

Type: boolean

(Optional) In dynamic mode, if removeAfterProcess is set to true, the file is removed from the directory after the adapter processes it. You cannot set this property to true if reprocess is also set to true. Default value is false.

Reprocess File

Property ID: reprocess

Type: boolean

(Optional) In dynamic mode, if reprocess is set to true, the file is processed again after being updated or overwritten. You cannot set this property to true if removeAfterProcess is also set to true. Default value is false.
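
The constraint between reprocess and removeAfterProcess can be expressed as a small pre-flight check. This Python sketch is only an illustration of the rule; the function validate_dynamic_flags is hypothetical, and how the adapter itself reports the conflict is not stated here.

```python
def validate_dynamic_flags(reprocess=False, remove_after_process=False):
    """Reject the one combination the two properties do not allow."""
    if reprocess and remove_after_process:
        raise ValueError(
            "reprocess and removeAfterProcess cannot both be set to true"
        )
    return reprocess, remove_after_process
```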

Known limitations:
  • In normal mode only, when polling, you can append to the file, but cannot overwrite or replace it. The stream names in the file rows are ignored and all data is sent to the same stream.
  • For discovery to work correctly, set the delimiter character and the header presence flag to match the actual data.
  • Do not mix files with different delimiters or files with and without headers in the same directory. Files with wrong delimiters or headers are incorrectly discovered.

Related reference
Adapter Support for Schema Discovery