Regular Expressions: Read From File Using a Regular Expression Adapter.

The Read From File Using a Regular Expression adapter reads an input file line by line, matches each line against a given regular expression, and produces rows.

Only rows that match all of the subexpressions are inserted into the stream, and only the portions of those rows that match the subexpressions are inserted into the stream. (Effectively, you are selecting both rows and columns from the input.)

POSIX regular expression syntax is used. This is compatible with Perl-style regular expressions. Use parentheses, "(...)", to denote sub-expressions. The order of the sub-expressions must match the order of the fields in the Row descriptor. RegEx matches are bound to fields by position; field names are ignored.

For example, if you search using the following regular expression:

/(J).*(SMITH)/

and the input rows are:

1) JOHN SMITH
2) JANE SMITHERS
3) JANE AUSTEN
4) DAVE SMITH
5) JANET SMITHSONIAN

then the adapter returns three rows (taken from rows 1, 2, and 5 above), each with two columns of output:

J SMITH
J SMITH
J SMITH

More sophisticated patterns allow you to extract a broader range of values, such as the complete last names.
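The example above can be reproduced outside the adapter. The following is a minimal sketch in Python, not the adapter's own implementation: Python's re module uses Perl-style syntax rather than POSIX, the function name extract_rows is hypothetical, and the second pattern is only one assumed way to capture complete names. It also assumes the pattern may match anywhere in the line.

```python
import re

# Input lines from the example above, without the "1)" row labels.
LINES = [
    "JOHN SMITH",
    "JANE SMITHERS",
    "JANE AUSTEN",
    "DAVE SMITH",
    "JANET SMITHSONIAN",
]


def extract_rows(pattern, lines):
    """Return one tuple of sub-expression matches per matching line."""
    regex = re.compile(pattern)
    rows = []
    for line in lines:
        match = regex.search(line)
        if match:  # lines that do not match produce no row
            rows.append(match.groups())
    return rows


# The pattern from the example: two sub-expressions, so two columns.
simple_rows = extract_rows(r"(J).*(SMITH)", LINES)

# An assumed broader pattern that captures the complete first and last names.
full_rows = extract_rows(r"(J[A-Z]*) (SMITH[A-Z]*)", LINES)

for row in simple_rows + full_rows:
    print(" ".join(row))
```

With the broader pattern, the same three input lines match, but the columns hold JOHN SMITH, JANE SMITHERS, and JANET SMITHSONIAN instead of J SMITH.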

| Property Name (screen) | Property Name (Attach Adapter) | Type | Description |
| --- | --- | --- | --- |
| Filename | Filename | String | The name and path of the file to read data from. You may specify a file name and path either by typing it in or by clicking the Browse button for this field. The path is relative to the server's adapters base folder and must be underneath that base folder. For more information, see Setting The Base Folder For File Input/Output Adapters. For details about the Browse and Edit buttons to the right of the filename, see the discussion following this table. |
| Loop count | LoopCount | Integer (Min: 0, Max: 2000000000, Default: 1) | If the Loop count is 1, the file is read only once and the adapter stops sending data once the end of the file is reached. If the Loop count is greater than 1, then after it finishes reading the file, the adapter starts reading again from the beginning. If the Loop count is 0, the adapter repeats indefinitely. |
| Rate | Rate | Float | If this property is non-zero, the adapter reads data from the file at the given rate (per second). Any timestamps in the input file are ignored. |
| Timestamp Base | TimestampBase | Timestamp | The point at which time starts for this adapter, relative to the file's first timestamp. For example, if Timestamp Base is 0 and the first row's timestamp is 5000000 (in microseconds), the first row is sent 5 seconds (5000000 microseconds) after the module starts. If blank, the first row is sent immediately after the module starts. |
| Set Timestamp To Current Time | UseCurrentTimestamp | Boolean | If set to true, the adapter overrides the timestamp specified in the file with the current system time. Defaults to false. |
| Regular expression for parsing rows | RegEx | String | Regular expression to use when parsing rows. |
| Fields | Fields | String | Comma-separated list of fields corresponding to the sub-expressions. |
| Ignore Mismatch | IgnoreMismatch | Boolean | If true, a line that does not match the regular expression is ignored. If false, a line that does not match the regular expression causes the adapter to raise an error condition and stop. |
| Log Mismatch | LogMismatch | Boolean | If true, lines that do not match the regular expression are logged. If false, the mismatched lines are dropped silently. This option is used only when the Ignore Mismatch option is set to true. |
| Timestamp column format | TimestampColumnFormat | String | Specifies the format of the timestamp columns (for example, YYYY/MM/DD HH24:MI:SS.FF). If no timestamp format is specified, the adapter assumes that the timestamp is represented as a number of microseconds from 00:00:00 Jan 1, 1970 UTC/GMT. Note that this format specifier applies to all input columns of type TIMESTAMP, not only to the row timestamp column. This means that ALL columns of type TIMESTAMP must be formatted the same way; you cannot specify independent formats for each TIMESTAMP column. For more information, see Reading, Writing, and Converting Timestamps. |
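The way the Loop count, Ignore Mismatch, and Log Mismatch properties interact can be sketched as follows. This is an illustrative Python model of the behavior described in the table, not the adapter's code: the names read_rows and MismatchError are hypothetical, lines are taken from a list rather than a file, and the Rate and timestamp properties are left out.

```python
import itertools
import re


class MismatchError(Exception):
    """Hypothetical error for a non-matching line when mismatches are not ignored."""


def read_rows(lines, pattern, loop_count=1, ignore_mismatch=False,
              log_mismatch=False, log=print):
    """Yield one tuple of sub-expression matches per matching line."""
    regex = re.compile(pattern)
    # A loop count of 0 repeats indefinitely; otherwise read the input N times.
    passes = itertools.count() if loop_count == 0 else range(loop_count)
    for _ in passes:
        for line in lines:
            match = regex.search(line)
            if match:
                yield match.groups()
            elif not ignore_mismatch:
                # Ignore Mismatch is false: raise an error and stop.
                raise MismatchError(f"line does not match: {line!r}")
            elif log_mismatch:
                # Log Mismatch only matters when Ignore Mismatch is true.
                log(f"mismatch: {line!r}")


logged = []
rows = list(read_rows(["JOHN SMITH", "JANE AUSTEN"], r"(J).*(SMITH)",
                      loop_count=2, ignore_mismatch=True,
                      log_mismatch=True, log=logged.append))
print(rows)
print(logged)
```

Because the function is a generator, a Loop count of 0 can be modeled without running forever: the caller simply stops consuming rows when it has enough.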

This adapter always has a 3-second startup delay before the first row is sent. If looping is used, the 3-second delay occurs only on the first iteration of the loop, not subsequent iterations.

Note: Each input stream has a property (see the stream's Properties tab in Studio) that can specify whether to use the current server timestamp value instead of the row timestamp set by the adapter. If this stream property is set to true, it overrides any row timestamp set by the adapter.
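To relate a formatted timestamp to the default representation (microseconds since 00:00:00 Jan 1, 1970 UTC), the following Python sketch converts one into the other. It is only an illustration under stated assumptions: the mapping of YYYY/MM/DD HH24:MI:SS.FF to the strptime format "%Y/%m/%d %H:%M:%S.%f" is assumed, the input is assumed to be in UTC, and the function name to_microseconds is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed Python equivalent of the format YYYY/MM/DD HH24:MI:SS.FF.
ASSUMED_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def to_microseconds(text, fmt=ASSUMED_FORMAT):
    """Convert a formatted timestamp (assumed UTC) to microseconds since the epoch."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    # Integer division by one microsecond avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(microseconds=1)


print(to_microseconds("1970/01/01 00:00:05.000000"))
```

The sample value corresponds to the 5000000-microsecond timestamp used in the Timestamp Base description.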

The "Browse" button for the filename property: The Adapter Properties screen in Sybase CEP Studio allows you to specify an input path and file name for the file by clicking the Browse button and identifying the file. The path is relative to the adapters base folder, either as specified for Sybase CEP Server running on the same computer as Sybase CEP Studio, or as specified in the preferences for Sybase CEP Studio. For more information, see Setting The Base Folder For File Input/Output Adapters. In order for this feature to work properly, make sure that the Adapter's Base Folder field in Studio Settings is set to the same folder as the adapters base folder for Sybase CEP Server. The Base Folder setting for the Server is specified during the installation process, and can also be changed later in the Sybase CEP Server's c8-server.conf file. The base folder setting for Sybase CEP Studio can be set from the Tools->Settings command on the Sybase CEP Studio menu.

The "Edit" button for the filename property: The Edit button opens an editor for the file whose name you entered in the filename field, so that you can correct errors in the data. If the file's extension is "csv" or "xml", Sybase CEP Studio opens the appropriate editor specified in the "External Tools" tab available from the menu item "Tools -> Settings". For files with other extensions, on Microsoft Windows the editor is the one specified by the operating system's file associations. On UNIX-like operating systems, Sybase CEP Studio opens the editor specified by the EDITOR environment variable.

Note: When you click the Edit button, Sybase CEP Studio looks for the file in the Sybase CEP Repository, even if the adapters base folder is set to another location. You may need to use the Browse button (adjacent to the Edit button) to navigate to the desired directory before you try to edit the file.