CSV Data Stream Format

Sybase CEP Engine can use the CSV Data Stream Format, which is compatible with the CSV file format used by Microsoft Excel and many other applications.

The box below shows the valid syntax for the Data Stream. The syntax descriptions include characters (such as square brackets) that are not literals in this context but instead have special meanings. These special meanings are explained below:

Square Brackets:

Square brackets ("[]") indicate optional items.

For example, in the following line:

[ foo ] bar

the "foo" is optional, while the "bar" is required. If the "foo" occurs, it must occur only once.
Ellipsis:

An ellipsis ("...") after an element means that the element may be repeated one or more times. For example, in the following line:

PLAIN_CHAR ...

the ellipsis means that after the first PLAIN_CHAR you may have additional PLAIN_CHARs.
Capitalized Words:

To avoid confusion with the punctuation marks used by the syntax notation, a punctuation mark that is part of what you must type is written as a capitalized word. For example, "COMMA ColumnName" means to put the comma character (",") followed by a column name. Similarly, QUOTE means to put one quote character (either a single quote or a double quote, as defined for QUOTE below).

Putting all this together, the following line

Tuple = [Field [ , Field ... ] ] NEWLINE

means that a tuple can have 0 fields, 1 field, or more fields, and if there is more than 1 field then the fields must be separated by commas. The Tuple must be terminated with a NEWLINE. For example, all of the following are valid values for Tuple:

col1

col1, col2

col1, col2, col3

Each of these would of course be terminated with a NEWLINE. An empty line terminated with a NEWLINE would also be valid.

Stream = [ Headers ] [ Tuple ... ]
Tuple = [ Field [ , Field ... ] ] NEWLINE
   
Note: Fields correspond to tuple field values.
NonEmptyTuple = Field [ , Field ... ]
Headers = <same as Tuple, but fields correspond to tuple field names.>
Field = [QUOTE] [ PLAIN_CHAR ...] [QUOTE] | 
  " [ (ANY_NONQT_CHAR | DOUBLEQUOTE) ... ] "
QUOTE = single quote (ASCII 0x27) or double quote (ASCII 0x22).
Note that if you start a quoted string with a particular type 
of quote, you must close with the same type of quote.  For example, 
these are invalid: 
    'Hi"
    "Bye'
DOUBLEQUOTE = '""' representing a single '"' within the field value.
ANY_NONQT_CHAR = <any ASCII character except for '"' (0x22); 
    line separators are allowed>
PLAIN_CHAR = <any printable ASCII char except ',' and '"',
   i.e. ASCII 0x21, ASCII 0x23 - 0x2b, and ASCII 0x2d - 0x7e>
NEWLINE = CR  |  LF  |  CR LF
CR = the Carriage Return character (0x0D)
LF = the Line Feed character (0x0A)
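
The quoted-field rules above can be illustrated with a short Python sketch. It is not Sybase CEP's parser: it relies on Python's standard csv module, whose default dialect happens to treat "" inside a double-quoted field as one literal double quote and allows commas and line separators inside such a field. The function name parse_tuples is hypothetical, and the single-quote form of QUOTE is not handled here.

```python
import csv
import io

def parse_tuples(text):
    """Split CSV text into tuples, each a list of field strings.

    Assumption: only the double-quoted form of Field is supported.
    """
    # newline="" passes CR, LF and CR LF through untranslated, so line
    # separators inside quoted fields are preserved as written.
    return list(csv.reader(io.StringIO(text, newline="")))

# One tuple with three fields: a plain field, a field containing commas and
# an escaped double quote, and a field containing a line separator.
sample = 'ABC,"say ""hi"", then go","two\nlines"\r\n'
rows = parse_tuples(sample)
print(rows)
```

The doubled "" in the second field comes back as a single double quote, and the LF in the third field stays part of the field value instead of ending the tuple.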

By default, Sybase CEP reads and writes the NEWLINE using the convention for the current platform, for example, CR LF for Microsoft Windows and LF for UNIX-like operating systems.

The LineEndCRLF format option may be used to specify the line ending explicitly. If the value of this option is 'true', the CR LF line ending is used; if it is 'false', the LF line ending is used.
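
The line-ending rules can be sketched in Python as follows. This is only an illustration of the behavior described above, not Sybase CEP code: the function names newline_for and split_lines are hypothetical, and the parameter line_end_crlf merely mirrors the LineEndCRLF option.

```python
import os
import re

def newline_for(line_end_crlf=None):
    """Return the line terminator to write.

    line_end_crlf mirrors the LineEndCRLF option: True selects CR LF,
    False selects LF, and None (option not set) falls back to the
    convention of the current platform.
    """
    if line_end_crlf is None:
        return os.linesep
    return "\r\n" if line_end_crlf else "\n"

def split_lines(text):
    """Split text on NEWLINE = CR | LF | CR LF.

    Assumption: quoting is ignored, so this is only suitable for data
    without line separators inside quoted fields.
    """
    # CR LF is listed first so that it counts as one terminator, not two.
    parts = re.split(r"\r\n|\r|\n", text)
    # A trailing terminator leaves an empty last element; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts
```

A reader written this way accepts all three terminators regardless of platform, while a writer picks exactly one.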

For example, a CSV trades stream can look like the following:

Timestamp,Symbol,Price,Volume\r\n
"2005/01/28 10:23:54",ABC,11.40,300000\r\n
"2005/01/28 10:23:55",XYZ,32.84,1260000\r\n
"2005/01/28 10:24:06",XYZ,32.74,6300000\r\n
"2005/01/28 10:24:32",ABC,12.01,50000\r\n
...
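
As a rough illustration, a stream of this shape can be read with Python's standard csv module. The sketch below is not part of Sybase CEP; the name read_trades and the choice to convert Price to a float and Volume to an integer are assumptions made for the example.

```python
import csv
import io

# Two data rows in the shape shown above, with CR LF line endings.
TRADES = (
    'Timestamp,Symbol,Price,Volume\r\n'
    '"2005/01/28 10:23:54",ABC,11.40,300000\r\n'
    '"2005/01/28 10:23:55",XYZ,32.84,1260000\r\n'
)

def read_trades(text):
    """Parse a trades stream that starts with a title row."""
    # DictReader takes the first row as the column names.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [
        {
            "Timestamp": row["Timestamp"],
            "Symbol": row["Symbol"],
            "Price": float(row["Price"]),    # assumed numeric type
            "Volume": int(row["Volume"]),    # assumed numeric type
        }
        for row in reader
    ]

trades = read_trades(TRADES)
print(trades[0])
```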

The CSV Data Stream format has the following options:

TitleRow (true or false) specifies whether the title row is present or not.
TimestampColumn (true or false) specifies whether the first column is the message's timestamp column or not.
TimestampColumnFormat (string) specifies the format of the TimestampColumn if it exists. If the format is not specified, then the row timestamp will be a 64-bit integer number whose value is the number of microseconds since midnight January 1, 1970 GMT.
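
When no TimestampColumnFormat is given, the row timestamp is a count of microseconds since midnight January 1, 1970 GMT. A minimal Python sketch of converting such a value to and from a date and time follows; the function names are hypothetical and the code is independent of Sybase CEP.

```python
from datetime import datetime, timedelta, timezone

# Midnight January 1, 1970 GMT.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_row_timestamp(micros):
    """Convert microseconds since the epoch to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)

def to_row_timestamp(dt):
    """Convert a timezone-aware datetime to microseconds since the epoch."""
    # Dividing one timedelta by another with // yields an integer.
    return (dt - EPOCH) // timedelta(microseconds=1)

print(from_row_timestamp(1106907834000000).isoformat())
```

Integer arithmetic on timedelta values avoids the rounding that can occur when microsecond counts pass through floating-point seconds.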

Note that for input adapters:

If you do not use a title row, then the number of fields and the order of fields are assumed to be exactly the same as the number and order specified in the stream's schema.
If you do use a title row, then:
- The column names in the title row must exactly match the column names specified in the stream's schema.
- The columns may appear in any order, as long as the order of the data values is the same as the order of the column names in the title row. The only exception is that if you use a row timestamp, then the row timestamp must always be the first column, and the name of that column does not matter.
- If there are "extra" columns in the input, those are ignored; only the columns whose names match are used.
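
The title-row rules for input adapters can be sketched as a name-based lookup. The Python below is an illustration only: map_row and its parameters are hypothetical, and raising an error when a schema column is missing from the title row is an assumption, since the behavior in that case is not described here.

```python
def map_row(title_row, values, schema_columns, has_timestamp=False):
    """Return {schema column: value} for one data row, matched by name."""
    if has_timestamp:
        # The row timestamp is always the first column and its name does
        # not matter, so set it aside before matching by name.
        title_row, values = title_row[1:], values[1:]
    by_name = dict(zip(title_row, values))
    missing = [c for c in schema_columns if c not in by_name]
    if missing:
        # Assumption: a missing schema column is treated as an error.
        raise ValueError("columns not found in title row: %s" % missing)
    # Extra input columns are simply never looked up, so they are ignored.
    return {c: by_name[c] for c in schema_columns}

row = map_row(["Volume", "Extra", "Symbol"], ["100", "x", "ABC"],
              ["Symbol", "Volume"])
print(row)
```

Here the input columns arrive in a different order than the schema and include an unmatched column named Extra, yet the result contains only the schema columns.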

For output adapters:

The order of the columns is the same as the order specified by the schema. There is no way to change this. If you specify that the row timestamp should be included in the output, then the row timestamp is always the first column.
If you specify that the output should include a title row, the names of the columns in that title row are exactly the same as the names in the schema.
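
The output rules can be sketched in the same way. The Python below is a hypothetical illustration, not the adapter's implementation: write_stream and its parameters are invented for the example, and the title used for the timestamp column ("Timestamp") is an assumption, since only the schema column names are specified here.

```python
import csv
import io

def write_stream(rows, schema_columns, title_row=True, timestamps=None):
    """Render rows (dicts keyed by column name) as CSV text.

    timestamps, if given, is a list parallel to rows holding each row's
    timestamp, which is written as the first column.
    """
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    if title_row:
        header = list(schema_columns)
        if timestamps is not None:
            header.insert(0, "Timestamp")  # assumed title for this column
        writer.writerow(header)
    for i, row in enumerate(rows):
        # Columns always follow schema order, whatever the dict's order.
        fields = [row[c] for c in schema_columns]
        if timestamps is not None:
            fields.insert(0, timestamps[i])
        writer.writerow(fields)
    return out.getvalue()

text = write_stream([{"Price": "11.40", "Symbol": "ABC"}], ["Symbol", "Price"])
print(repr(text))
```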