File/Hadoop Event XML Input Adapter Configuration

Configure the File/Hadoop Event XML Input adapter by specifying values for the ESP connector, formatter, and transporter modules in the adapter configuration file.
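As a rough sketch of how the three modules fit together, the fragment below chains them through their Next elements. It is illustrative only: the enclosing elements of the configuration file and each module's Parameters element are omitted, the instance names MyXmlFormatter and MyEspPublisher are hypothetical, the Name values are assumptions that must match the entries in your modulesdefine.xml file, and the type value espconnector is likewise an assumption.

```xml
<!-- Sketch only: enclosing elements and Parameters are omitted. -->
<Module type="transporter">
  <InstanceName>MyInputTransporter</InstanceName>
  <!-- Assumed module name; check modulesdefine.xml -->
  <Name>FileInputTransporter</Name>
  <!-- Next holds the InstanceName of the module that receives this module's output -->
  <Next>MyXmlFormatter</Next>
</Module>
<Module type="formatter">
  <!-- Hypothetical instance name -->
  <InstanceName>MyXmlFormatter</InstanceName>
  <!-- Assumed module name; check modulesdefine.xml -->
  <Name>XmlStringToEspFormatter</Name>
  <Next>MyEspPublisher</Next>
</Module>
<!-- The type value espconnector is an assumption -->
<Module type="espconnector">
  <!-- Hypothetical instance name -->
  <InstanceName>MyEspPublisher</InstanceName>
  <!-- Assumed module name; check modulesdefine.xml -->
  <Name>EspPublisher</Name>
  <!-- Last module in the chain, so it has no Next element -->
</Module>
```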

Logging

XML Element Description
Log4jProperty

Type: string

(Optional) Specify a full path to the log4j.properties logging file you wish to use. The default value is $ESP_HOME/adapters/framework/config/log4j.properties.

Transporter Module: File Input Transporter

The File Input transporter reads data from local files, wraps the data as a string, and sends it to the next module specified in the adapter configuration file. Set values for this transporter in the adapter configuration file.

XML Element Description
Module

(Required) Element containing all information for this module. It contains a type attribute for specifying the module type.

For example, transporter.

InstanceName

Type: string

(Required) Instance name of the specific module you want to use. For example, MyInputTransporter.

Name

Type: string

(Required) Name of the module as defined in the modulesdefine.xml file. For example, <TransporterType>InputTransporter.

Next

Type: string

(Required) Instance name of the module that follows this one.

BufferMaxSize

Type: integer

(Advanced) Capacity of the buffer queue between this module and the next. The default value is 10240.

Parameters

(Required) Element containing the FileInputTransporterParameters element.

FileInputTransporterParameters

(Required) Element containing elements for the File Input transporter.

Dir

Type: string

(Required) Specify the absolute path to the data files that you want the adapter to read. For example, <username>/<foldername>. No default value.

To use Hadoop system files, use an HDFS folder URI instead of a local file system folder. For example, hdfs://<hdfsserver>:9000/<foldername>/<subfoldername>/<leaffoldername>.

To use Hadoop, download the binaries for Hadoop version 1.2.1 from http://hadoop.apache.org, then copy the hadoop-core.jar file (for version 1.2.1, hadoop-core-1.2.1.jar) to %ESP_HOME%\adapters\framework\libj. Use a stable version rather than a beta.

Use a forward slash for both UNIX and Windows paths.

File

Type: string

(Required) Specify the file you want the adapter to read, or a regex pattern that filters the files in a given directory. See the DynamicMode element. No default value.

AccessMode

Type: string

(Required) Specify an access mode:
  • rowBased – the adapter reads one text line at a time.
  • Streaming – the adapter reads a preconfigured size of bytes into a buffer.
No default value.

DynamicMode

Type: string

(Advanced) Specify a dynamic mode:
  • Static – the adapter reads the file specified in the Dir and File elements.
  • dynamicFile – the adapter reads the file specified in the Dir and File elements and keeps polling the new appended content. The polling period is specified in the PollingPeriod element.
  • dynamicPath – the adapter polls for new files under the directory specified in the Dir element. In this mode, the File element acts as a regex pattern that selects which files are read.
The default value is Static. If DynamicMode has been set to dynamicPath and you leave the File element empty, the adapter reads all the files under the specified directory.

An example regex pattern is ".*\.txt", which selects only files that end with ".txt". In regex patterns, you must put the escape character, "\", before a metacharacter to match it literally.

PollingPeriod

Type: integer

(Advanced) Define the period, in seconds, to poll the specified file or directory. Set this element only if the value of the DynamicMode element is set to dynamicFile or dynamicPath.

The default value is 0. A value of 0 or less turns off polling.

RemoveAfterProcess

Type: boolean

(Optional) If this property is set to true, the file is removed from the directory after the adapter processes it. This element takes effect only if the value of the DynamicMode element is set to dynamicPath; it is ignored if DynamicMode is set to dynamicFile.

The default value is false.

ScanDepth

Type: integer

(Optional) Specify the depth of the schema discovery. The adapter reads the number of rows specified by this element value when discovering the input data schema.

The default value is 3.
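The elements in this table might be combined as in the following sketch, which polls a directory every five seconds for new files whose names end in ".xml". The nesting follows the descriptions above; the directory path, the instance name MyXmlFormatter, and the Name value are hypothetical, and the placement of BufferMaxSize directly under Module is an assumption.

```xml
<Module type="transporter">
  <InstanceName>MyInputTransporter</InstanceName>
  <!-- Assumed module name; check modulesdefine.xml -->
  <Name>FileInputTransporter</Name>
  <!-- Hypothetical instance name of the formatter that follows -->
  <Next>MyXmlFormatter</Next>
  <BufferMaxSize>10240</BufferMaxSize>
  <Parameters>
    <FileInputTransporterParameters>
      <!-- Hypothetical local directory; use forward slashes on every platform. -->
      <!-- An HDFS folder URI such as hdfs://namenode.example.com:9000/data/in -->
      <!-- (hypothetical host and path) can be used here instead. -->
      <Dir>/home/espuser/data</Dir>
      <!-- Regex: the backslash escapes the dot so it matches a literal "." -->
      <File>.*\.xml</File>
      <AccessMode>rowBased</AccessMode>
      <DynamicMode>dynamicPath</DynamicMode>
      <!-- Seconds between polls; 0 or less turns polling off -->
      <PollingPeriod>5</PollingPeriod>
      <!-- Honored only in dynamicPath mode -->
      <RemoveAfterProcess>false</RemoveAfterProcess>
      <ScanDepth>3</ScanDepth>
    </FileInputTransporterParameters>
  </Parameters>
</Module>
```

Leaving the File element empty in dynamicPath mode would make the adapter read every file under the directory, so a pattern is usually the safer choice when the folder is shared.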

Formatter Module: XML String to ESP Formatter

The XML String to ESP formatter translates ESP XML strings to AepRecord objects.

XML Element Description
Module

(Required) Element containing all information for this module. It contains a type attribute for specifying the module type.

For example, formatter.

InstanceName

Type: string

(Required) Instance name of the specific module you want to use. For example, MyInputTransporter.

Name

Type: string

(Required) Name of the module as defined in the modulesdefine.xml file. For example, <TransporterType>InputTransporter.

Next

Type: string

(Required) Instance name of the module that follows this one.

BufferMaxSize

Type: integer

(Advanced) Capacity of the buffer queue between this module and the next. The default value is 10240.

Parallel

Type: boolean

(Optional) If set to true, the module runs as a separate thread. The default value is true.

Parameters

(Required) Element containing the XmlStringToEspFormatterParameters element.

XmlStringToEspFormatterParameters

(Required) Element containing the XML String to ESP formatter elements.

DateFormat

Type: string

(Optional) Format string for parsing date values. For example, yyyy-MM-dd'T'HH:mm:ss.

TimestampFormat

Type: string

(Optional) Format string for parsing timestamp values. For example, yyyy-MM-dd'T'HH:mm:ss.SSS.
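A formatter module built from the elements above might look like the following sketch. The format strings are the examples given in the table; the instance names and the Name value are hypothetical and need to match your own configuration and modulesdefine.xml file, and the placement of BufferMaxSize and Parallel directly under Module is an assumption.

```xml
<Module type="formatter">
  <!-- Hypothetical instance name; the preceding module's Next element refers to it -->
  <InstanceName>MyXmlFormatter</InstanceName>
  <!-- Assumed module name; check modulesdefine.xml -->
  <Name>XmlStringToEspFormatter</Name>
  <!-- Hypothetical instance name of the ESP Publisher module -->
  <Next>MyEspPublisher</Next>
  <BufferMaxSize>10240</BufferMaxSize>
  <!-- true runs the formatter in its own thread -->
  <Parallel>true</Parallel>
  <Parameters>
    <XmlStringToEspFormatterParameters>
      <DateFormat>yyyy-MM-dd'T'HH:mm:ss</DateFormat>
      <TimestampFormat>yyyy-MM-dd'T'HH:mm:ss.SSS</TimestampFormat>
    </XmlStringToEspFormatterParameters>
  </Parameters>
</Module>
```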

ESP Connector Module: ESP Publisher

The ESP Publisher module obtains data from a transporter or formatter module and publishes it to an ESP project.

XML Element Description
Module

(Required) Element containing all information for this module. It contains a type attribute for specifying the module type.

For example, espconnector.

InstanceName

Type: string

(Required) Instance name of the specific module you want to use. For example, MyInputTransporter.

Name

Type: string

(Required) Name of the module as defined in the modulesdefine.xml file. For example, <TransporterType>InputTransporter.

BufferMaxSize

Type: integer

(Advanced) Capacity of the buffer queue between this module and the next. The default value is 10240.

Parameters

(Required) Element containing the EspPublisherParameters element.

EspPublisherParameters

(Required) Element containing elements for the ESP publisher.

ProjectName

Type: string

(Required if adapter is running in standalone mode; optional if it is running in managed mode) Name of the ESP project to which the adapter is connected. For example, EspProject2.

This is the same project tag that you specify later in the adapter configuration file in the Name element within the Event Stream Processor (EspProjects) element.

If you are starting the adapter with the ESP project to which it is attached (that is, running the adapter in managed mode), you need not set this element as the adapter automatically detects the project name.

StreamName

Type: string

(Required if adapter is running in standalone mode; optional if it is running in managed mode) Name of the ESP stream to which the adapter publishes data.

If you are starting the adapter with the ESP project to which it is attached (that is, running the adapter in managed mode), you need not set this element as the adapter automatically detects the stream name.

MaxPubPoolSize

Type: positive integer

(Optional) Maximum size of the record pool. Record pooling, also referred to as block or batch publishing, allows for faster publication since there is less overall resource cost in publishing multiple records together, compared to publishing records individually.

Record pooling is disabled if this value is 1. The default value is 256.

MaxPubPoolTime

Type: positive integer

(Optional) Maximum period of time, in milliseconds, for which records are pooled before being published. If not set, pooling time is unlimited and the pooling strategy is governed by MaxPubPoolSize. No default value.

UseTransactions

Type: boolean

(Optional) If set to true, pooled messages are published to Event Stream Processor in transactions. If set to false, they are published in envelopes. The default value is false.

SafeOps

Type: boolean

(Advanced) Converts the opcodes INSERT and UPDATE to UPSERT, and converts DELETE to SAFEDELETE. The default value is false.

SkipDels

Type: boolean

(Advanced) Skips the rows with opcodes DELETE or SAFEDELETE. The default value is false.
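The following sketch shows one way the publisher elements could be set for an adapter running in standalone mode, where ProjectName and StreamName are required. The stream name, the instance name, the Name value, the type value espconnector, and the one-second pooling time are hypothetical choices made for illustration.

```xml
<!-- The type value espconnector is an assumption -->
<Module type="espconnector">
  <!-- Hypothetical instance name -->
  <InstanceName>MyEspPublisher</InstanceName>
  <!-- Assumed module name; check modulesdefine.xml -->
  <Name>EspPublisher</Name>
  <Parameters>
    <EspPublisherParameters>
      <!-- Must match the Name of an EspProject element later in the file -->
      <ProjectName>EspProject2</ProjectName>
      <!-- Hypothetical stream name -->
      <StreamName>InputStream1</StreamName>
      <!-- Publish when 256 records are pooled; 1 disables pooling -->
      <MaxPubPoolSize>256</MaxPubPoolSize>
      <!-- Hypothetical value: publish after at most 1000 ms of pooling -->
      <MaxPubPoolTime>1000</MaxPubPoolTime>
      <!-- false publishes pooled records in envelopes, true in transactions -->
      <UseTransactions>false</UseTransactions>
      <SafeOps>false</SafeOps>
      <SkipDels>false</SkipDels>
    </EspPublisherParameters>
  </Parameters>
</Module>
```

In managed mode, ProjectName and StreamName can be left out because the adapter detects them automatically.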

Event Stream Processor Elements

Event Stream Processor elements configure communication between Event Stream Processor and the File/Hadoop Event XML Input adapter.

XML Element Description
EspProjects

(Required) Element containing elements for connecting to Event Stream Processor.

EspProject

(Required) Element containing the Name and Uri elements. Specifies information for the ESP project to which the adapter is connected.

Name

Type: string

(Required) Specifies the unique project tag of the ESP project which the EspConnector (publisher/subscriber) module references.

Uri

Type: string

(Required) Specifies the full project URI used to connect to the ESP project. For example, esp://localhost:19011/ws1/p1.

Security

(Required) Element containing all the authentication elements below. Specifies details for the authentication method used for Event Stream Processor.

User

Type: string

(Required) Specifies the user name required to log in to Event Stream Processor (see AuthType). No default value.

Password

Type: string

(Required) Specifies the password required to log in to Event Stream Processor (see AuthType).

Includes an "encrypted" attribute indicating whether the Password value is encrypted. The default value is false. If set to true, the password value is decrypted using RSAKeyStore and RSAKeyStorePassword.

AuthType

Type: string

(Required) Method used to authenticate to the Event Stream Processor. Valid values are:
  • server_rsa – RSA authentication using keystore
  • kerberos – Kerberos authentication using ticket-based authentication
  • user_password – LDAP, SAP BI, and Native OS (user name/password) authentication

If the adapter is operated as a Studio plug-in, AuthType is overridden by the Authentication Mode Studio start-up parameter.

RSAKeyStore

Type: string

(Dependent required) Specifies the location of the RSA keystore, which is used to decrypt the password value. Required if AuthType is set to server_rsa, or the encrypted attribute for Password is set to true, or both.

RSAKeyStorePassword

Type: string

(Dependent required) Specifies the keystore password, which is used to decrypt the password value. Required if AuthType is set to server_rsa, or the encrypted attribute for Password is set to true, or both.

KerberosKDC

Type: string

(Dependent required) Specifies the host name of the Kerberos key distribution center. Required if AuthType is set to kerberos.

KerberosRealm

Type: string

(Dependent required) Specifies the Kerberos realm setting. Required if AuthType is set to kerberos.

KerberosService

Type: string

(Dependent required) Specifies the Kerberos principal name that identifies an Event Stream Processor cluster. Required if AuthType is set to kerberos.

KerberosTicketCache

Type: string

(Dependent required) Specifies the location of the Kerberos ticket cache file. Required if AuthType is set to kerberos.

EncryptionAlgorithm

Type: string

(Optional) Specifies the encryption algorithm used when the encrypted attribute for Password is set to true. If left blank, RSA is used by default.
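A minimal sketch of the Event Stream Processor elements for user name and password authentication is shown below. The user name and password are placeholders, and nesting Security inside EspProject is an assumption; the table above states only that EspProject contains Name and Uri.

```xml
<EspProjects>
  <EspProject>
    <!-- Project tag referenced by ProjectName in the ESP Publisher module -->
    <Name>EspProject2</Name>
    <Uri>esp://localhost:19011/ws1/p1</Uri>
    <!-- Assumed placement of the Security element -->
    <Security>
      <!-- Placeholder credentials -->
      <User>espuser</User>
      <Password encrypted="false">changeit</Password>
      <AuthType>user_password</AuthType>
      <!-- RSAKeyStore and RSAKeyStorePassword are needed only for server_rsa -->
      <!-- or an encrypted password; the Kerberos elements only for kerberos. -->
    </Security>
  </EspProject>
</EspProjects>
```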

Related reference
Adapter Support for Schema Discovery