Configure the File/Hadoop XML Input adapter by specifying values for the ESP connector, formatter, and transporter modules in the adapter configuration file.
XML Element | Description |
---|---|
Log4jProperty |
Type: string (Optional) Specify a full path to the log4j.properties logging file you wish to use. The default value is $ESP_HOME/adapters/framework/config/log4j.properties. |
The File Input transporter reads data from local files, wraps the data as strings, and sends it to the next module specified in the adapter configuration file.
XML Element | Description |
---|---|
Module |
(Required) Element containing all information for this module. It contains a type attribute for specifying the module type. For example, transporter. |
InstanceName |
Type: string (Required) Instance name of the specific module you want to use. For example, MyInputTransporter. |
Name |
Type: string (Required) Name of the module as defined in the modulesdefine.xml file. For example, <TransporterType>InputTransporter. |
Next |
Type: string (Required) Instance name of the module that follows this one. |
BufferMaxSize |
Type: integer (Advanced) Capacity of the buffer queue between this module and the next. The default value is 10240. |
Parameters |
(Required) Element containing the FileInputTransporterParameters element. |
FileInputTransporterParameters |
(Required) Element containing elements for the File Input transporter. |
Dir |
Type: string (Required) Specify the absolute path to the directory containing the data files that you want the adapter to read. For example, <username>/<foldername>. No default value. To read files from Hadoop, specify an HDFS folder URI instead of a local file system folder. For example, hdfs://<hdfsserver>:9000/<foldername>/<subfoldername>/<leaffoldername>. To use Hadoop, download the binaries for Hadoop version 1.2.1 from http://hadoop.apache.org, and copy the hadoop-core.jar file (for version 1.2.1, hadoop-core-1.2.1.jar) to %ESP_HOME%\adapters\framework\libj. Use a stable version rather than a beta. Use a forward slash for both UNIX and Windows paths. |
File |
Type: string (Required) Specify the file you want the adapter to read, or a regex pattern that filters the files in the given directory. See the DynamicMode element. No default value. |
AccessMode |
Type: string (Required) Specify an access mode:
|
DynamicMode |
Type: string (Advanced) Specify a dynamic mode:
An example regex pattern is ".*\.txt", which selects only files that end with ".txt". In a regex pattern, place an escape character, "\", before a metacharacter (such as ".") to match that character literally. |
PollingPeriod |
Type: integer (Advanced) Define the period, in seconds, at which the adapter polls the specified file or directory. Set this element only if the value of the DynamicMode element is dynamicFile or dynamicPath. The default value is 0; a value of 0 or less turns off polling. |
RemoveAfterProcess |
Type: boolean (Optional) If set to true, the file is removed from the directory after the adapter processes it. This element takes effect only if the value of the DynamicMode element is dynamicPath; it is ignored if the value is dynamicFile. The default value is false. |
ScanDepth |
Type: integer (Optional) Specify the depth of the schema discovery. When discovering the input data schema, the adapter reads the number of rows specified by this element. The default value is 3. |
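
The elements in the table above nest as in the following sketch of a transporter module entry. It is illustrative only, not a complete adapter configuration file: the enclosing elements are omitted, and the module name `FileInputTransporter`, the instance names, the directory, and the polling period are assumed values. The access mode is left as a placeholder because the supported values are not listed here.

```xml
<!-- Sketch of a File Input transporter module; all values are hypothetical. -->
<Module type="transporter">
  <InstanceName>MyInputTransporter</InstanceName>
  <!-- Assumed module name; use the name defined in modulesdefine.xml. -->
  <Name>FileInputTransporter</Name>
  <!-- Instance name of the module that follows (here, an assumed formatter instance). -->
  <Next>MyInputFormatter</Next>
  <BufferMaxSize>10240</BufferMaxSize>
  <Parameters>
    <FileInputTransporterParameters>
      <!-- Absolute path with forward slashes; an HDFS folder URI can be used instead. -->
      <Dir>/home/espuser/xmldata</Dir>
      <!-- Regex pattern: only files whose names end with ".xml". -->
      <File>.*\.xml</File>
      <AccessMode><!-- required: one of the access modes supported by the adapter --></AccessMode>
      <DynamicMode>dynamicPath</DynamicMode>
      <!-- Poll the directory every 5 seconds; 0 or less turns polling off. -->
      <PollingPeriod>5</PollingPeriod>
      <RemoveAfterProcess>false</RemoveAfterProcess>
      <ScanDepth>3</ScanDepth>
    </FileInputTransporterParameters>
  </Parameters>
</Module>
```

RemoveAfterProcess is shown because DynamicMode is set to dynamicPath; with dynamicFile it would be ignored.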
The XMLDoc Stream to ESP formatter parses XML format strings, extracts data according to the schema file specified in the adapter configuration file, and translates this data to AepRecord objects.
XML Element | Description |
---|---|
Module |
(Required) Element containing all information for this module. It contains a type attribute for specifying the module type. For example, formatter. |
InstanceName |
Type: string (Required) Instance name of the specific module you want to use. For example, MyInputTransporter. |
Name |
Type: string (Required) Name of the module as defined in the modulesdefine.xml file. For example, <TransporterType>InputTransporter. |
Next |
Type: string (Required) Instance name of the module that follows this one. |
BufferMaxSize |
Type: integer (Advanced) Capacity of the buffer queue between this module and the next. The default value is 10240. |
Parallel |
Type: boolean (Optional) If set to true, the module runs in a separate thread. The default value is true. |
Parameters |
(Required) Element containing the XmlDocStreamToEspFormatterParameters element. |
XmlDocStreamToEspFormatterParameters |
(Required) Element containing the XMLDoc Stream to ESP formatter elements. |
XmlElemMappingRowPattern |
Type: string (Required) Specify a pattern to indicate which XML elements in the XML document are processed by the formatter. Each matched element is mapped to an ESP row, and its attributes and child elements are mapped to the columns of that row. The adapter ignores any XML elements that do not match this pattern. The pattern is a subset of XPath expressions. The [/]?NCName[/NCName]* path expression is the only supported expression, where NCName (non-colonized name) is the local name of the element without a prefix or namespace. If the elements in the path expression include a namespace URI (prefix), all of these elements belong to the same namespace. Provide the namespace in the XmlElemNamespace element. Here are some examples of valid path expressions:
|
XmlElemNamespace |
Type: string (Required) Specify the namespace URI for elements that appear in the pattern path expression. |
ColsMapping |
(Required) Element containing the Column element. |
Column |
Type: string (Required) Specify which attributes or child elements of the XML elements matched by the pattern path expression are mapped to columns of the ESP row. For example, [<Column>XPath expression</Column>]+. The XPath expression is any valid XPath expression as defined by the XPath specification. The XPath expression can begin only from the last XML element that appears in the pattern path expression, or from one of its descendant elements. The first <Column/> element value is mapped to the first column of an ESP row, the second <Column/> element value is mapped to the second column of an ESP row, and so on. |
DateFormat |
Type: string (Optional) Format string for parsing date values. For example, yyyy-MM-dd'T'HH:mm:ss. |
TimestampFormat |
Type: string (Optional) Format string for parsing timestamp values. For example, yyyy-MM-dd'T'HH:mm:ss.SSS. |
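
As a rough illustration of how the row pattern and column mapping fit together, the sketch below assumes input documents of the form `<orders><order id="..." qty="..."><placed>...</placed></order></orders>`. The element and attribute names, the namespace URI, the instance names, and the module name `XmlDocStreamToEspFormatter` are all hypothetical. How a namespace affects XPath expressions in Column elements is not covered above, so treat the child-element column in particular as an assumption.

```xml
<!-- Sketch of an XMLDoc Stream to ESP formatter module; all values are hypothetical. -->
<Module type="formatter">
  <InstanceName>MyInputFormatter</InstanceName>
  <!-- Assumed module name; use the name defined in modulesdefine.xml. -->
  <Name>XmlDocStreamToEspFormatter</Name>
  <!-- Instance name of the module that follows (here, an assumed publisher instance). -->
  <Next>MyPublisher</Next>
  <Parallel>true</Parallel>
  <Parameters>
    <XmlDocStreamToEspFormatterParameters>
      <!-- Each order element under orders becomes one ESP row. -->
      <XmlElemMappingRowPattern>/orders/order</XmlElemMappingRowPattern>
      <!-- Namespace URI shared by the elements in the pattern (hypothetical). -->
      <XmlElemNamespace>http://example.com/orders</XmlElemNamespace>
      <ColsMapping>
        <!-- First column: the id attribute of order. -->
        <Column>@id</Column>
        <!-- Second column: the qty attribute of order. -->
        <Column>@qty</Column>
        <!-- Third column: text of the placed child element. -->
        <Column>placed</Column>
      </ColsMapping>
      <DateFormat>yyyy-MM-dd'T'HH:mm:ss</DateFormat>
      <TimestampFormat>yyyy-MM-dd'T'HH:mm:ss.SSS</TimestampFormat>
    </XmlDocStreamToEspFormatterParameters>
  </Parameters>
</Module>
```

The order of the Column elements matters: it must match the column order of the target ESP stream.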
The ESP Publisher module obtains data from a transporter or formatter module and publishes it to an ESP project.
XML Element | Description |
---|---|
Module |
(Required) Element containing all information for this module. It contains a type attribute for specifying the module type. For example, formatter. |
InstanceName |
Type: string (Required) Instance name of the specific module you want to use. For example, MyInputTransporter. |
Name |
Type: string (Required) Name of the module as defined in the modulesdefine.xml file. For example, <TransporterType>InputTransporter. |
BufferMaxSize |
Type: integer (Advanced) Capacity of the buffer queue between this module and the next. The default value is 10240. |
Parameters |
(Required) Element containing the EspPublisherParameters element. |
EspPublisherParameters |
(Required) Element containing elements for the ESP publisher. |
ProjectName |
Type: string (Required if adapter is running in standalone mode; optional if it is running in managed mode) Name of the ESP project to which the adapter is connected. For example, EspProject2. This is the same project tag that you specify later in the adapter configuration file in the Name element within the Event Stream Processor (EspProjects) element. If you are starting the adapter with the ESP project to which it is attached (that is, running the adapter in managed mode), you need not set this element as the adapter automatically detects the project name. |
StreamName |
Type: string (Required if adapter is running in standalone mode; optional if it is running in managed mode) Name of the ESP stream to which the adapter publishes data. If you are starting the adapter with the ESP project to which it is attached (that is, running the adapter in managed mode), you need not set this element as the adapter automatically detects the stream name. |
MaxPubPoolSize |
Type: positive integer (Optional) Maximum size of the record pool. Record pooling, also referred to as block or batch publishing, allows for faster publication since there is less overall resource cost in publishing multiple records together, compared to publishing records individually. Record pooling is disabled if this value is 1. The default value is 256. |
MaxPubPoolTime |
Type: positive integer (Optional) Maximum period of time, in milliseconds, for which records are pooled before being published. If not set, pooling time is unlimited and the pooling strategy is governed by MaxPubPoolSize. No default value. |
UseTransactions |
Type: boolean (Optional) If set to true, pooled messages are published to Event Stream Processor in transactions. If set to false, they are published in envelopes. The default value is false. |
SafeOps |
Type: boolean (Advanced) Converts the opcodes INSERT and UPDATE to UPSERT, and converts DELETE to SAFEDELETE. The default value is false. |
SkipDels |
Type: boolean (Advanced) Skips the rows with opcodes DELETE or SAFEDELETE. The default value is false. |
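
The publisher elements above could be combined as in the following sketch. The module type value `espconnector`, the module name `EspPublisher`, the instance name, the stream name, and the pool time are assumptions made for illustration. ProjectName and StreamName are included as they would be for standalone mode.

```xml
<!-- Sketch of an ESP Publisher module; all values are hypothetical. -->
<!-- The type value is assumed; check the module type used in your adapter configuration. -->
<Module type="espconnector">
  <InstanceName>MyPublisher</InstanceName>
  <!-- Assumed module name; use the name defined in modulesdefine.xml. -->
  <Name>EspPublisher</Name>
  <Parameters>
    <EspPublisherParameters>
      <!-- Must match the Name of an EspProject entry in the same file. -->
      <ProjectName>EspProject2</ProjectName>
      <StreamName>OrdersIn</StreamName>
      <!-- Publish when 256 records are pooled or 500 ms have elapsed. -->
      <MaxPubPoolSize>256</MaxPubPoolSize>
      <MaxPubPoolTime>500</MaxPubPoolTime>
      <!-- false: pooled records are published in envelopes rather than transactions. -->
      <UseTransactions>false</UseTransactions>
      <SafeOps>false</SafeOps>
      <SkipDels>false</SkipDels>
    </EspPublisherParameters>
  </Parameters>
</Module>
```

In managed mode, ProjectName and StreamName could be left out, since the adapter detects them automatically.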
Event Stream Processor elements configure communication between Event Stream Processor and the File/Hadoop XML Input adapter.
XML Element | Description |
---|---|
EspProjects |
(Required) Element containing elements for connecting to Event Stream Processor. |
EspProject |
(Required) Element containing the Name and Uri elements. Specifies information for the ESP project to which the adapter is connected. |
Name |
Type: string (Required) Specifies the unique project tag of the ESP project which the EspConnector (publisher/subscriber) module references. |
Uri |
Type: string (Required) Specifies the full project URI used to connect to the ESP project. For example, esp://localhost:19011/ws1/p1. |
Security |
(Required) Element containing all the authentication elements below. Specifies details for the authentication method used for Event Stream Processor. |
User |
Type: string (Required) Specifies the user name required to log in to Event Stream Processor (see AuthType). No default value. |
Password |
Type: string (Required) Specifies the password required to log in to Event Stream Processor (see AuthType). Includes an "encrypted" attribute indicating whether the Password value is encrypted. The default value of this attribute is false. If set to true, the password value is decrypted using RSAKeyStore and RSAKeyStorePassword. |
AuthType |
Type: string (Required) Method used to authenticate to the Event Stream Processor. Valid values are:
If the adapter is operated as a Studio plug-in, AuthType is overridden by the Authentication Mode Studio start-up parameter. |
RSAKeyStore |
Type: string (Dependent required) Specifies the location of the RSA keystore, which is used to decrypt the password value. Required if AuthType is set to server_rsa, or the encrypted attribute for Password is set to true, or both. |
RSAKeyStorePassword |
Type: string (Dependent required) Specifies the keystore password, which is used to decrypt the password value. Required if AuthType is set to server_rsa, or the encrypted attribute for Password is set to true, or both. |
KerberosKDC |
Type: string (Dependent required) Specifies the host name of the Kerberos key distribution center. Required if AuthType is set to kerberos. |
KerberosRealm |
Type: string (Dependent required) Specifies the Kerberos realm setting. Required if AuthType is set to kerberos. |
KerberosService |
Type: string (Dependent required) Specifies the Kerberos principal name that identifies an Event Stream Processor cluster. Required if AuthType is set to kerberos. |
KerberosTicketCache |
Type: string (Dependent required) Specifies the location of the Kerberos ticket cache file. Required if AuthType is set to kerberos. |
EncryptionAlgorithm |
Type: string (Optional) Used when the encrypted attribute for Password is set to true. If left blank, RSA is used by default. |
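
A minimal sketch of the connection section, assuming RSA authentication, might look like the following. The project tag and URI reuse the examples given above; the user name, password, and keystore path are placeholders. The Kerberos elements are omitted because they apply only when AuthType is set to kerberos.

```xml
<!-- Sketch of the EspProjects section; credentials and paths are hypothetical. -->
<EspProjects>
  <EspProject>
    <!-- Project tag referenced by ProjectName in the publisher module. -->
    <Name>EspProject2</Name>
    <Uri>esp://localhost:19011/ws1/p1</Uri>
    <Security>
      <User>espuser</User>
      <!-- encrypted="false": the value is given in plain text. -->
      <Password encrypted="false">changeit</Password>
      <AuthType>server_rsa</AuthType>
      <!-- Required because AuthType is server_rsa. -->
      <RSAKeyStore>/opt/esp/keys/keystore.jks</RSAKeyStore>
      <RSAKeyStorePassword>changeit</RSAKeyStorePassword>
    </Security>
  </EspProject>
</EspProjects>
```

If the Password value were stored encrypted, the encrypted attribute would be set to true, and EncryptionAlgorithm could be left blank to fall back to RSA.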