Using a Reference Table Query in a Join

This example shows the procedure for creating a reference table query, an input stream, and an output stream that implements a join of data from the reference table query and the input stream to add customer data to an incoming order.

  1. Obtain the necessary information about the external database table containing the information you wish to look up.
    • the name of the table containing the data
    • the schema for that table
    • the service to use to connect to it
  2. Create the reference table query in your project.

    For example, for a table named customerAddr, with an integer column, customerID, a string column, fullName, and a string column, address, that can be accessed by a service named databaseLookup, create a reference table query named customerRef:

    CREATE REFERENCE customerRef 
    SCHEMA (customerID integer, fullName string, address string)
    PRIMARY KEY (customerID)
    PROPERTIES service='databaseLookup',source='customerAddr',sourceSchema='addressSchema';

    The schema of the reference does not have to include all of the columns in the source table or list them in the same order. But, for each column that you do specify in the schema, the column name must match the column name in the database, and the datatype must be compatible with the datatype in the database.

    The customerID column is the primary key in the source table: it contains a unique number assigned to each customer. A primary key is not required to produce a stream: only a window. But, if you declare a primary key for the reference it must match the primary key in the table. So, whenever the table has a primary key, it is a good practice to make it the primary key of the reference table query (giving you the flexibility to produce either a stream or a window).

    When you specify a primary key in a reference table query, you must also include the sourceSchema property, which identifies the schema, in this case addressSchema, of the database containing the table.

  3. Create the input stream for events to enter the project.

    For an event which is an order placed by a customer, consisting of three integers, named orderID, customerID, and itemID, create an input stream named orderStream.

    CREATE INPUT STREAM orderStream
    SCHEMA (orderID integer, customerID integer, itemID integer);
  4. Create an output stream that implements a join of the incoming event with the data obtained from the reference table query.

    To create an output stream named orderWithCustomerInfoStream with three columns: the integer orderID, the string customerName, and the string customerAddress, the output stream joins the orderID from the orderStream stream, with the fullname and address from the customerRef reference table query when the customerID in the orderStream stream matches the customerID in the customerRef reference table query.

    CREATE OUTPUT STREAM orderWithCustomerInfoStream
    SCHEMA (orderID integer, customerName string, customerAddress string)
    AS SELECT orderStream.orderID, customerRef.fullName, customerRef.address
    FROM orderStream, customerRef
    WHERE orderStream.customerID=customerRef.customerID;

    While the column names in the output stream's schema declaration, customerName and customerAddress, do not have to match the column names in the table, fullName and address, the datatypes in the output stream's schema must match the datatypes in the table.