Detecting Source Changes

Learn how to capture changes in the source table, including new source records, modified source records, and records that have been deleted from the source. Methods can be combined to detect different types of data changes in one step.

In the case study scenario, the source database does not contain any change log information, so source and the target content must be compared to detect any changes. Since, in most cases, the source and target objects do not reside in the same database, a heterogeneous join needs to be performed.

Detecting New Source Records

Records that are added to the source after the last update of the target dimension table do not have a corresponding current record in the target dimension table.
  1. All source records are read using an appropriate Data Provider component.

    Note:

    All attributes that are transferred to the target dimension table are read, although only the key attribute is required for detection.

  2. The existence of a corresponding current record in the target dimension is checked for each source record based on the source key attribute.
    1. Choose an appropriate Lookup component.

      To perform calculations on the data to be transferred, consider using the lookup functionality of the Data Calculator component.

    2. Select the lookup data from target. As this is a simple existence check, only the original source keys are needed from all current target records. However, lookups always return a value for a key, so you must also select an appropriate return value.

    3. Add an attribute to the port structure to populate the lookup result. The lookup result determines whether a source record is newly added or existed previously. This attribute indicates the data state and allows data to be filtered in the next step of the transformation process. See Modifying Port Structures. The new attribute is selected as the value attribute of a Lookup component or the output attribute in the Data Calculator.

    4. Set an appropriate lookup default value. The default value is returned by the lookup for nonexisting keys. To ensure that the lookup value correctly indicates the existence of records, set it to a constant that is different from all lookup values for any existing keys.

      Example:
      • Source data – select Key, Name, Price from PRODUCT

      • Lookup data – select Key, '1' from PRODUCT_PRICE where Valid_To = '9999-12-31'

      • Value attribute – Exists (integer)

      • Default value – 0

      • Performing this lookup results in records with attributes Key, Name, Price, Exists. The value of the Exists attribute will be 1(lookup value) for all records existing in target, and 0 (default value) for all nonexisting records.

Detecting Modified Source Records

Records that are modified in the source after the last update of the target dimension table have a corresponding current record in the target dimension table, but the relevant values are changed.
  1. All source records are read using an appropriate Data Provider component.

    Note:

    All attributes that are transferred to the target dimension table are read, although only the key attribute is required for detection.

  2. The existence of a corresponding current record in the target dimension table is checked for each source record based on the source key attribute, and the values are compared.
    1. Choose an appropriate Lookup component.

      Use the Data Calculator component to look up multiple values for a single key attribute and perform comparisons.

    2. Select the lookup data from the target. The key attributes and all values to be compared for all current target records are read using an appropriate Data Provider component.

    3. Add an attribute to the port structure to populate the additional target key attribute. The lookup result determines whether a source record has been modified or is either new or unchanged. This attribute indicates the data state and allows data to be filtered in the next step of the transformation process. The update operation on the target must uniquely identify the current record, thus the date part of the target key needs to be populated as well. See Modifying Port Structures.

    4. Set an appropriate lookup default value. The default value is returned by the lookup for nonexisting keys. To ensure that the lookup value correctly indicates the existence of records, set it to a constant that is different from all lookup values for the existing keys.

    5. Look up all necessary target values. The first lookup uses the new target key attribute as the output attribute in the Data Calculator, thus indicating existence. The values to be compared are read into temporary variables.

    6. Compare source and target attribute values. The target key attribute is recalculated, based on a value comparison for the existing records.

      Example:
      • Source data – select Key, Name, Price from PRODUCT

      • Lookup data – select Key, Valid_From, Price from PRODUCT_PRICE where Valid_To='9999-12-31'

      First check existence by reading the effective date from target:
      • Output attribute – Valid_From

      • Default value – 0

      Read current target price into temporary variable:
      • Output Attribute – Tmp_Price

      • Default Value – 0

      If a current target record exists (Valid_From is not 0), compare Price and Tmp_Price. Recalculate Valid_From to 0 if the Price has not changed. Performing these lookups and calculations results in records with Key, Name, Price, and Valid_From attributes. Valid_From either contains the effective date of the target record to be updated or contains 0, indicating new and unchanged records.

Detecting Deleted Source Records

Records that have been deleted from the source after the last update of the target dimension table still have a corresponding current record in the target dimension table.
  1. Key attributes of all current records in the target dimension table are read using an appropriate Data Provider component.

  2. The existence of a corresponding record in the source is checked for each current target record based on the source key attribute.
    1. Choose an appropriate Lookup component. See Lookup components.

      If your source data does not reside in a database, use the lookup functionality of the Data Calculator component.

    2. Select the lookup data from source. As this is a simple existence check, only the source keys are needed from all source records. However, lookups always return a value for a key, so you must select an appropriate return value as well.

    3. Add an attribute to the port structure to populate the lookup result. The lookup result determines whether a source record has been deleted. This attribute indicates the data state and allows the data to be filtered in the next step of the transformation process. The new attribute is selected as the value attribute of a Lookup component or the output attribute of a Data Calculator rule. See Modifying Port Structures.

    4. Set an appropriate lookup default value. The default value is returned by the lookup for nonexisting keys. To ensure that the lookup value correctly indicates the existence of records, set it to a constant that is different from all lookup values for the existing keys.

      Example:
      • Target data – select Key, Valid_From from PRODUCT_PRICE where Valid_To='9999-12-31'

      • Lookup data – select Key, '0' from PRODUCT

      • Value attribute – Removed (integer)

      • Default value – 1

      Performing this lookup results in records with attributes Key, Valid_From, Removed. The value of the Removed attribute will be 0 (lookup value) for all existing source records and 1 (default value) for all nonexisting records.

Alternatives

  • If the source is a database that provides ascending indicator for insertions, updates, or deletions (like autoincrements, modification dates, and so forth), the DB Data Provider Index Load component can be used to read records changed since the last load only.

  • Use a Staging component to load relevant data from both the source and target to the same database. New, modified, and deleted records are then detected by extracting data from the stage using a full outer join.

Related concepts
Data Calculator JavaScript
DB Data Provider Index Load
DB Staging
Related tasks
Modifying Port Structures
Related reference
Source Components
Lookup Components