PRIMARY KEY Clause

Specifies the primary key for a delta stream or window.

Syntax

PRIMARY KEY (column [,...]) | PRIMARY KEY DEDUCED

Components

column    The name of a column in the element's schema.

Usage

A primary key uniquely identifies a record, and is required for windows and delta streams.

The primary key is normally treated as "strict." Any record that violates the consistency rules, such as an insert of a record that already exists, or an update or delete of a record that does not exist, is discarded and reported in the log.
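The strict rules can be illustrated with a toy model. This is only a sketch in Python, not the engine's implementation: the window is assumed to be a plain dictionary keyed by primary key, and apply_strict is a hypothetical helper name.

```python
# Toy model of "strict" primary-key handling (illustrative assumption only).
def apply_strict(window, op, key, row=None):
    """Apply one operation to `window`, a dict keyed by primary key.

    Returns True if the operation was applied, or False if the record is
    inconsistent and would be discarded.
    """
    exists = key in window
    if op == "insert" and not exists:
        window[key] = row
        return True
    if op == "update" and exists:
        window[key] = row
        return True
    if op == "delete" and exists:
        del window[key]
        return True
    return False  # a real engine would also report the bad record in the log


window = {}
results = [
    apply_strict(window, "insert", 1, {"price": 10.0}),
    apply_strict(window, "insert", 1, {"price": 11.0}),  # insert of an existing record
    apply_strict(window, "update", 2, {"price": 5.0}),   # update of a nonexistent record
    apply_strict(window, "delete", 2),                   # delete of a nonexistent record
]
```

Only the first operation is applied; the other three are rejected, and the window still holds the original row for key 1.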

The primary key is treated as "lax" when a keep policy is placed on a window. The expiration of records caused by the KEEP clause creates inconsistencies with incoming records. An insert on an existing record is treated as an update, and an update on a nonexistent record is treated as an insert. A delete on a nonexistent record is silently ignored (as a safedelete). This behavior typically appears when two windows in a chain both have expiry policies and the target window has the smaller expiry period.
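The lax rules can be sketched the same way. Again this is a toy Python model under the assumption that the window is a dictionary keyed by primary key; apply_lax is a hypothetical helper that returns the operation actually performed.

```python
# Toy model of "lax" primary-key handling (illustrative assumption only).
def apply_lax(window, op, key, row=None):
    """Apply one operation to `window` and return the effective operation."""
    exists = key in window
    if op in ("insert", "update"):
        window[key] = row                    # behaves like an upsert
        return "update" if exists else "insert"
    if op == "delete":
        if exists:
            del window[key]
            return "delete"
        return "ignored"                     # silently dropped, like a safedelete
    raise ValueError(f"unknown operation: {op}")


window = {}
effects = [
    apply_lax(window, "update", 7, {"price": 5.0}),  # record missing: becomes an insert
    apply_lax(window, "insert", 7, {"price": 6.0}),  # record present: becomes an update
    apply_lax(window, "delete", 7),
    apply_lax(window, "delete", 7),                  # record already gone: ignored
]
```

The first case models an update arriving for a record that has already expired from the target window.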

Usage: Explicit Primary Key

An explicitly defined primary key uses the PRIMARY KEY clause and refers to one or more columns of the window or delta stream's schema. When a primary key is specified, the engine enforces the constraint, and erroneous operations are flagged as bad records and discarded at runtime. To avoid losing records this way, ensure that the chosen key columns uniquely identify each record.

Usage: Deduced Primary Key

If the primary key is specified as PRIMARY KEY DEDUCED, the compiler automatically deduces the primary key. If the primary key cannot be deduced, a compilation error is generated.

The primary key is deduced as follows:
  • Primary keys cannot be deduced for input windows and Flex operators. They need to be explicitly specified.
  • For single-source queries, except aggregations, the primary key is deduced from the source. All the key columns from the source need to be copied verbatim into the projection for the key deduction to succeed.
  • For aggregations, the primary key consists of the columns in the projection that contain the GROUP BY expressions.
    Note: All GROUP BY expressions need to be included in the projection list. If the same expression appears in more than one column, the first column containing that GROUP BY expression is made the primary key.
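The aggregation rule and its note can be modeled with a short sketch. It is written in Python under stated assumptions: the projection is represented as (column name, expression text) pairs, expressions are compared as plain strings, and deduce_aggregation_key is a hypothetical helper, not part of the compiler. The ordering of the returned key columns is also an assumption.

```python
# Toy model of key deduction for an aggregation (illustrative assumption only).
def deduce_aggregation_key(projection, group_by):
    """Return the projection columns that form the deduced key.

    projection: list of (column_name, expression) pairs, in projection order
    group_by:   list of GROUP BY expressions
    """
    key = []
    for expr in group_by:
        matches = [name for name, col_expr in projection if col_expr == expr]
        if not matches:
            # Mirrors the compilation error raised when deduction fails.
            raise ValueError(f"GROUP BY expression not in projection: {expr}")
        key.append(matches[0])  # the first matching column wins
    return key


projection = [
    ("Symbol", "T.Symbol"),
    ("SymbolCopy", "T.Symbol"),        # same expression again; not part of the key
    ("TotalVolume", "sum(T.Volume)"),
]
key = deduce_aggregation_key(projection, ["T.Symbol"])
```

Here the deduced key is the single column Symbol, because it is the first column that carries the GROUP BY expression.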
For joins, the following rules apply:
  • For a left outer join or a right outer join, the key is derived from the outer side: the left side for a left join and the right side for a right join. All key columns from the outer side must be present in the projection for the primary key deduction to work correctly.
  • For an inner join, the deduced key depends on the cardinality of the join. For one-many cardinality, the key is derived from the many side. For many-many cardinality, the deduced key is the combination of the keys from both sides of the join. For one-one cardinality, the key is deduced from one of the sides, and which side is chosen cannot be reliably determined. In all cases, the candidate key columns must be copied directly from the sources for key deduction to work correctly.
  • For a full outer join, each column that contains only a coalesce() function with the key fields of both sides of the join as arguments is deduced to be a key column.
  • For joins of multiple windows, these rules are applied transitively.
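The outer-join and inner-join rules above can be summarized in a small sketch. It is a toy Python model, not the compiler's algorithm: deduce_join_key is a hypothetical helper, column names are illustrative, "one-many" is assumed to mean that the left side is the "one" side, and the one-one and full outer join cases are deliberately left out.

```python
# Toy model of key deduction for two-way joins (illustrative assumption only).
def deduce_join_key(join_type, left_key, right_key, projected, cardinality=None):
    """Return the deduced key columns, or raise if deduction would fail.

    projected: set of source columns copied directly into the projection
    """
    if join_type == "left":
        key = list(left_key)                    # outer side is the left side
    elif join_type == "right":
        key = list(right_key)                   # outer side is the right side
    elif join_type == "inner" and cardinality == "one-many":
        key = list(right_key)                   # assumed: right is the "many" side
    elif join_type == "inner" and cardinality == "many-many":
        key = list(left_key) + list(right_key)  # combination of both keys
    else:
        raise ValueError("case not modeled in this sketch")
    missing = [col for col in key if col not in projected]
    if missing:
        # Mirrors the compilation error raised when deduction fails.
        raise ValueError(f"key columns missing from projection: {missing}")
    return key


customers_key = ["Customers.CustId"]
orders_key = ["Orders.OrderId"]
projected = {"Customers.CustId", "Orders.OrderId", "Orders.Amount"}

left_join_key = deduce_join_key("left", orders_key, customers_key, projected)
one_many_key = deduce_join_key("inner", customers_key, orders_key, projected, "one-many")
many_many_key = deduce_join_key("inner", customers_key, orders_key, projected, "many-many")
```

For a join of more than two windows, the same function could be applied repeatedly, feeding the key deduced for one join in as the key of one side of the next.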