Retention

A retention policy specifies the maximum number of rows or the maximum period of time that data are retained in a window.

In CCL, you can specify a retention policy when defining a Window. You can also create an Unnamed Window by specifying a retention policy on a Window or Delta Stream when it is used as a source to another element.
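
As a rough sketch of the second case, the statements below place a KEEP clause on a window at the point where it is used as a source, which creates an Unnamed Window over that source. The names TradesIn and AvgPrices, the column list, and the retention values are assumptions chosen for illustration.

// Hypothetical source window with its own count-based retention policy
CREATE INPUT WINDOW TradesIn
    SCHEMA (Id integer, Symbol string, Price float, Shares integer)
    PRIMARY KEY (Id)
    KEEP 1000 ROWS
;

// Hypothetical derived window: the KEEP clause after the source name
// defines an Unnamed Window holding the last ten minutes of TradesIn
CREATE OUTPUT WINDOW AvgPrices
    PRIMARY KEY DEDUCED
AS SELECT
    TradesIn.Symbol,
    avg(TradesIn.Price) AvgPrice
FROM TradesIn KEEP 10 MINS
GROUP BY TradesIn.Symbol
;

With this arrangement, the average is computed only over the rows that the Unnamed Window retains, not over everything held in TradesIn.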

Retention is specified through the KEEP clause. You can limit the records in a window by either their number or their age. These methods are referred to as count-based retention and time-based retention, respectively. Alternatively, you can use the ALL modifier to explicitly specify that the window should retain all records.
Note: If you do not specify a retention policy, the window retains all records. This can be dangerous: the window can keep growing until all memory is used and the system shuts down. The only time you should have a window without a KEEP clause is if you know that the window size will be limited by incoming delete events.
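
A minimal sketch of the ALL modifier, assuming a hypothetical OpenOrders window whose size is bounded by incoming delete events (for example, an order is deleted once it is filled). The window name and columns are illustrative only.

// Hypothetical window: KEEP ALL states explicitly that no rows are
// removed by retention; only delete events shrink the window
CREATE INPUT WINDOW OpenOrders
    SCHEMA (OrderId integer, Symbol string, Shares integer)
    PRIMARY KEY (OrderId)
    KEEP ALL
;

Writing KEEP ALL has the same effect as omitting the KEEP clause, but it signals that unbounded retention is intentional.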

Including the EVERY modifier in the KEEP clause produces a Jumping Window, which deletes all of the retained rows when the time interval expires or a row arrives that would exceed the maximum number of rows.

Specifying the KEEP clause with no modifier produces a Sliding Window, which deletes individual rows once they reach the maximum age, or once the window holds the maximum number of rows and a new row arrives.

Note: You can specify retention on input windows (or windows where data is copied directly from its source) using either log file-based stores or memory-based stores. For other windows, you can only specify retention on windows with memory-based stores.

Count-based Retention

In a count-based policy, a constant integer specifies the maximum number of rows retained in the window. You can use parameters in the count expression.
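
The sketch below shows one way a parameter might supply the row count. The parameter name MaxTrades, its default value, and the window name RecentTrades are assumptions, and the Trades source is taken from the examples later in this section.

// Hypothetical parameter; its value can be overridden when the project starts
DECLARE
    PARAMETER integer MaxTrades := 100;
END;

// The KEEP clause uses the parameter in place of a literal count
CREATE WINDOW RecentTrades PRIMARY KEY DEDUCED
KEEP MaxTrades ROWS
AS SELECT * FROM Trades;

Using a parameter lets the retention size be tuned per deployment without editing the window definition.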

A count-based policy also defines an optional SLACK value, which can enhance performance by requiring less frequent cleaning of memory stores. A SLACK value accomplishes this by ensuring that there are no more than N + S rows in the window, where N is the retention size and S is the SLACK value. When the window reaches N + S rows, the system purges S rows. The larger the SLACK value, the better the performance, since there is less cleaning required.
Note: The SLACK value cannot be used with the EVERY modifier, and thus cannot be used in a Jumping Window retention policy.

The default value for SLACK is 1, which means that after the window reaches the maximum number of records, every new record inserted deletes the oldest record. This has a significant impact on performance. Larger SLACK values improve performance by reducing the need to constantly delete rows.

Count-based retention policies can also support retention based on content/column values using the PER sub-clause. A PER sub-clause can contain an individual column or a comma-delimited list of columns. A column can only be used once in a PER sub-clause. Specifying the primary key or autogenerate columns as a column in the PER sub-clause will result in a compiler warning. This is because these are unique entities for which multiple values cannot be retained.

The following example creates a Sliding Window that retains the most recent 100 records that match the filter condition. Once there are 100 records in the window, the arrival of a new record causes the deletion of the oldest record in the window.

CREATE WINDOW Last100Trades PRIMARY KEY DEDUCED 
KEEP 100 ROWS
AS SELECT * FROM Trades
WHERE Trades.Volume > 1000;

Adding the SLACK value of 10 means the window may contain as many as 110 records before any records are deleted.

CREATE WINDOW Last100Trades PRIMARY KEY DEDUCED 
KEEP 100 ROWS SLACK 10
AS SELECT * FROM Trades
WHERE Trades.Volume > 1000;

This example creates a Jumping Window named TotalCost from the source stream Trades. This window will retain a maximum of ten rows, and delete all ten retained rows on the arrival of a new row.

CREATE WINDOW TotalCost
PRIMARY KEY DEDUCED
AS SELECT
    trd.*,
    trd.Price * trd.Size TotalCst
FROM Trades trd
KEEP EVERY 10 ROWS;

The following example creates a sliding window that retains 2 rows for each unique value of Symbol. Once 2 records have been stored for any unique Symbol value, arrival of a third record (with the same Symbol value) will result in deletion of the oldest stored record with the same Symbol value.

CREATE SCHEMA TradesSchema (
        Id integer,
        TradeTime date,
        Venue string,
        Symbol string,
        Price float,
        Shares integer )
;

CREATE INPUT WINDOW TradesWin1
    SCHEMA TradesSchema
    PRIMARY KEY(Id)
    KEEP 2 ROWS PER(Symbol)
;
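
The PER sub-clause also accepts a comma-delimited list of columns. As a sketch that reuses TradesSchema from the example above, the hypothetical window TradesWin3 below retains 2 rows for each unique combination of Symbol and Venue rather than for each Symbol alone.

// Hypothetical variant: retention is applied per (Symbol, Venue) pair
CREATE INPUT WINDOW TradesWin3
    SCHEMA TradesSchema
    PRIMARY KEY(Id)
    KEEP 2 ROWS PER(Symbol, Venue)
;

Here a third trade for the same Symbol on a different Venue does not displace any existing row, because it belongs to a different combination.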

Time-based Retention

In a Sliding Window time-based policy, a constant interval expression specifies the maximum age of the rows retained in the window. In a Jumping Window time-based retention policy, all the rows produced in the specified time interval are deleted after the interval has expired.

The following example creates a Sliding Window that retains each record received for ten minutes. As each individual row exceeds the ten-minute retention time limit, it is deleted.

CREATE WINDOW RecentPositions PRIMARY KEY DEDUCED
KEEP 10 MINS
AS SELECT * FROM Positions;

This example creates a Jumping Window named Win1 that keeps every row that arrives within the 100-second interval. When the time interval expires, all of the rows retained are deleted.

CREATE WINDOW Win1
PRIMARY KEY DEDUCED
AS SELECT * FROM Source1
KEEP EVERY 100 SECONDS;

The PER sub-clause supports content-based data retention, wherein data is retained for a specific time period (specified by an interval) for each unique column value/combination. A PER sub-clause can contain a single column or a comma-delimited list of columns, but you can use each column only once in the same PER clause.

Note: Time-based windows retain data for a specified time regardless of their grouping.

The following example creates a Jumping Window that retains 5 seconds' worth of data for each unique value of Symbol.

CREATE SCHEMA TradesSchema (
        Id integer,
        TradeTime date,
        Venue string,
        Symbol string,
        Price float,
        Shares integer )
;

CREATE INPUT WINDOW TradesWin2
    SCHEMA TradesSchema
    PRIMARY KEY(Id)
    KEEP EVERY 5 SECONDS PER(Symbol)
;
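
For comparison, omitting the EVERY modifier gives the sliding form of the same policy. This is a sketch only: TradesWin4 is a hypothetical name, and the definition reuses TradesSchema from the example above.

// Hypothetical sliding variant: each row is deleted individually
// once it has been retained for 5 seconds
CREATE INPUT WINDOW TradesWin4
    SCHEMA TradesSchema
    PRIMARY KEY(Id)
    KEEP 5 SECONDS PER(Symbol)
;

Because time-based windows retain data for the specified time regardless of grouping, each row in this sketch ages out on its own schedule rather than being cleared together with the other rows for its Symbol.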

Retention Semantics

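When the insertion of one or more new rows into a window triggers deletion of preexisting rows (due to retention), the window propagates the inserted and deleted rows downstream to relevant streams and subscribers. However, the inserted rows are placed before the deleted rows, since the inserts trigger the deletes. For example, in a window defined with KEEP 2 ROWS that already holds two rows, the arrival of a third row is propagated as an insert of the new row followed by a delete of the oldest row.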