KEEP clause

Specifies a window policy in a Create window statement or in the From clause of a Query Statement, Database statement, or Remote Procedure statement.

Syntax

time_policy | count_policy
Components

time_policy

Specify that the window maintain rows based on time. See time_policy for more information.

count_policy

Specify that the window maintain rows based on count. See count_policy for more information.

time_policy

KEEP { { [EVERY] interval [OFFSET BY interval] } | FOR interval_col | UNTIL times_list } [ PER column [...] | UNGROUPED ]
Components

interval

An Interval literal specifying the maximum age of rows in the window or the amount of time to shift the starting point for the time calculation.

interval_col

The name of a column of type Interval.

times_list

A list of times indicating when the window should be emptied. See times_list for more information.

column

The name of a column.

times_list

time_spec | (time_spec [, ...])
Component

time_spec

A time specification. See time_spec for more information.

time_spec

' [ SUN | MON | TUE | WED | THU | FRI | SAT ] hour : minute [ : second [. fraction ]] [timezone]'
Components

hour

A value from 0 to 23 indicating the hour of the day. Must be preceded by at least one space.

minute

A value from 0 to 59 indicating the minute.

second

A value from 0 to 59 indicating the second.

fraction

A value from 0 to 999999 indicating the fraction of a second.

timezone

A string representing the time zone. If omitted, assumes the local time zone. See "Sybase CEP Time Zone Database" in the Sybase CEP Integration Guide for more information about valid time zone strings. Must be preceded by at least one space.

count_policy

KEEP { [EVERY] count BUCKETS { BY column } [...] } | { { [EVERY] count ROW[S] } | { LAST [ROW] } | { count { LARGEST | SMALLEST } [DISTINCT] ROW[S] { BY column } [...] } | { ALL [ROW[S]] } [ { PER column } [...] | UNGROUPED ] }
Components

count

The number of rows or buckets to maintain.

column

The name of a column.

Usage

The KEEP clause defines a policy for a named or unnamed window. Named windows are created with a Create Window statement, while unnamed windows are defined in the FROM clause.

The window definition determines the manner in which the window keeps state. Rows are retained and removed from the window based on the window definition. Note that, when a window serves as a data source for a query in a query chain, the window's policy has no effect on which rows are propagated down the chain, or on how quickly the rows are propagated. Window policies keep information about the state of rows, but do not delay them, or prevent them from traveling down the chain. Window policies also do not prevent removed or discarded rows from propagating down the chain if the rows are not filtered out by other clauses in the query.

Window policies include time-based policies and count-based policies:

In most cases, a window definition can include a single count-based policy, a single time-based policy, or one of each. However, the count-based BUCKETS window policy cannot be combined with a time-based policy.

Both count-based and time-based policies can be sliding or jumping:

The following table describes the different types of window policies:

Window Type

Basic Syntax

Behavior

Sliding Time-Based

KEEP interval

or

KEEP FOR interval_col

Each new row arriving in the window is retained for a specific length of time and is then removed, regardless of the number of other rows contained in the window, or of the other rows' arrival time.

In the case of KEEP interval, every arriving row is retained for the same interval, specified by constant-interval-expression, for example 2 MINUTES.

In the case of KEEP FOR interval-col, every arriving row is retained for the interval specified in its interval-col column. Since the intervals specified in this column can be different for different rows, rows can be retained for different time intervals.

Jumping Time-Based

KEEP EVERY interval

or

KEEP UNTIL times-list

In the case of KEEP EVERY interval, time is subdivided into interval segments. During each interval, the window accumulates whatever rows arrive. At the end of the interval, all rows held in the window are removed and the process starts over.

In the case of KEEP UNTIL times-list, a comma-separated list of one or more times is specified in the window policy. The window accumulates whatever rows arrive until the first applicable time in times-list, then removes all rows and starts over. Components of times-list can include STRING literals and STRING type parameters. Times listed in times-list can include the day of the week. If no day of the week is specified, rows are removed at the indicated time daily. The listed times can also specify a time zone. If no time zone is specified, rows are removed according to local time.

Note:

Time changes, such as the change between standard and daylight savings time, may cause undesired results in the behavior of KEEP UNTIL windows when the time change occurs. To avoid this problem, Sybase recommends omitting times of the day or week that coincide with the time change from your time list.

Sliding Count-Based

KEEP count ROW

or

KEEP count ROWS

or

KEEP LAST ROW

A variation of sliding count-based windows can also be defined to keep buckets of rows, instead of individual rows See WINDOW BUCKETS for more information.

The window accumulates arriving rows until it reaches the maximum specified by count or the word LAST (KEEP LAST ROW is the equivalent of KEEP 1 ROW).

Once the maximum is reached, every new row arriving in the window displaces one of the other rows already held in the window. When no additional syntax is used, the displaced row is the oldest row held by the window. This default behavior of the sliding count-based window using the count (but not the KEEP LAST) syntax can be changed with the use of the LARGEST or SMALLEST keyword, as described later.

Jumping Count-Based

KEEP EVERY count ROW

or

KEEP EVERY count ROWS

A variation of jumping count-based windows can also be defined to keep buckets of rows instead of individual rows. See WINDOW BUCKETS for more information.

The window accumulates arriving rows until it reaches the maximum specified by integer-expression. When the next row arrives, all older rows held in the window are removed, and the window begins accumulating rows again until it reaches the maximum.

Multi-Policy

One count-based policy and one time-based policy, as described above.

The window retains and removes rows according to both policies.

Keep All

KEEP ALL

or

KEEP ALL ROWS

All rows arriving in the window are retained. No rows are removed from the window.

Warning!  

Because a Keep All window retains all rows that arrive in the stream, it is potentially very costly in terms of resources, and may eventually use up all available memory on your system. Sybase recommends using this type of window only when you are sure that doing so will not exceed your allocated resources, or when you have provided a means of explicitly deleting rows from the window with a Delete statement.

Window Buckets

Count-based windows can retain a specified number of groups of rows (called buckets), instead of the specified number of individual rows. These types of windows are defined using the BUCKETS keyword, and include one or more BY subclauses, each of which includes a column reference corresponding to the window's schema. Rows that contain the same values in all of the columns identified by the BY subclauses are grouped into a single bucket. The window policy determines how many buckets the window can hold.

Every arriving row is placed into a bucket, based on the values it contains in the columns named in the BY subclauses. If a bucket for the row does not already exist, a new bucket is created.

Syntax

Behavior

KEEP count BUCKETS BY column [ BY column] [ ...]

The window accumulates buckets up to the maximum specified by the count. Once the maximum number of buckets is reached, every new bucket created in the window displaces the oldest existing bucket.

KEEP EVERY count BUCKETS BY column [ BY column] [ ...]

The window creates buckets up to the maximum specified by the count. Once the maximum number of buckets is reached and a subsequent new bucket is created, all existing buckets held in the window are removed, and the window begins accumulating buckets again until it reaches the maximum.

A window policy that uses buckets cannot be used in the same window with a time-based window policy.

Keeping Largest or Smallest Rows

By default, sliding count-based windows retain the newest rows, and remove the oldest. For windows that do not use buckets, this behavior can be modified with the addition of the LARGEST or SMALLEST keyword, with or without the DISTINCT keyword, which causes the window to keep only rows with the largest or smallest available values, as described below. At least one BY subclause is required with this syntax, and multiple BY subclauses can be used.

Syntax

Behavior

... count LARGEST ROWS...

Initially the window accumulates rows until it reaches the maximum specified by count. A new row arriving in the window after the window is full is evaluated for the value of col1, where col1 is the column specified in the first BY subclause. If this value is larger than at least one other row already in the window, the new row is retained, and the row with the smallest col1 value is removed. If the incoming row's col1 is smaller than col1 of all the current rows, the incoming row is expired.

If col1 is the same for both the new row and all the other rows in the window, and a second BY subclause is specified, the same comparison is performed on the basis of the column specified in the second BY subclause. If the values in the second column are also the same for all rows in the window and the incoming row, the same comparison is performed for the next BY subclause, if one exists, and so on in order of BY subclauses from left to right, until either a value differential is found, or no more BY subclauses remain.

If, at any point, the value in the appropriate column of the incoming row is determined to be smaller than the corresponding value of all the rows in the window, the incoming row is discarded. If no differences in value are found at the end of this procedure, however, the oldest row is removed.

... count LARGEST DISTINCT ROWS...

A new row arriving in the window is checked against the existing rows to determine if the new row is distinct from all the others. Distinction is determined by comparing the values of all columns specified by all BY subclauses in the window policy for all the current rows in the window and the incoming row. If two rows contain the same values in all these columns, the rows are not distinct.

If the incoming row is not distinct from another row in the window, the new row is retained, and the older row is removed. Initially the window accumulates distinct rows until it reaches the maximum specified by count. If a new row is distinct, the value of its first BY subclause column is compared to all the other rows in the window. If any rows have a smaller value in this column than the incoming row, the new row is retained, and the row with the smallest value is removed. If the incoming row has a smaller value than any of the existing rows, it is discarded.

If the window policy has multiple BY subclauses and the incoming row and all the current rows have the same value in the column indicated by the first BY subclause, the column specified by the second BY subclause is evaluated for the row with the smallest value, and so on through any other subsequent BY subclauses, until a row with a different and smaller value is found.

... count SMALLEST ROWS...

Initially the window accumulates rows until it reaches the maximum specified by count. A new row arriving in the window after the window is full is evaluated for the value of col1, where col1 is the column specified in the first BY subclause. If this value is smaller than at least one other row already in the window, the new row is retained, and the row with the largest col1 value is removed. If the incoming row's col1 is larger than the value of col1 in all the current rows, the incoming row is discarded.

If col1 is the same for both the new row and all the other rows in the window, and a second BY subclause is specified, the same comparison is performed on the basis of the column specified in the second BY subclause. If the values in the second column are also the same for all rows in the window and the incoming row, the same comparison is performed for the next BY subclause, if one exists, and so on in order of BY subclauses from left to right, until either a value differential is found, or no more BY subclauses remain.

If, at any point, the value in the appropriate column of the incoming row is determined to be larger than the corresponding value of all the rows in the window, the incoming row is discarded. If no differences in value are found at the end of this procedure, however, the oldest row is removed.

... count SMALLEST DISTINCT ROWS...

A new row arriving in the window is checked against the existing rows to determine if the new row is distinct from all the others. Distinction is determined by comparing the values of all columns specified by all BY subclauses in the window policy for all the current rows in the window and the incoming row. If two rows contain the same values in all these columns, the rows are not distinct.

If the incoming row is not distinct from another row in the window, the new row is retained, and the older row is removed. Initially the window accumulates distinct rows until it reaches the maximum specified by count. If a new row is distinct, the value of its first BY subclause column is compared to all the other rows in the window. If any rows have a larger value in this column than the incoming row, the new row is retained, and the row with the smallest value is removed. If the incoming row has a larger value than any of the existing rows, it is discarded.

If the window policy has multiple BY subclauses and the incoming row and all the current rows have the same value in the column indicated by the first BY subclause, the column specified by the second BY subclause is evaluated for the row with the largest value, and so on through any other subsequent BY subclauses, until a row with a different and larger value is found.

Effect of PER, GROUP BY, and UNGROUPED

If a window policy includes one or more PER subclauses, a separate window is effectively created for every unique combination of values found in the combination of column references listed by the PER subclauses. For example, the following window policy keeps ten rows for every unique combination of values found in Col1 and Col2:

KEEP 10 ROWS PER Col1 PER Col2

Additionally, when a GROUP BY clause is used in the same query as an unnamed window definition, the GROUP BY clause creates implicit PER clauses on the window policy, based on the column references contained in the GROUP BY. For example, the following two code snippets create windows that behave the same:

GROUP BY Symbol, Broker, Price
FROM Stream1 KEEP 10 ROWS
FROM Stream1 KEEP 10 ROWS PER Symbol PER Broker PER Price 

When the query includes both a GROUP BY clause and an explicit PER clause, however, the explicit PER clause supersedes the implicit PER created by the GROUP BY. For example, the following query causes the window policy to retain ten rows per each unique value in the Broker column, not per the values in the Symbol column:

INSERT INTO OutStream
SELECT MAX(Price)
FROM Stream1 KEEP 10 ROWS PER Broker
GROUP BY Symbol;

The effect of the GROUP BY clause on an unnamed window can also be overridden by the addition of the UNGROUPED keyword, which causes the window policy to retain rows according to the specified number or rows or interval, disregarding the columns specified in the GROUP BY clause. For example, the following query keeps only one set of 50 rows, regardless of the values in the Symbol column:

INSERT INTO OutStream
SELECT AVG(Price)
FROM Stream1 KEEP 50 ROWS UNGROUPED
GROUP BY Symbol;

In most cases, the use of the PER, GROUP BY, or UNGROUPED clauses has no effect on the retention behavior of time-based window policies, and on the KEEP ALL window, since time-based policies retain rows for a specified period of time or until a specified time, regardless of their grouping, and KEEP ALL retains all rows regardless of grouping. However, the use of PER, GROUP BY, or UNGROUPED does affect the output of the FIRST and LAST functions, which retrieve the first or last row of the current group (instead of the contents of the window) when a PER or GROUP BY clause is present.

See Also

Examples

The following jumping count-based named window definition keeps every 15 rows:

CREATE WINDOW Window1 
SCHEMA (Col1 INTEGER, Col2 STRING) 
KEEP EVERY 15 ROWS;

The following time-based jumping window offsets the beginning of its interval calculation by six minutes:

INSERT INTO OutStream 
SELECT * 
FROM Stream1 KEEP EVERY 60 MINUTES OFFSET BY 6 MINUTES;

This example keeps the three distinct rows with the largest value in the Price column for every stock symbol listed in the StockSymbol column:

INSERT INTO OutStream
SELECT *
FROM Trades KEEP 3 LARGEST DISTINCT ROWS BY Trades.Price 
     PER Trades.StockSymbol;

This window creates a separate bucket for every arriving row with a distinct stock symbol. Rows with stock symbols for which buckets have already been created are placed into those buckets. After the third bucket is created, any new bucket displaces the oldest bucket in the window:

CREATE WINDOW NewestSymbols SCHEMA StockSchema
KEEP 3 BUCKETS BY StockSymbol;

This window expires rows at 5:00pm every weekday:

INSERT INTO OutStream
SELECT *
FROM Stream2
KEEP UNTIL ('MON 17:00:00', 'TUE 17:00:00', 'WED 17:00:00', 'THU 17:00:00', 'FRI 17:00:00');