Aggregation functions typically require the server to iterate over every element in a group. For this reason, the performance of the aggregation operator is inversely proportional to the size of the group.
Aggregation functions can be used in a SELECT statement along with a GROUP BY clause or over event caches in SPLASH inside UDFs and FLEX operators.
For the SUM, COUNT, AVG, and valueInserted aggregation functions, the server can perform additive optimization, where the function executes in constant time. In such cases, the time it takes to perform an operation is the same regardless of group size.
In a SELECT statement, the server can perform additive optimization provided functions eligible for optimization are used in all values being selected, with the exception of the columns referenced in the GROUP BY clause.
The following SELECT is optimized for additive optimization since all non-GROUP BY columns (name, counter, summary) only use additive aggregation functions (that is, valueInserted, SUM, and COUNT).
CREATE OUTPUT WINDOW AggResult SCHEMA (id INTEGER, name STRING, counter INTEGER, summary FLOAT) PRIMARY KEY DEDUCED AS SELECT BaseInput.intData_1 AS id, valueInserted(BaseInput.strData_1) AS name, count(BaseInput.intData_1) AS counter, sum(BaseInput.dblData_1) AS summary FROM BaseInput GROUP BY BaseInput.intData_1 ;
The following SELECT is not optimized for additive optimization since one of the non-GROUP BY columns (name) directly selects a column which cannot be computed additively.
CREATE OUTPUT WINDOW AggResult SCHEMA (id INTEGER, name STRING, counter INTEGER, summary FLOAT) PRIMARY KEY DEDUCED AS SELECT BaseInput.intData_1 AS id, BaseInput.strData_1 AS name, count(BaseInput.intData_1) AS counter, sum(BaseInput.dblData_1) AS summary FROM BaseInput GROUP BY BaseInput.intData_1 ;
When applying aggregation functions over an event cache, additive optimization is turned on when using the SUM, COUNT, AVG, or valueInserted functions only in the ON clause of a FLEX operator. The additive optimization does not apply when functions are used inside a UDF.
The following Flex stream computes the SUM in the ON clause additively, since the SUM function is computed additively and the used EventCaches (e0,e1) are declared locally.
CREATE INPUT WINDOW In1 SCHEMA (c1 INTEGER, c2 STRING, c3 INTEGER, summary FLOAT) PRIMARY KEY (c1, c2); CREATE FLEX MyFlex IN In1 OUT OUTPUT WINDOW FlexOut SCHEMA (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER) PRIMARY KEY (c1, c2) BEGIN declare eventCache(In1, coalesce) e0; eventCache(In1, coalesce) e1; end; ON In1 { { output setOpcode([c1=In1.c1;c2=In1.c2;|c3=sum(e0.c1);c4=sum(e1.c3);],getOpcode(In1)); } }; END;
The following Flex stream is not computed additively , since the STDDEV function cannot be computed additively.
CREATE INPUT WINDOW In1 SCHEMA (c1 INTEGER, c2 STRING, c3 INTEGER) PRIMARY KEY (c1, c2); CREATE FLEX MyFlex IN In1 OUT OUTPUT WINDOW FlexOut SCHEMA (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 FLOAT) PRIMARY KEY (c1, c2) BEGIN declare eventCache(In1, coalesce) e0; eventCache(In1, coalesce) e1; end; ON In1 { { output setOpcode([c1=In1.c1;c2=In1.c2;|c3=sum(e0.c1);c4=stddev(e1.c3);],getOpcode(In1)); } }; END;
Another restriction is that additive optimizations are disabled when functions are used inside nonlinear statements (if, while, for, and case statements). To enable additive optimizations when using a function within a nonlinear statement, assign the result of the function to a variable outside of the statement. Then use the variable inside the nonlinear statement.
The following SELECT is not optimized for additive optimization since one of the expressions (CASE) in the SELECT list is a nonlinear expression.
CREATE OUTPUT WINDOW AggResult SCHEMA (id INTEGER, name STRING, counter INTEGER, summary FLOAT) PRIMARY KEY DEDUCED AS SELECT BaseInput.intData_1 AS id, valueInserted(BaseInput.strData_1) AS name, CASE WHEN (count(BaseInput.intDATA_1) < 100) THEN 0 ELSE 1 END AS counter, sum(BaseInput.dblData_1) AS summary FROM BaseInput GROUP BY BaseInput.intData_1 ;